(positive) design of ribonucleic...

(Positive) Design of RiboNucleic Acids

Yann Ponty?,•,†

? Centre National de la Recherche Scientifique (CNRS)

• LIX, Ecole Polytechnique

† AMIBio team, Inria Saclay

http://goo.gl/mejsFh

PhD in Computer Science, Université Paris-Sud (France)

CNRS research scientist

Faculty at LIX, Computer Science department of Ecole Polytechnique

Group leader – AMIBio team (Ecole Polytechnique and Inria Saclay)

Postdoc experience in RNA Computational Biology (Boston, Paris)and Discrete Mathematics (Paris)

Extended sabbatical at Simon Fraser University (Vancouver, Canada)

Fundamental dogma of molecular biology

RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?

Proteins{Ala, Arg, . . . , Val}?︸︷︷︸

20+ Amino acids

THE CODE(genes)

THE MACHINE(enzymes)

MEH. . .

A T G G T T A C C C A T

T A C C A A T G G G T A


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .



Pol


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .


T A C C A A T G G G T APol

A


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .



A U


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .



A U G


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .



A U G G U U A C C C A U


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .





RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .




Ribosome


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .




Ribosome

Met


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .




Ribosome

ValMet


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .




Ribosome

ThrVal

Met


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .




Ribosome

HisThr

ValMet


RNAs

{A,C,G,U}?

DNA

{A,C,G,T}?


20+ Amino acids

THE CODE(genes)


MEH. . .




Met Val Thr His Ile Leu His Asn


DNA

RNA

Proteins

Transcription

Translation

Regulation

Transfer

Participates

Maturation

Car

rier

Synthesis

RNA functionsMessenger

Translation

Regulation

Enzyme

Catalytic

. . .

0

500

1000

1500

2000

2500

3000

#RNA Functional Families (RFAM DB)

Fundamental dogma of molecular biology (v2.0)

DNA

RNA

Proteins

Transcription

TranslationRegulation

Transfer

Participates

Maturation

Car

rier

Synthesis


Translation

Regulation

Enzyme

Catalytic

. . .

0

500

1000

1500

2000

2500

3000



DNA

RNA

Proteins

Transcription

TranslationRegulation

Transfer

Participates

Maturation

Car

rier

Synthesis


Translation

Regulation

Enzyme

Catalytic

. . .

0

500

1000

1500

2000

2500

3000


RNA world: Resolving the chicken vs egg paradox at the origin of life. . .

Gene Enzyme

encodes

replicates

RNA

encodes

replicates

A gene big enough to specify an enzyme would be too big to replicate accuratelywithout the aid of an enzyme of the very kind that it is trying to specify. So thesystem apparently cannot get started.[· · · ] This is the RNA World. To see how plausible it is, we need to look at whyproteins are good at being enzymes but bad at being replicators; at why DNA isgood at replicating but bad at being an enzyme; and finally why RNA might justbe good enough at both roles to break out of the Catch-22.R. Dawkins. The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution

RNA folding

G/C

Canonical base-pairs

Watson/C

rick base-pairsW

obble base-pair

U/A

U/G

RNA are single-stranded andfold on themselves, establishingcomplex 3D structures that areessential to their function(s).

RNA structures are stabilized bybase-pairs, each mediated byhydrogen bonds.

RNA Design

RNA = Linear Polymer = Sequence in {A,C,G,U}?

UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

U

U

A GG

CG

GC

C

A

C

A

G

C

G

G

U

G

G

GG

U

U

G

C

C

U

C

C

C

G

UA

C

C

C

A

U

CC

C

GA

A

C

A

C

G

GAA

G

AU

A

A

G

C

C

C

A

C

CA

G

CG

U

U

CC

GG

GG A

G

U

A

C U

G

GA

GU

G

CG

C

GA

G

C

CU

CU

G

GG

A

A

A

CC

CG

GUUC

GC

CG

CC

A

CC

1

10

20

30

40

50

60

70

80

90

100

110

120

122

Primary Structure Secondary Structure Structure Tertiaire5s rRNA (PDBID: 1K73:B)

Evolution of RNAs

Homologous genes = Functionally equivalent, within or across organismsUsually well-captured by sequence similarity in proteins, binding sites. . .

Problem: Many classes of non-(protein) coding RNAs (ncRNAs) poorlyconserved at the sequence level but adopt a conserved structure!

RFAM Bacterial RNAse-P class B Alignment

RNA Design


UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

U

U

A GG

CG

GC

C

A

C

A

G

C

G

G

U

G

G

GG

U

U

G

C

C

U

C

C

C

G

UA

C

C

C

A

U

CC

C

GA

A

C

A

C

G

GAA

G

AU

A

A

G

C

C

C

A

C

CA

G

CG

U

U

CC

GG

GG A

G

U

A

C U

G

GA

GU

G

CG

C

GA

G

C

CU

CU

G

GG

A

A

A

CC

CG

GUUC

GC

CG

CC

A

CC

1

10

20

30

40

50

60

70

80

90

100

110

120

122


Structure Prediction

RNA Design


UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

U

U

A GG

CG

GC

C

A

C

A

G

C

G

G

U

G

G

GG

U

U

G

C

C

U

C

C

C

G

UA

C

C

C

A

U

CC

C

GA

A

C

A

C

G

GAA

G

AU

A

A

G

C

C

C

A

C

CA

G

CG

U

U

CC

GG

GG A

G

U

A

C U

G

GA

GU

G

CG

C

GA

G

C

CU

CU

G

GG

A

A

A

CC

CG

GUUC

GC

CG

CC

A

CC

1

10

20

30

40

50

60

70

80

90

100

110

120

122


Structure Prediction

RNA Design

Why we design RNAs

To create building blocks for synthetic systemsRationally-designed RNAs increase orthogonality

To assess the significance of observed phenomenonRandom models should include every established characters. . .. . . including adoption of a single structure

To test/push our understanding of how RNA foldsMisfolding RNAs reveal gaps in our energy models and descriptors for theconformational spaces

To help search for homologous sequencesIncomplete covariance models hindered by limited training setsDesign can be used to generalize existing alignments

To fuel RNA-based therapeuticsSequence-based (siRNA, synthetic genes), but structure matters

To perform controlled experiments

Controlled experiments through RNA design

Motivation: Quantifying the impactof structure S on efficacy of a singleExon Splicing Enhancers (ESE):

Presence of ESE motif E ;

Different structures S1, S2. . . ;

Avoid library of (∼1500!)documented ESEs motifs.

Objectives. Design RNA which:1 Folds into a prescribed

structure;2 Features/avoids motifs.3 Control GC%, Boltz. prob.. . . .

Structural context of ESE motif intranscript was shown to affect itsfunctionality. [Liu et al, FEBS Lett. 2010]

Design objectives

Positive structural designOptimize affinity of designs towards target structure(s)Examples: Most stable sequence for given fold. . .

Negative structural designLimit affinity of designs towards alternative structuresExamples: Lowest free-energy, High Boltzmann probability/Low entropy. . .

Additional constraints:

Forbid motif list to appear anywhere in designForce motif list to appear each at least onceLimit available alternatives at certain positionsControl overall composition (GC-content)

Outline

I. Single Structure Design (IncaRNAtion)

II. Constrained Design using Formal Languages

III. Multiple Structures

I. Inverse Folding

Designing a given structure

RNA sequence and structure(s)


UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

U

U

A GG

CG

GC

C

A

C

A

G

C

G

G

U

G

G

GG

U

U

G

C

C

U

C

C

C

G

UA

C

C

C

A

U

CC

C

GA

A

C

A

C

G

GAA

G

AU

A

A

G

C

C

C

A

C

CA

G

CG

U

U

CC

GG

GG A

G

U

A

C U

G

GA

GU

G

CG

C

GA

G

C

CU

CU

G

GG

A

A

A

CC

CG

GUUC

GC

CG

CC

A

CC

1

10

20

30

40

50

60

70

80

90

100

110

120

122

Primary Structure Secondary Structure Tertiary Structure5s rRNA (PDBID: 1K73:B)

MFE folding prediction: O(n3)

Crossing interactions

Excluded from the secondary structure:

Non-canonical base-pairs:Any base-pair other than {(A-U), (C-G), (G-U)}OR interacting in a non-standard way (WC/WC-Cis) [Leontis Westhof, RNA2001].

Canonical CG base-pair (WC/WC-Cis) Non-canonical base-pair (Sugar/WC-Trans)

(Pseudo?)knots: Crossing sets of nested stable base-pairs

G A G C C U U U A U A C A G U A A U G U A U A U C G A A A A A U C C U C U A A U U C A G G G A A C A C C U A A G G C A A U C C U G A G C U A A G C U C U U A G U A A U A A G A G A A A G U G C A A C G A C U A U U C C G A U A G G A A G U A G G G U C A A G U G A C U C G A A A U G G G G A U U A C C C U U C U A G G G U A G U G A U A U A G U C U G A A C A U A U A U G G A A A C A U A U A G A A G G A U A G G A G U A A C G A A C C U A U C C G U A A C A U A A U U G

1 2 12 22 32 42 52 57 67 77 87 97 107 117 127 137 147 157 167 177 187 197 207 217 227 229

Group I Ribozyme (PDBID: 1Y0Q:A)

Crossing interactions

Excluded from the secondary structure:

Non-canonical base-pairs:Any base-pair other than {(A-U), (C-G), (G-U)}OR interacting in a non-standard way (WC/WC-Cis) [Leontis Westhof, RNA2001].

Canonical CG base-pair (WC/WC-Cis) Non-canonical base-pair (Sugar/WC-Trans)

(Pseudo?)knots: Crossing sets of nested stable base-pairs

G A G C C U U U A U A C A G U A A U G U A U A U C G A A A A A U C C U C U A A U U C A G G G A A C A C C U A A G G C A A U C C U G A G C U A A G C U C U U A G U A A U A A G A G A A A G U G C A A C G A C U A U U C C G A U A G G A A G U A G G G U C A A G U G A C U C G A A A U G G G G A U U A C C C U U C U A G G G U A G U G A U A U A G U C U G A A C A U A U A U G G A A A C A U A U A G A A G G A U A G G A G U A A C G A A C C U A U C C G U A A C A U A A U U G

1 2 12 22 32 42 52 57 67 77 87 97 107 117 127 137 147 157 167 177 187 197 207 217 227 229

Group I Ribozyme (PDBID: 1Y0Q:A)

Crossing interactions do exist!

Example: Group II Intron (PDB ID: 3IGI)

But are hard to predict[Lyngsoe-ICALP’04]

[Sheikh Backofen Ponty, CPM’12]

Thermodynamics vs Kinetics

Paradigms for RNA structure prediction

1978–1990s Most probable structure = Minimal Free-Energy (MFE)

1990s–2010s Functional structure(s) = Boltzmann ensemble (partitionfunction)

2010s–???? Embracing the kinetics of RNA folding

T →∞

mRNA half-life: ∼7h(Mouse [Sharova2009])






T = 0h







T = 1h







T = 2h







T = 5h







T = 10h







T →∞







T = 10h


Problem statement

C G G A U G U U A C A G C G A G C A U G U G C C C G

1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26

RNA structure S: Non-crossing base-pairs for positions in sequence w

Motifs: Sequence/structure features (e.g. Base-pairs, Stacks, Loops. . . )

Energy model:Motif→ Free-energy contribution ∆(·) ∈ R− ∪ {+∞}Free-Energy Ew (S): Sum over (independently contributing) motifs in S

Problem statement


1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26




ES = 2 ·∆

U

G

+ 4 ·∆

G

C

+ 2 ·∆

C

G

Problem statement


1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26




ES = ∆

(C

G

G

C

)+ ∆

(G

C

G

C

)+ ∆

(U

G

G

C

)+ ∆

(U

G

G

C

)+ ∆

(U

G

G

C

)

Problem statement


1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26




ES = ∆

(C

G

G

C

)+ ∆

(G

C

G

C

)+ ∆

(U

G

G

C

)+ ∆

(U

G

G

C

)+ ∆

(U

G

G

C

)+ ∆

(U

U

ACA

G

)+ ∆

(G

A

UG

A

G

CC

)+ ∆

(C A

U

GUG

)

Problem statement


1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26




Definition (MFE-PREDICT(E) problem)

Input: RNA sequence w ∈ {A,C,G,U}∗Output: Secondary struct. S∗ with Minimal Free-Energy (MFE) Ew (S∗)

Problem solved exactly in O(n3) time.[Nussinov Jacobson, PNAS 1980] [Zuker Stiegler, NAR 1981]. . . .

Dynamic programming (DP) for RNA folding

Theorem ([Nussinov and Jacobson(1980)])

Max #base-pairs/min energy structure computed in O(n3)/O(n2) time/memory

i j=

i i+1 j+

i k j

≥ θ

Ei,k : Free-energy contribution of base-pair (i, k). (−1/+∞ or ∆G(si?≡ sk ))

Ni,j : Max #base-pairs over interval [i, j]

Ni,t = 0, ∀t ∈ [i, i + θ]

Ni,j = min

Ni+1,j {i unpaired}j

mink=i+θ+1

Ei,k + Ni+1,k−1 + Nk+1,j {i paired to k }

Many extensions:Nearest-neighbor/Turner energy model [Zuker1981]

Comparative folding [Sankoff1985]

Equilibrium base-pairing probabilities [McCaskill1990]

Moments of additive features [Miklos2005,Ponty2011]

∆ kcal.mol−1 suboptimal structures of MFE [Wuchty1999]

Basic crossing structures [Rivas1999]. . .

Exact sampling in Boltzmann distr. [Ding2003,Ponty2008]


Maximum expected accuracy structure [Do2006]

Distance-classified partitioning of Boltzmann ens. [E.Freyhult2007a]

Made possible by:Completeness/Unambiguity of decomposition∃ energy-preserving bijection between derivations of DP scheme and search space

Objective function additive with respect to DP scheme




i j=

i i+1 j+

i k j

≥ θ


Ci,j : Number of secondary structures compatible with interval [i, j]

Ci,t = 1, ∀t ∈ [i, i + θ]

Ci,j =∑∑∑{

Ci+1,j {i unpaired}∑∑∑jk=i+θ+11comp.(i,k)×Ci+1,k−1×Ck+1,j {i paired to k }
















i j=

i i+1 j+

i k j

≥ θ


Zi,j =∑

S comp.with w[i,j]

e−Ew (S)

RT = Partition function for compatible structs within [i, j]

Zi,t = 1, ∀t ∈ [i, i + θ]

Zi,j =∑∑∑{

Zi+1,j {i unpaired}∑jk=i+θ+1e

−Ei,kRT ×Zi+1,k−1×Zk+1,j {i paired to k }













RNA inverse folding


UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

U

U

A GG

CG

GC

C

A

C

A

G

C

G

G

U

G

G

GG

U

U

G

C

C

U

C

C

C

G

UA

C

C

C

A

U

CC

C

GA

A

C

A

C

G

GAA

G

AU

A

A

G

C

C

C

A

C

CA

G

CG

U

U

CC

GG

GG A

G

U

A

C U

G

GA

GU

G

CG

C

GA

G

C

CU

CU

G

GG

A

A

A

CC

CG

GUUC

GC

CG

CC

A

CC

1

10

20

30

40

50

60

70

80

90

100

110

120

122


MFE folding prediction: Θ(n3)

Inverse folding: NP-hard?

RNA Inverse Folding

Definition (INVERSE-FOLDING(E) problem)

Input: Secondary structure S + Energy distance ∆ > 0.Output: RNA sequence w ∈ Σ? such that:

∀S′ ∈ S|w | \ {S} : Ew,S′ ≥ Ew ,S + ∆

or ∅ if no such sequence exists.

Difficult problem: No obvious DP decomposition

Existing algorithms: Heuristics or Exponential-time

Complexity of problem unknown (despite [Schnall Levin et al, ICML’08])Reason: Non locality, no theoretical frameworks, too many parameters. . .

RNA Inverse Folding



∀S′ ∈ S|w | \ {S} : Ew,S′ ≥ Ew ,S + ∆


Example:

1

10

20

26

RNA Inverse Folding



∀S′ ∈ S|w | \ {S} : Ew,S′ ≥ Ew ,S + ∆


Example:

1

10

20

26

= 1 10

13

1 10

13

RNA Inverse Folding



∀S′ ∈ S|w | \ {S} : Ew,S′ ≥ Ew ,S + ∆


Example:

1

10

20

26

= 1 10

13

1 10

13

AAGAGUCGCUCUCfolds<-----1 10

13

RNA Inverse Folding



∀S′ ∈ S|w | \ {S} : Ew,S′ ≥ Ew ,S + ∆


Example:AAGAGUCGCUCUCAAGAGUCGCUCUC

1

10

20

26

1

10

20

26

Folds

Existing approaches for negative design

Based on local search. . .

RNAInverse - TBI Vienna

Info-RNA - Backofen@Freiburg

RNA-SSD - Condon@UBC

NUPack - Pierce@Caltech

. . . bio-inspired algorithms. . .

RNAFBinv - Barash@Ben Gurion

FRNAKenstein - Hein@Oxford

AntaRNA - Backofen@Freiburg

ERD - Ganjtabesh@Tehran

. . . exact approaches. . .

RNAIFold - Clote@Boston College

CO4 - Will@Leipzig

Typical issues:Naive initialization strategiesPoor coverage of sequence space:

Local search remain confined near initial sequenceGC-rich produced sequences

⇒ Global sampling [Levin et al, NAR 12]

The case for a control of GC-content

High GC-content suspected to induce kinetic traps

Global sampling [Levin et al, NAR 12]

Target structure SBoltzmann distribution based on affinity towards SRandom generation from Boltzmann DistributionFold sampled sequences and compare to target

Boltzmann factor:

Bw (S) := e−Ew (S)

RT

Pseudo-Partition Function:

Z(S) =∑

w∈Σ∗

Bw (S)

Boltzmann probability:

p(s) :=Bw (S)

Z

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

IncaRNAtion [Reinharz et al, Bioinformatics 2013]

[Position i unpaired]

[Position i paired with k ]

[Paired ends + Stacking pairs]

si sja b

i j

Explore sequence spaceStructure fixed

(Only 1 case applies)

a′a bi + 1 j

∑a′

a′ b′a bi + 1 jkk − 1 k + 1

∑a′,b′

a′ b′a bi + 1 j − 1

∑a′,b′

"Global" Stochastic Backtrack

Z = weight of nt1Z = weight of nt2 ∧ nt17Z = weight of nt3 ∧ nt16Z = weight of nt4 ∧ nt8 Z = weight of nt9 ∧ nt15Z = weight of nt10 ∧ nt14

Sequence:

nt1 nt2 nt17nt3 nt16nt4 nt8nt5 nt6 nt7 nt9 nt15nt10 nt14

− 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 −


Z = weight of nt1

Z = weight of nt2 ∧ nt17Z = weight of nt3 ∧ nt16Z = weight of nt4 ∧ nt8 Z = weight of nt9 ∧ nt15Z = weight of nt10 ∧ nt14

Sequence: nt1

nt2 nt17nt3 nt16nt4 nt8nt5 nt6 nt7 nt9 nt15nt10 nt14

−nt1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 −


Z = weight of nt1

Z = weight of nt2 ∧ nt17

Z = weight of nt3 ∧ nt16Z = weight of nt4 ∧ nt8 Z = weight of nt9 ∧ nt15Z = weight of nt10 ∧ nt14

Sequence: nt1 nt2 nt17

nt3 nt16nt4 nt8nt5 nt6 nt7 nt9 nt15nt10 nt14

−nt2 nt17

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 −


Z = weight of nt1Z = weight of nt2 ∧ nt17


Z = weight of nt4 ∧ nt8 Z = weight of nt9 ∧ nt15Z = weight of nt10 ∧ nt14

Sequence: nt1 nt2 nt17nt3 nt16

nt4 nt8nt5 nt6 nt7 nt9 nt15nt10 nt14

−nt3 nt16

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 −


Z = weight of nt1Z = weight of nt2 ∧ nt17Z = weight of nt3 ∧ nt16


Z = weight of nt9 ∧ nt15Z = weight of nt10 ∧ nt14

Sequence: nt1 nt2 nt17nt3 nt16nt4 nt8

nt5 nt6 nt7 nt9 nt15nt10 nt14

−nt4 nt8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 −


Z = weight of nt1Z = weight of nt2 ∧ nt17Z = weight of nt3 ∧ nt16Z = weight of nt4 ∧ nt8



Sequence: nt1 nt2 nt17nt3 nt16nt4 nt8nt5 nt6 nt7 nt9 nt15

nt10 nt14

−nt9 nt15

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 −


Z = weight of nt1Z = weight of nt2 ∧ nt17Z = weight of nt3 ∧ nt16Z = weight of nt4 ∧ nt8 Z = weight of nt9 ∧ nt15


Sequence: nt1 nt2 nt17nt3 nt16nt4 nt8nt5 nt6 nt7 nt9 nt15nt10 nt14

−nt10 nt14

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 −

GC-content bias

0.0 0.2 0.4 0.6 0.8 1.0GC content

0

200

400

600

800

1000

Quanti

ty

Target 15% GC

Average GC content is 64%

RFAM average GC content48.8%

Weighted DP Recursions

[Position i unpaired]

[Position i paired with k ]

[Paired ends + Stacking pairs]

si sja b

i j

a′a bi + 1 j

∑a′

x#GC(a′)

a′ b′a bi + 1 jkk − 1 k + 1

∑a′,b′

x#GC(a′b′)

a′ b′a bi + 1 j − 1

∑a′,b′

x#GC(a′b′)

Incarnation NT distribution: Bissection scheme

..

0.0

.

0.2

.

0.4

.

0.6

.

0.8

.

1.0

.

GC content.

0

.

200

.

400

.

600

.

800

.

1000

.

Quan

tity

.

Target 15% GC

.

x#GC = 0.2500

.

Tot round 1 is 0

[Waldispühl and Ponty, RECOMB, 2011]


..

0.0

.

0.2

.

0.4

.

0.6

.

0.8

.

1.0

.

GC content.

0

.

200

.

400

.

600

.

800

.

1000

.

Quan

tity

.

Target 15% GC

.

x#GC = 0.2500

.

Tot round 1 is 0

.

x#GC = 0.1250

.

Tot round 2 is 141



..

0.0

.

0.2

.

0.4

.

0.6

.

0.8

.

1.0

.

GC content.

0

.

200

.

400

.

600

.

800

.

1000

.

Quan

tity

.

Target 15% GC

.

x#GC = 0.2500

.

Tot round 1 is 0

.

x#GC = 0.1250

.

Tot round 2 is 141

.

x#GC = 0.0625

.

Tot round 3 is 547



..

0.0

.

0.2

.

0.4

.

0.6

.

0.8

.

1.0

.

GC content.

0

.

200

.

400

.

600

.

800

.

1000

.

Quan

tity

.

Target 15% GC

.

x#GC = 0.2500

.

Tot round 1 is 0

.

x#GC = 0.1250

.

Tot round 2 is 141

.

x#GC = 0.0625

.

Tot round 3 is 547

.

x#GC = 0.0938

.

Tot round 4 is 823



..

0.0

.

0.2

.

0.4

.

0.6

.

0.8

.

1.0

.

GC content.

0

.

200

.

400

.

600

.

800

.

1000

.

Quan

tity

.

Target 15% GC

.

x#GC = 0.2500

.

Tot round 1 is 0

.

x#GC = 0.1250

.

Tot round 2 is 141

.

x#GC = 0.0625

.

Tot round 3 is 547

.

x#GC = 0.0938

.

Tot round 4 is 823

.

x#GC = 0.0781

.

Tot round 5 is 1165


Limits of the approach

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

Heuristic: Strong affinity is neither sufficient, nor necessary, but . . .

Strong empirical correlation affinity/success of design [Levin et al, NAR 2012]

Linear time-complexity [Reinharz Ponty Waldispühl, ISMB/ECCB’13]

Composition control [Bodini Ponty, AofA’10] [Reinharz et al, ISMB/ECCB’13]

Complementary with local search approaches [Reinharz et al, ISMB/ECCB’13]


1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30







1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30







1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30

1

10

20

30






Local vs Global vs "Glocal"

The success of glocal strategies

Sampling + Optimize creates highly probable design sequences

II. Constrained design

Avoiding/forcing motifs

Existing approaches for negative design

Based on local search. . .

RNAInverse - TBI Vienna

Info-RNA -Backofen@Freiburg

RNA-SSD - Condon@UBC

NUPack - Pierce@Caltech

. . . bio-inspired algorithms. . .

RNAFBinv - Barash@Ben Gurion

FRNAKenstein - Hein@Oxford

AntaRNA - Backofen@Freiburg

. . . exact approaches. . .

RNAIFold - Clote@Boston College

CO4 - Will@Leipzig

Few algorithms support avoided/mandatory motifs. . .

. . . none guarantees reasonable runtime.

Typical reasons:Deep local minima (Rugged landscape)

Mandatory motifs⇒ Late deadends (Branch and Bound)

Forbidden motifs⇒ Search space disconnection (Local Search)

Problem with local approaches: An example

Simplified vocabulary {A,U}

+ Forbidden motifs F = {AU,UA}

UUUUUAAAAA

AAUAA

AUAAA

UAAAA UUUUA

UUUAU

UUAUU

UAUUU

UAAUU

AUUAU

AAUAU

AAUUU

AAAUA

AAAUA

AAUUA

AAAUU

AUUUU

⇒ F may disconnect search space (holds for any move set!)

Problem with local approaches: An example

Simplified vocabulary {A,U} + Forbidden motifs F = {AU,UA}

UUUUUAAAAA

AAUAA

AUAAA

UAAAA UUUUA

UUUAU

UUAUU

UAUUU

UAAUU

AUUAU

AAUAU

AAUUU

AAAUA

AAAUA

AAUUA

AAAUU

AUUUU

⇒ F may disconnect search space (holds for any move set!)

Idea

Use formal language constructs to constrain global sampling

Forced motifsAvoided motifs

→ Regular language LC ∈ Reg

Structure compatibility+ Positional constraints+ Energy Model

→ Weighted Context-Free Lang LS ∈ CFL

Folklore theorem (constructive): Reg ∩ (W)CFL ⊆ (W)CFL

Build weighted context-free grammar G for LC ∩ LS+ Random generation

⇒ Global sampling under constraints

Building the Finite State Automaton

To force multiple words, keeptrack of generated words:

Create disjunctiveautomata for eachM′ ⊆MReroute accepting statesAccepting state = noforced word remaining (εin A∅)Forbidden words can beadded to sub-automata

#States:

O(

2|M| ·(∑

i |fi |+∑

j |mj |))

Example: M = {AGC,GG}

ε

A

AG

AGC

C

G

A

G

GG

G

G

ε

A

AG

AGC

C

G

A

ε

G

G

G

G

ε

A,C,G,U

AM

A{GG} A{AGC}A∅




#States:

O(

2|M| ·(∑

i |fi |+∑

j |mj |))

Example: M = {AGC,GG}

ε

A

AG

G

A

G

G

ε

A

AG

G

A

ε

G

G

ε

A,C,G,U

C

G

CG

AM

A{GG} A{AGC}

A∅




#States:

O(

2|M| ·(∑

i |fi |+∑

j |mj |))

Example:M = {AGC,GG};F = {AA}

ε

A

AG

G

A

G

G

ε

A

AG

G

A

ε

G

G

A

A

ε

A

A

C,G,U

C

G

CG

⊥A

A

A

A

Building the grammar

Input: Secondary Structure S + Positional constraintsA Create Parse Tree for secondary structureB Translate Parse Tree into single-word grammarC Expand grammar to instantiate compatible base/base-pairsD Restrict to bases/base-pairs allowed at each position

. ( ( ( . ) ) ( . . ) )

1 5 10 12



. ( ( ( . ) ) ( . . ) )

1 5 10 12

S1

S2

S3S4

S5

S8

S9S10



S1 → .S2 S2 → (S3 ) S3 → (S4 )S8 S4 → (S5 )

S5 → . S8 → (S9 ) S9 → .S10 S10 → .



V1 → AV2 | CV2 | GV2 | UV2

V2 → AV3U | CV3G | GV3C | GV3U | UV3A | UV3G

V3 → AV4UV8 | CV4GV8 | GV4CV8 | GV4UV8| UV4AV8 | UV4GV8

V4 → AV5U | CV5G | GV5C | GV5U | UV5A | UV5G

V5 → A | C | G | UV8 → AV9U | CV9G | GV9C | GV9U | UV9A | UV9G

V9 → AV10 | CV10 | GV10 | UV10

V10 → A | C | G | U



V1 → AV2 | CV2 | GV2 | UV2

V2 → AV3U |��CV3G |��GV3C |��GV3U | UV3A |��UV3G

V3 → AV4UV8 |��CV4GV8 |��GV4CV8 |��GV4UV8| UV4AV8 |��UV4GV8

V4 → AV5U |��CV5G |��GV5C |��GV5U | UV5A |��UV5G

V5 → A | C | G | UV8 → AV9U |��CV9G |��GV9C |��GV9U | UV9A |��UV9G

V9 → AV10 | CV10 | GV10 | UV10

V10 → A | C | G | U

Random generation

Combine CFG and aut. → CFG (Multiplying #Rules by |Q|3)

GenRGenS [Ponty Termier Denise, Bioinformatics 2006]:Precomputes #words for each non-terminalRandom Generation w.r.t. weighted distribution

Energy models:Uniform distributionNussinov energy modelStacking-pairs model (Turner 2004)

Based on refined, yet similar, grammar

Overall complexity: |S| · 23|M| ·(∑

i |fi |+∑

j |mj |)3

Linear on |S|Exponential on |M|, but NP-Hard problem

III. Positive design for multiplestructures

Motivation: Kinetics and riboswitches

∆G

- Local Minimum (LM)

Design objectives

Positive structural designOptimize affinity of designed sequences towards target structureOr simply ensure their compatibility with one or several structuresExamples: Most stable sequence for given fold. . .

Negative structural designLimit affinity of designed sequences towards alternative structuresExamples: Lowest free-energy, High Boltzmann probability/Low entropy. . .

Additional constraints:

Forbid motif list to appear anywhere in design

Force motif list to appear each at least onceLimit available alternatives at certain positions

Control overall composition (GC-content)

Counting compatible RNAs: Watson-Crick + Single structure

C

U

G

A

Compatible Base Pairs = Only Watson-Crick base pairs

a b c d e f g h i j k l m n o p q r s t u v

Question: How many Compatible sequences?Answer: 4#BPs × 4#Unpaired → 268 435 456


C

U

G

A


G A G C U C A G C G C G A A C G C U C U C A

Compatible Sequence



C

U

G

A


A A A A G A C U G G A C U U G G C C U U U C

Incompatible Sequence

Question: How many Compatible sequences?

Answer: 4#BPs × 4#Unpaired → 268 435 456


C

U

G

A




Answer: 4#BPs × 4#Unpaired → 268 435 456


C

U

G

A




Counting compatible RNAs: Watson-Crick + Two structures

C

U

G

A



aeg

p

u

k

Dependency graph:Cycles + Paths

bd

h j q

t

f l o v c s

i m n r


Answer: 6= ∅! (both base-pairs and dependency graphs bipartite)

4#CCs → 65 536

Counting compatible RNAs: Watson-Crick + > 2 structs

C

U

G

A



Dependency graph:Cycles, Paths, Trees. . .

aeg

p

u

k bd

h j q

t

f l o v

cs

i

mn

r


Answer: Non-bipartite→ ∅; Bipartite→ 4#CCs = 64

Counting compatible RNAs: WC/Wobble + Single struct.

C

U

G

A

Compatible Base Pairs = Include Wobble base pairs



Answer: 4#Unpaired × 6#BPs → 6 879 707 136

Counting compatible RNAs: WC/Wobble + Two structures

C

U

G

A



aeg

p

u

k


bd

h j q

t

f l o v c s

i m n r


Answer: 6= ∅! (base-pairs and dependency graphs always bipartite)

#Designs(G) =∏

c∈CC(G)

#Designs(cc)

Counting compatible designs for paths and cycles

Theorem (#Compatible designs for paths and cycles)

The numbers of compatible designs for paths and cycles of length n are:

p(n) = 2Fn+2 and c(n) = 2Fn + 4Fn−1

where Fn: nth Fibonacci number, F0 = 0, F1 = 1 and Fn = Fn−1 + Fn−2.

For paths: A simple DFA generates compatible sequences

εstart

AU

GC

A

U

C

G

G

CG

U

U

A

Remark: A↔ C/G↔ U symmetry




p(n) = 2Fn+2 and c(n) = 2Fn + 4Fn−1



εstart • ◦A,C

G,U

◦

•◦


p(n) := mε(n) = 2 m•(n − 1) + 2 m◦(n − 1) = 2(F(n) + F(n + 1)) =,F(n + 2)




p(n) = 2Fn+2 and c(n) = 2Fn + 4Fn−1



εstart • ◦A,C

G,U

◦

•◦


m•(n) = m◦(n − 1)

m◦(n) = m◦(n − 1) + m•(n − 1)

= m◦(n − 1) + m◦(n − 2)

= F(n + 2)





p(n) = 2Fn+2 and c(n) = 2Fn + 4Fn−1



εstart • ◦A,C

G,U

◦

•◦


m•(n) = m◦(n − 1)

m◦(n) = m◦(n − 1) + m•(n − 1)

= m◦(n − 1) + m◦(n − 2)

= F(n + 2)

(Since m◦(0) = 1 and m◦(1) = 2)





p(n) = 2Fn+2 and c(n) = 2Fn + 4Fn−1


For cycle: A barely more involved DFA generates compatible sequences

εstart

• ◦1 •1

◦2 •2•2

A,C◦

◦

•

◦G,U

◦

•

◦


m◦2 (n) = F(n + 2)

m◦1 (n) = F(n + 1)

(Since m◦1 (0) = 1 and m◦1 (1) = 1)

c(n) := mε(n) = 2 m◦1 (n − 2) + 2 m◦2 (n − 1)

= 2(F(n − 1) + F(n + 1)) = 2F(n) + 4F(n − 1)




p(n) = 2Fn+2 and c(n) = 2Fn + 4Fn−1


Theorem (#Compatible designs for general 2-structures graphs)

G: dependency graph associated with 2 RNA structures (max deg=2).The number #Designs(G) of compatible designs for G is given by

#Designs(G) =∏

p∈P(G)

2F|p|+2 ×∏

c∈C(G)

(2F|c| + 4F|c|−1

)where G decomposes into paths P(G) and cycles C(G).

Counting compatible sequences: WC/Wobble + Two structures

C

U

G

A



aeg

p

u

k


bd

h j q

t

f l o v c s

i m n r


Answer: 6= ∅! (base-pairs and dependency graphs always bipartite)

#Designs(G) =∏

c∈CC(G)

#Designs(cc) = 2 322 432

Counting compatible sequences: Watson-Crick + > 2 structures

C

U

G

A




aeg

p

u

k bd

h j q

t

f l o v

cs

i

mn

r


Answer: Non-bipartite→ ∅; Bipartite→

∏cc∈CC(G)

2×#IS(cc)


C

U

G

A




aeg

p

u

k bd

h j q

t

f l o v

cs

i

mn

r


Answer: Non-bipartite→ ∅; Bipartite→∏

cc∈CC(G)

2×#IS(cc)

Bijection between Independent Sets and Valid Designs

C

U

G

AA

CG

U

fed

i

g

h

cba

Remark: No adjacent black letters in compatible designs

Up to trivial symmetry? (e.g. top-left position ∈ {G,A}):Designs?(cc) ⊆ IndependentSets(cc)

Also, IS (black) +↖ vert. ∈ {G,A} ⇒ Unique compatible design

⇒ Bijection between Designs?(cc) and IndependentSets(cc).


C

U

G

AA

CG

U

fed

i

g

h

cba

A






C

U

G

AA

CG

U

fed

i

g

h

cba

A

U

U

U






C

U

G

AA

CG

U

fed

i

g

h

cba

A

U

U

U

G






C

U

G

AA

CG

U

fed

i

g

h

cba

A

U

U

U

G C

G






C

U

G

AA

CG

U

fed

i

g

h

cba

A

U

U

U

G C

G

A U






C

U

G

AA

CG

U

fed

i

g

h

cba






C

U

G

AA

CG

U

fed

i

g

h

cba

C

G

G

G

U A

U

C G





Valid designs and independent sets

Theorem (#Valid design for bipartite connected dependency graphs)

Let G be a bipartite connected dependency graph, one has:

#Designs(G) = 2×#Designs?(G) = 2×#IS(G)

For a bipartite dependency graph G we get:

#Designs(G) =∏

cc∈CC(G)

2×#IS(cc) = 2|CC(G)| ×#IS(G)

But #IS(G) is #P-hard on bipartite graphs [Bubbley&Dyer’01]

(+ Any G is a dependency graph)

Algorithm A ∈ P for #Designs(G)→ Algorithm A′ ∈ P for #BIS. . .

Theorem

#Designs is #P-hard.

No polynomial algorithm for #Designs(G) unless #P = FP (⇒ P = NP)

Consequences

Corollary (#Approximability for ≤ 5 structures) [[Weitz’06]

For any G built from ≤ 5 pseudoknotted structures, #Design(G) can beapproximated within any ratio in polynomial time (PTAS)

Corollary (#BIS hardness for > 5 struct.) [[Cai, Galanis, Goldberg, Jerrum,McQuillan’16]

Beyond 5 pseudoknotted structures, approximating #Design becomes as hardas approximating #BIS without any constraint.

Why pseudoknotted? Because any bipartite graph of max degree ∆ can bedecomposed into ∆ matchings in polynomial time (Vizing’s theorem).

Lastly, connection between counting and sampling [Jerrum, Valiant, Vazirani’86].

Conjecture (#BIS hardness of sampling)

Generating comp. sequences (almost) uniformly for general input is #BIS-hard.

Perspectives: FPT and Boltzmann sampling algorithms

b

c

de

f gh i

j k l m

nopq

rs

a t

uv

a b c d e

f

gh

ijk l

mn o

pqr

s t u v

a

b

c

d

e f

g

h ij

k

l m n o

p

q

r st

u

v

R1

R2

R3

i) Input Structures

ab c d e f g h i j k l mn o p q r s t u v

ii) Merged Base-Pairs

aeg

p uk

csn

j

qtb

d h m

f l o

vi

r

iii) Compatibility Graph

∅

v o

o ll ff r

l i

j qq t

t bb dd hh m

k pk p g

g n g p uu g e

u e a e ss c

iv) Tree Decomposition

RNARedPrint

Partition FunctionStochastic Backtrack

GACUUUAGCU

UGUUUGAAGU

UA

GACUUGGGAC

UUUUAGGUGU

CU

UUAGAUUCCA

GGGGCUUGU

AGG

GGUUUAGGGC

CUUUAGGUAU

CU

AUUGUGACGG

UCGUGACUAG

UU

. . .

0

0

0

−5

−5

−5

kcal.mol−1

kcal.mol−1

kcal.mol−1

E1

E2

E3GC%

0 50 100

v) Weight Optimization (Adaptive Sampling)

Weightsupd

ate

GCCGCGGUAGCUACAGCCGGCUUUGGGGUUGGGUAGACUCCGGUGCUGCAGCGGCUGUGGCUGGCCGGUUCUGGUUUGCUUAGGGCUACGACGGCGGUGCCGGCAUUUGC. . .

0kcal.mol−1

E1E2E3

GC%

0 50 100

vi) Final Designs

FPT algorithm for counting based on tree decompositionMultidimensional Boltzmann sampling to control energies, GC. . .


C

U

G

A




aeg

p

u

k bd

h j q

t

f l o v

cs

i

mn

r


Answer: Bipartite→

∏cc∈CC(G)

2×#IS(cc) = 496 672


C

U

G

A




aeg

p

u

k bd

h j q

t

f l o v

cs

i

mn

r


Answer: Bipartite→∏

cc∈CC(G)

2×#IS(cc) = 496 672

Perspectives: FPT and Boltzmann sampling algorithms

b

c

de

f gh i

j k l m

nopq

rs

a t

uv

a b c d e

f

gh

ijk l

mn o

pqr

s t u v

a

b

c

d

e f

g

h ij

k

l m n o

p

q

r st

u

v

R1

R2

R3

i) Input Structures

ab c d e f g h i j k l mn o p q r s t u v

ii) Merged Base-Pairs

aeg

p uk

csn

j

qtb

d h m

f l o

vi

r

iii) Compatibility Graph

∅

v o

o ll ff r

l i

j qq t

t bb dd hh m

k pk p g

g n g p uu g e

u e a e ss c

iv) Tree Decomposition

RNARedPrint

Partition FunctionStochastic Backtrack

GACUUUAGCU

UGUUUGAAGU

UA

GACUUGGGAC

UUUUAGGUGU

CU

UUAGAUUCCA

GGGGCUUGU

AGG

GGUUUAGGGC

CUUUAGGUAU

CU

AUUGUGACGG

UCGUGACUAG

UU

. . .

0

0

0

−5

−5

−5

kcal.mol−1

kcal.mol−1

kcal.mol−1

E1

E2

E3GC%

0 50 100

v) Weight Optimization (Adaptive Sampling)

Weightsupd

ate

GCCGCGGUAGCUACAGCCGGCUUUGGGGUUGGGUAGACUCCGGUGCUGCAGCGGCUGUGGCUGGCCGGUUCUGGUUUGCUUAGGGCUACGACGGCGGUGCCGGCAUUUGC. . .

0kcal.mol−1

E1E2E3

GC%

0 50 100

vi) Final Designs

FPT algorithm for counting based on tree decompositionMultidimensional Boltzmann sampling to control energies, GC. . .

o

1

2

3

4

5

6

7

8

9

1 0

E 1 0 _ 1

1 1

1 2

1 3

1 4

1 5

1 6

1 7

1 8

1 9

2 0

2 1

2 2

2 3

E 2 3 _ 1

E 2 3 _ 2

E 2 3 _ 5

E 2 3 _ 6

E 2 3 _ 7

E 2 3 _ 8

E 2 3 _ 9

2 4

2 5

2 6

2 7

2 8

2 9

3 0

3 1

3 2

3 3

3 4

3 5

3 6

3 7

3 8

3 9

4 0

4 1 4 2 4 3

4 4

4 5

4 6

4 7

4 8

4 9

5 0

o

120

46

68

98

115

128

148

171

201

218

240

258

282

312

330

350

363

382

397

419

UUCCUAAU

UAGGAA

UA

AC

GU

GA

UA

GU

GU

UG

UA

UC

CG

U

GUGCAUACG

AU

GC

AU

AGUG

CCCU

CCACU

G

AGGG

UU

CA

C

GA

UA

UG

AG

GC

CU

CG

GU

GA

A

UA

UC

C G U

GUGUACACG

G U A C A U A G U GCCCUCCACU

G A G G G UG

UG

U

AC

GC

UG

UA

AG

GC

CU

UA

CG

AC

AC

A

UG

CG

CGU

GU

GC

AU

AC

G

AUGCAUAGUG

CCCU

CCACU

G

AGGG

UG

UG

UA

UA

UC

AU

GC

GG

GU

UA

UA

CU

AG

UA

UA

AC

CC

GU

UA

UA

CA

CA

AU

GA

GG

GA

UC

CC

CCU

C C U C G G GGAGG

UC

AUAU

AA

U

AAC

G

AUA

GUUAU

UAA

AU

C

A

UUU

UG

CG

U

UUU

UAUAAAACUU

A

CU

AU

AU U U A U

UG

A A UC

A U U U

CA

G G

AGUG

GUCAA

UU U A A A U A A

UC

G

A UA

UUUGU

U AAA

UC

A

U

UG

GA

AU

A A A

G G UG

C

UA

AA

AU

U

UUC

GAAU

UA

UACCCA

U:A (405)-413->U:A (405)-413->

G

C

G

G

A

U

UUAGCUC

AGU

U

G

GG A

G A G C

G

C

C

A

G

AC

U

GA A

G

A

P

C

U

G

G

AG G

U

C

C U G U G

U PC

G

AUC

CACAG

A

A

U

U

C

G

C

A

CC

A

1

10

20

30

40

50

60

70

76

Conclusions

RNA is cool!

RNA design is one of the current challenge of RNA bioinformatics withfar-reaching consequences for drug design, synthetic biology. . .

Practical use-cases require expressive and modular constraints

Future methods: kinetics, interactions,multiple structures,pseudoknots. . .

RNA inverse folding is the combinatorial core of design.It remains largely unsolved, and opens new lines of research in Comp.Sci.

o

1

2

3

4

5

6

7

8

9

1 0

E 1 0 _ 1

1 1

1 2

1 3

1 4

1 5

1 6

1 7

1 8

1 9

2 0

2 1

2 2

2 3

E 2 3 _ 1

E 2 3 _ 2

E 2 3 _ 5

E 2 3 _ 6

E 2 3 _ 7

E 2 3 _ 8

E 2 3 _ 9

2 4

2 5

2 6

2 7

2 8

2 9

3 0

3 1

3 2

3 3

3 4

3 5

3 6

3 7

3 8

3 9

4 0

4 1 4 2 4 3

4 4

4 5

4 6

4 7

4 8

4 9

5 0

o

120

46

68

98

115

128

148

171

201

218

240

258

282

312

330

350

363

382

397

419

UUCCUAAU

UAGGAA

UA

AC

GU

GA

UA

GU

GU

UG

UA

UC

CG

U

GUGCAUACG

AU

GC

AU

AGUG

CCCU

CCACU

G

AGGG

UU

CA

C

GA

UA

UG

AG

GC

CU

CG

GU

GA

A

UA

UC

C G U

GUGUACACG


G A G G G UG

UG

U

AC

GC

UG

UA

AG

GC

CU

UA

CG

AC

AC

A

UG

CG

CGU

GU

GC

AU

AC

G

AUGCAUAGUG

CCCU

CCACU

G

AGGG

UG

UG

UA

UA

UC

AU

GC

GG

GU

UA

UA

CU

AG

UA

UA

AC

CC

GU

UA

UA

CA

CA

AU

GA

GG

GA

UC

CC

CCU

C C U C G G GGAGG

UC

AUAU

AA

U

AAC

G

AUA

GUUAU

UAA

AU

C

A

UUU

UG

CG

U

UUU

UAUAAAACUU

A

CU

AU

AU U U A U

UG

A A UC

A U U U

CA

G G

AGUG

GUCAA

UU U A A A U A A

UC

G

A UA

UUUGU

U AAA

UC

A

U

UG

GA

AU

A A A

G G UG

C

UA

AA

AU

U

UUC

GAAU

UA

UACCCA

U:A (405)-413->U:A (405)-413->

G

C

G

G

A

U

UUAGCUC

AGU

U

G

GG A

G A G C

G

C

C

A

G

AC

U

GA A

G

A

P

C

U

G

G

AG G

U

C

C U G U G

U PC

G

AUC

CACAG

A

A

U

U

C

G

C

A

CC

A

1

10

20

30

40

50

60

70

76

Collaborators

University McGillVladimir ReinharzJérôme Waldispühl

MITBonnie BergerSrinivas DevadasAlex LevinMieszko LisCharles O’Donnell

LRI – Univ. Paris SudAlain DeniseVincent Le Gallic

Wuhan UniversityYi ZhangYu Zhou

LIGM – Marne la ValléeStéphane Vialette

LIX – Ecole PolytechniqueAlice HéliouMireille Regnier

Simon Fraser UniversityJozef HalesJan Manuch (UBC)Ladislav Stacho

Cédric ChauveJulien Courtiel

TBI ViennaRonnie LorenzAndrea Tanzer

o

1

2

3

4

5

6

7

8

9

1 0

E 1 0 _ 1

1 1

1 2

1 3

1 4

1 5

1 6

1 7

1 8

1 9

2 0

2 1

2 2

2 3

E 2 3 _ 1

E 2 3 _ 2

E 2 3 _ 5

E 2 3 _ 6

E 2 3 _ 7

E 2 3 _ 8

E 2 3 _ 9

2 4

2 5

2 6

2 7

2 8

2 9

3 0

3 1

3 2

3 3

3 4

3 5

3 6

3 7

3 8

3 9

4 0

4 1 4 2 4 3

4 4

4 5

4 6

4 7

4 8

4 9

5 0

o

120

46

68

98

115

128

148

171

201

218

240

258

282

312

330

350

363

382

397

419

UUCCUAAU

UAGGAA

UA

AC

GU

GA

UA

GU

GU

UG

UA

UC

CG

U

GUGCAUACG

AU

GC

AU

AGUG

CCCU

CCACU

G

AGGG

UU

CA

C

GA

UA

UG

AG

GC

CU

CG

GU

GA

A

UA

UC

C G U

GUGUACACG


G A G G G UG

UG

U

AC

GC

UG

UA

AG

GC

CU

UA

CG

AC

AC

A

UG

CG

CGU

GU

GC

AU

AC

G

AUGCAUAGUG

CCCU

CCACU

G

AGGG

UG

UG

UA

UA

UC

AU

GC

GG

GU

UA

UA

CU

AG

UA

UA

AC

CC

GU

UA

UA

CA

CA

AU

GA

GG

GA

UC

CC

CCU

C C U C G G GGAGG

UC

AUAU

AA

U

AAC

G

AUA

GUUAU

UAA

AU

C

A

UUU

UG

CG

U

UUU

UAUAAAACUU

A

CU

AU

AU U U A U

UG

A A UC

A U U U

CA

G G

AGUG

GUCAA

UU U A A A U A A

UC

G

A UA

UUUGU

U AAA

UC

A

U

UG

GA

AU

A A A

G G UG

C

UA

AA

AU

U

UUC

GAAU

UA

UACCCA

U:A (405)-413->U:A (405)-413->

G

C

G

G

A

U

UUAGCUC

AGU

U

G

GG A

G A G C

G

C

C

A

G

AC

U

GA A

G

A

P

C

U

G

G

AG G

U

C

C U G U G

U PC

G

AUC

CACAG

A

A

U

U

C

G

C

A

CC

A

1

10

20

30

40

50

60

70

76

Thanks!

Poster submission & Registration open soon. . .(+ ISCB travel fellowships for students)

References I

R. Nussinov and A.B. Jacobson.

Fast algorithm for predicting the secondary structure of single-stranded RNA.Proc Natl Acad Sci U S A, 77:6903–13, 1980.

(positive) design of ribonucleic...

Documents