(Positive) Design of RiboNucleic Acids
Yann Ponty*,•,†
* Centre National de la Recherche Scientifique (CNRS)
• LIX, Ecole Polytechnique
† AMIBio team, Inria Saclay
http://goo.gl/mejsFh
PhD in Computer Science, Université Paris-Sud (France)
CNRS research scientist
Faculty at LIX, Computer Science department of Ecole Polytechnique
Group leader – AMIBio team (Ecole Polytechnique and Inria Saclay)
Postdoc experience in RNA Computational Biology (Boston, Paris) and Discrete Mathematics (Paris)
Extended sabbatical at Simon Fraser University (Vancouver, Canada)
Fundamental dogma of molecular biology
DNA: {A,C,G,T}* = THE CODE (genes)
RNAs: {A,C,G,U}* = MEH...
Proteins: {Ala, Arg, ..., Val}* (20+ amino acids) = THE MACHINE (enzymes)
DNA (two strands):
A T G G T T A C C C A T
T A C C A A T G G G T A
RNA (transcribed by Pol):
A U G G U U A C C C A U
Protein (translated by the Ribosome):
Met Val Thr His Ile Leu His Asn
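The transcription and translation steps of this slide can be replayed in a few lines. This is only a minimal sketch: the function names `transcribe` and `translate` are illustrative choices, the codon table is the standard genetic code, and the template strand is read position by position as it is written on the slide.

```python
# Standard genetic code, codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Complement used when copying a DNA template strand into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template):
    """Return the RNA complementary to a DNA template strand, position by position."""
    return "".join(DNA_TO_RNA[base] for base in template)


def translate(mrna):
    """Translate an mRNA codon by codon (one-letter codes), stopping at a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("TACCAATGGGTA")  # second DNA strand of the slide
print(mrna)                        # AUGGUUACCCAU
print(translate(mrna))             # MVTH, i.e. Met Val Thr His
```

The twelve nucleotides shown only account for the first four residues (Met Val Thr His); the remaining residues listed on the slide would come from sequence that is not displayed.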
Fundamental dogma of molecular biology (v2.0)
DNA → RNA: Transcription
RNA → Proteins: Translation
Other labeled interactions: Regulation, Transfer, Participates, Maturation, Carrier, Synthesis
RNA functions: Messenger, Translation, Regulation, Enzyme, Catalytic, ...
[Plot: #RNA Functional Families (RFAM DB), axis from 0 to 3000]
RNA world: Resolving the chicken vs egg paradox at the origin of life...
[Diagram: Gene encodes Enzyme, Enzyme replicates Gene; RNA both encodes and replicates]
A gene big enough to specify an enzyme would be too big to replicate accurately without the aid of an enzyme of the very kind that it is trying to specify. So the system apparently cannot get started.
[...] This is the RNA World. To see how plausible it is, we need to look at why proteins are good at being enzymes but bad at being replicators; at why DNA is good at replicating but bad at being an enzyme; and finally why RNA might just be good enough at both roles to break out of the Catch-22.
R. Dawkins. The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution
RNA folding
Canonical base-pairs:
  Watson/Crick base-pairs: G/C, U/A
  Wobble base-pair: U/G
RNAs are single-stranded and fold on themselves, establishing complex 3D structures that are essential to their function(s).
RNA structures are stabilized by base-pairs, each mediated by hydrogen bonds.
RNA Design
RNA = Linear Polymer = Sequence in {A,C,G,U}*
UUAGGCGGCCACAGC
GGUGGGGUUGCCUCC
CGUACCCAUCCCGAA
CACGGAAGAUAAGCC
CACCAGCGUUCCGGG
GAGUACUGGAGUGCG
CGAGCCUCUGGGAAA
CCCGGUUCGCCGCCA
CC
[Figure: Primary Structure, Secondary Structure and Tertiary Structure of the 5S rRNA (PDB ID: 1K73:B), positions 1 to 122]
Evolution of RNAs
Homologous genes = Functionally equivalent, within or across organisms
Usually well-captured by sequence similarity in proteins, binding sites...
Problem: Many classes of non-(protein) coding RNAs (ncRNAs) are poorly conserved at the sequence level but adopt a conserved structure!
[Figure: RFAM Bacterial RNAse-P class B Alignment]
RNA Design
[Figure: 5S rRNA (PDB ID: 1K73:B): Primary Structure, Secondary Structure, Tertiary Structure]
Structure Prediction: from sequence to structure
RNA Design: from structure back to sequence
Why we design RNAs
To create building blocks for synthetic systems
  Rationally-designed RNAs increase orthogonality
To assess the significance of observed phenomena
  Random models should include every established characteristic... including adoption of a single structure
To test/push our understanding of how RNA folds
  Misfolding RNAs reveal gaps in our energy models and descriptors for the conformational spaces
To help search for homologous sequences
  Incomplete covariance models hindered by limited training sets
  Design can be used to generalize existing alignments
To fuel RNA-based therapeutics
  Sequence-based (siRNA, synthetic genes), but structure matters
To perform controlled experiments
Controlled experiments through RNA design
Motivation: Quantifying the impact of structure S on the efficacy of a single Exon Splicing Enhancer (ESE):
  Presence of ESE motif E;
  Different structures S1, S2...;
  Avoid library of (∼1500!) documented ESE motifs.
Objectives. Design RNA which:
  1. Folds into a prescribed structure;
  2. Features/avoids motifs;
  3. Controls GC%, Boltz. prob....
Structural context of ESE motif in transcript was shown to affect its functionality. [Liu et al, FEBS Lett. 2010]
Design objectives
Positive structural design
  Optimize affinity of designs towards target structure(s)
  Examples: Most stable sequence for given fold...
Negative structural design
  Limit affinity of designs towards alternative structures
  Examples: Lowest free-energy, High Boltzmann probability/Low entropy...
Additional constraints:
  Forbid every motif of a list from appearing anywhere in design
  Force every motif of a list to appear at least once
  Limit available alternatives at certain positions
  Control overall composition (GC-content)
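The four additional constraints can be checked on a candidate sequence with a short filter. The sketch below is only illustrative: the function names, the 0-based `allowed` dictionary and the default GC range are assumptions made here, not part of any tool named in the talk.

```python
def gc_content(seq):
    """Fraction of G and C nucleotides in seq (0.0 for an empty sequence)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def satisfies_constraints(seq, forbidden=(), required=(), allowed=None,
                          gc_range=(0.0, 1.0)):
    """Check a candidate design against the four kinds of additional constraints.

    forbidden: motifs that must not occur anywhere in seq
    required:  motifs that must each occur at least once
    allowed:   optional dict {0-based position: set of permitted nucleotides}
    gc_range:  (min, max) bounds on the GC-content
    """
    if any(motif in seq for motif in forbidden):
        return False
    if not all(motif in seq for motif in required):
        return False
    if allowed and any(seq[pos] not in choices for pos, choices in allowed.items()):
        return False
    low, high = gc_range
    return low <= gc_content(seq) <= high


candidate = "GGAUCCAAGG"
print(satisfies_constraints(candidate, forbidden=["UUU"], required=["GAUC"],
                            allowed={0: {"G"}}, gc_range=(0.4, 0.7)))  # True
```

A filter of this kind only rejects bad candidates after the fact; the outline's second part (constrained design using formal languages) is about building such constraints into the generation itself.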
Outline
I. Single Structure Design (IncaRNAtion)
II. Constrained Design using Formal Languages
III. Multiple Structures
I. Inverse Folding
Designing a given structure
RNA sequence and structure(s)
RNA = Linear Polymer = Sequence in {A,C,G,U}*
[Figure: Primary Structure, Secondary Structure and Tertiary Structure of the 5S rRNA (PDB ID: 1K73:B), positions 1 to 122]
MFE folding prediction: O(n^3)
Crossing interactions
Excluded from the secondary structure:
Non-canonical base-pairs: Any base-pair other than {(A-U), (C-G), (G-U)} OR interacting in a non-standard way (i.e. other than WC/WC-Cis) [Leontis Westhof, RNA 2001].
[Figure: Canonical CG base-pair (WC/WC-Cis); Non-canonical base-pair (Sugar/WC-Trans)]
(Pseudo?)knots: Crossing sets of nested stable base-pairs
G A G C C U U U A U A C A G U A A U G U A U A U C G A A A A A U C C U C U A A U U C A G G G A A C A C C U A A G G C A A U C C U G A G C U A A G C U C U U A G U A A U A A G A G A A A G U G C A A C G A C U A U U C C G A U A G G A A G U A G G G U C A A G U G A C U C G A A A U G G G G A U U A C C C U U C U A G G G U A G U G A U A U A G U C U G A A C A U A U A U G G A A A C A U A U A G A A G G A U A G G A G U A A C G A A C C U A U C C G U A A C A U A A U U G
[Figure: Group I Ribozyme (PDB ID: 1Y0Q:A), positions 1 to 229]
Crossing interactions do exist!
Example: Group II Intron (PDB ID: 3IGI)
But are hard to predict [Lyngsoe-ICALP’04]
[Sheikh Backofen Ponty, CPM’12]
Thermodynamics vs Kinetics
Paradigms for RNA structure prediction
1978–1990s: Most probable structure = Minimal Free-Energy (MFE)
1990s–2010s: Functional structure(s) = Boltzmann ensemble (partition function)
2010s–????: Embracing the kinetics of RNA folding
[Animation frames labeled T = 0h, 1h, 2h, 5h, 10h and T → ∞]
mRNA half-life: ∼7h (Mouse [Sharova2009])
Problem statement
w = C G G A U G U U A C A G C G A G C A U G U G C C C G (positions 1 to 26)
[Figure: secondary structure drawing of w]
RNA structure S: Non-crossing base-pairs for positions in sequence w
Motifs: Sequence/structure features (e.g. Base-pairs, Stacks, Loops...)
Energy model: Motif → Free-energy contribution Δ(·) ∈ R⁻ ∪ {+∞}
Free-Energy E_w(S): Sum over (independently contributing) motifs in S
Example with base-pairs as motifs:
\[
E_S = 2 \cdot \Delta(\text{U-G}) + 4 \cdot \Delta(\text{G-C}) + 2 \cdot \Delta(\text{C-G})
\]
Example with stacks and loops as motifs: E_S is the sum of five stacking terms, each over two adjacent base-pairs (e.g. C-G stacked on G-C), plus three loop terms (loops spelled U U ACA G, G A UG A G CC and C A U GUG in the figure).
Definition (MFE-PREDICT(E) problem)
Input: RNA sequence w ∈ {A,C,G,U}*
Output: Secondary struct. S* with Minimal Free-Energy (MFE) E_w(S*)
Problem solved exactly in O(n^3) time. [Nussinov Jacobson, PNAS 1980] [Zuker Stiegler, NAR 1981]...
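To make the additive energy model concrete, the sketch below evaluates E_w(S) with base-pairs as the only motifs. Two things are assumptions made here: the numeric Δ values are invented for illustration, and the dot-bracket string is a reading of the figure, chosen because it reproduces the 2 U-G, 4 G-C and 2 C-G pairs counted on the slide.

```python
# Hypothetical base-pair contributions (kcal/mol); real models use measured parameters.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def parse_dot_bracket(structure):
    """Return the sorted list of 0-based base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def free_energy(w, structure):
    """E_w(S) in the base-pair model: sum of Delta over pairs, +inf for a forbidden pair."""
    return sum(PAIR_ENERGY.get((w[i], w[j]), float("inf"))
               for i, j in parse_dot_bracket(structure))


w = "CGGAUGUUACAGCGAGCAUGUGCCCG"
s = "(((.(((....))).((....)))))"   # assumed reading of the figure
print(free_energy(w, s))           # 2*(-1) + 4*(-3) + 2*(-3) = -20.0
```

Stacking and loop motifs follow the same additive principle, with Δ looked up per stack or per loop instead of per pair.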
Dynamic programming (DP) for RNA folding
Theorem ([Nussinov and Jacobson(1980)])
Max #base-pairs/min energy structure computed in O(n^3)/O(n^2) time/memory
[Diagram: interval [i, j] = (i unpaired, followed by [i+1, j]) + (i paired to k, with at least θ positions between i and k, enclosing [i+1, k−1] and followed by [k+1, j])]
E_{i,k}: Free-energy contribution of base-pair (i, k) (−1/+∞, or ΔG(s_i ≡ s_k))
N_{i,j}: Min energy over interval [i, j] (minus the max #base-pairs when E_{i,k} = −1)
\[
N_{i,t} = 0, \quad \forall t \in [i, i+\theta]
\]
\[
N_{i,j} = \min
\begin{cases}
N_{i+1,j} & \{i \text{ unpaired}\} \\
\min_{k=i+\theta+1}^{j} \; E_{i,k} + N_{i+1,k-1} + N_{k+1,j} & \{i \text{ paired to } k\}
\end{cases}
\]
Many extensions:
  Nearest-neighbor/Turner energy model [Zuker1981]
  Comparative folding [Sankoff1985]
  Equilibrium base-pairing probabilities [McCaskill1990]
  Moments of additive features [Miklos2005,Ponty2011]
  Suboptimal structures within Δ kcal.mol^-1 of the MFE [Wuchty1999]
  Basic crossing structures [Rivas1999]...
  Exact sampling in Boltzmann distr. [Ding2003,Ponty2008]
  Maximum expected accuracy structure [Do2006]
  Distance-classified partitioning of Boltzmann ens. [E.Freyhult2007a]
Made possible by:
  Completeness/Unambiguity of decomposition: ∃ energy-preserving bijection between derivations of DP scheme and search space
  Objective function additive with respect to DP scheme
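The minimization recurrence translates almost line by line into code. The following is a minimal sketch under the unit energy model (−1 for an allowed pair, +∞ otherwise); the names `mfe_fold` and `pair_energy` and the default θ = 3 are choices made for this example.

```python
import math

CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_energy(a, b):
    """Unit model (assumption): -1 for a canonical pair, +inf otherwise."""
    return -1.0 if (a, b) in CANONICAL else math.inf


def mfe_fold(w, theta=3):
    """Minimal energy over all secondary structures of w, in O(n^3) time and O(n^2) memory."""
    n = len(w)
    N = [[0.0] * n for _ in range(n)]

    def get(i, j):
        # Intervals too short to host a base pair (including empty ones) cost 0.
        return N[i][j] if j - i > theta else 0.0

    for span in range(theta + 1, n):           # increasing interval length
        for i in range(n - span):
            j = i + span
            best = get(i + 1, j)               # case: i unpaired
            for k in range(i + theta + 1, j + 1):
                e = pair_energy(w[i], w[k])    # case: i paired to k
                if e < math.inf:
                    best = min(best, e + get(i + 1, k - 1) + get(k + 1, j))
            N[i][j] = best
    return N[0][n - 1] if n else 0.0


print(mfe_fold("GGGAAAUCC"))  # -3.0: three nested pairs around the AAA hairpin
```

Under the unit model the result is minus the maximum number of base pairs; swapping `pair_energy` for another additive contribution leaves the loop structure unchanged.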
Counting variant of the same DP scheme:
C_{i,j}: Number of secondary structures compatible with interval [i, j]
\[
C_{i,t} = 1, \quad \forall t \in [i, i+\theta]
\]
\[
C_{i,j} = \sum
\begin{cases}
C_{i+1,j} & \{i \text{ unpaired}\} \\
\sum_{k=i+\theta+1}^{j} \mathbb{1}_{\text{comp.}(i,k)} \times C_{i+1,k-1} \times C_{k+1,j} & \{i \text{ paired to } k\}
\end{cases}
\]
Partition function variant of the same DP scheme:
\[
Z_{i,j} = \sum_{S \text{ comp. with } w[i,j]} e^{-E_w(S)/RT}
\]
= Partition function for compatible structs within [i, j]
\[
Z_{i,t} = 1, \quad \forall t \in [i, i+\theta]
\]
\[
Z_{i,j} = \sum
\begin{cases}
Z_{i+1,j} & \{i \text{ unpaired}\} \\
\sum_{k=i+\theta+1}^{j} e^{-E_{i,k}/RT} \times Z_{i+1,k-1} \times Z_{k+1,j} & \{i \text{ paired to } k\}
\end{cases}
\]
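Because the decomposition is unambiguous, replacing (min, +) by (+, ×) turns the same loops into a partition function. The sketch below assumes a base-pair energy model and RT = 0.6 kcal/mol (roughly body temperature); both are illustrative choices, as are the function names.

```python
import math

CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def unit_energy(a, b):
    """Unit model (assumption): -1 for a canonical pair, +inf otherwise."""
    return -1.0 if (a, b) in CANONICAL else math.inf


def partition_function(w, energy=unit_energy, RT=0.6, theta=3):
    """Sum of Boltzmann factors exp(-E_w(S)/RT) over all secondary structures S of w."""
    n = len(w)
    Z = [[1.0] * n for _ in range(n)]

    def get(i, j):
        # Short or empty intervals only admit the empty structure, of weight 1.
        return Z[i][j] if j - i > theta else 1.0

    for span in range(theta + 1, n):
        for i in range(n - span):
            j = i + span
            total = get(i + 1, j)                      # case: i unpaired
            for k in range(i + theta + 1, j + 1):      # case: i paired to k
                e = energy(w[i], w[k])
                if e < math.inf:
                    total += math.exp(-e / RT) * get(i + 1, k - 1) * get(k + 1, j)
            Z[i][j] = total
    return Z[0][n - 1] if n else 1.0


def zero_energy(a, b):
    """With all allowed pairs at energy 0, Z simply counts compatible structures."""
    return 0.0 if (a, b) in CANONICAL else math.inf


print(partition_function("GGAAACC", energy=zero_energy))  # 6.0 compatible structures
```

Setting every allowed pair to energy 0 recovers the counting recurrence as a special case, which gives a convenient sanity check.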
RNA inverse folding
RNA = Linear Polymer = Sequence in {A,C,G,U}*
[Figure: Primary Structure, Secondary Structure and Tertiary Structure of the 5S rRNA (PDB ID: 1K73:B), positions 1 to 122]
MFE folding prediction: Θ(n^3)
Inverse folding: NP-hard?
RNA Inverse Folding
Definition (INVERSE-FOLDING(E) problem)
Input: Secondary structure S + Energy distance Δ > 0.
Output: RNA sequence w ∈ Σ* such that:
\[
\forall S' \in \mathcal{S}_{|w|} \setminus \{S\} : \; E_{w,S'} \geq E_{w,S} + \Delta
\]
or ∅ if no such sequence exists.
Difficult problem: No obvious DP decomposition
Existing algorithms: Heuristics or Exponential-time
Complexity of problem unknown (despite [Schnall Levin et al, ICML’08])
Reason: Non locality, no theoretical frameworks, too many parameters...
Example:
[Figure: 26-nt target structure, shown as two 13-nt substructures; AAGAGUCGCUCUC folds into the 13-nt substructure; folding of the 26-nt concatenation AAGAGUCGCUCUCAAGAGUCGCUCUC]
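The definition can be tested directly on tiny instances by enumerating every alternative structure. The sketch below is exponential-time and only meant to illustrate the condition; it assumes the unit energy model (each canonical pair contributes −1) and θ = 3, and all function names are invented for the example.

```python
from itertools import product

CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the 0-based base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs


def structures(w, i, j, theta=3):
    """Yield each secondary structure of w[i..j] exactly once, as a tuple of pairs."""
    if j - i <= theta:
        yield ()
        return
    yield from structures(w, i + 1, j, theta)              # i unpaired
    for k in range(i + theta + 1, j + 1):                  # i paired to k
        if (w[i], w[k]) in CANONICAL:
            for inner in structures(w, i + 1, k - 1, theta):
                for outer in structures(w, k + 1, j, theta):
                    yield ((i, k),) + inner + outer


def is_design(w, target, delta=1.0, theta=3):
    """True if every other structure S' satisfies E(S') >= E(target) + delta."""
    target_pairs = set(parse_dot_bracket(target))
    if any((w[i], w[j]) not in CANONICAL or j - i <= theta for i, j in target_pairs):
        return False
    e_target = -float(len(target_pairs))                   # unit model: -1 per pair
    for s in structures(w, 0, len(w) - 1, theta):
        if set(s) != target_pairs and -float(len(s)) < e_target + delta:
            return False
    return True


def brute_force_design(target, delta=1.0, theta=3):
    """Try all 4^n sequences; return the first valid design, or None."""
    for letters in product("ACGU", repeat=len(target)):
        w = "".join(letters)
        if is_design(w, target, delta, theta):
            return w
    return None


print(is_design("GGAAACC", "((...))"))   # True
print(is_design("GGAAACC", "(.....)"))   # False: ((...)) has lower energy
print(brute_force_design("(...)"))       # AAAAU
```

Both the enumeration of structures and the scan over sequences grow exponentially, which is exactly why the heuristics and sampling strategies of the following slides are needed.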
Existing approaches for negative design
Based on local search...
  RNAInverse - TBI Vienna
  Info-RNA - Backofen@Freiburg
  RNA-SSD - Condon@UBC
  NUPack - Pierce@Caltech
... bio-inspired algorithms...
  RNAFBinv - Barash@Ben Gurion
  FRNAKenstein - Hein@Oxford
  AntaRNA - Backofen@Freiburg
  ERD - Ganjtabesh@Tehran
... exact approaches...
  RNAIFold - Clote@Boston College
  CO4 - Will@Leipzig
Typical issues:
  Naive initialization strategies
  Poor coverage of sequence space:
    Local search remains confined near initial sequence
    GC-rich produced sequences
⇒ Global sampling [Levin et al, NAR 12]
The case for a control of GC-content
High GC-content suspected to induce kinetic traps
Global sampling [Levin et al, NAR 12]
Target structure S
Boltzmann distribution based on affinity towards S
Random generation from Boltzmann distribution
Fold sampled sequences and compare to target
Boltzmann factor: B_w(S) := e^{−E_w(S)/RT}
Pseudo-partition function: Z(S) = ∑_{w ∈ Σ*} B_w(S)
Boltzmann probability: p(w) := B_w(S) / Z(S)
[Figure: seven secondary-structure drawings, positions 1 to 30]
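The three definitions above can be tried on a toy case. This is a minimal sketch and not the method of [Levin et al, NAR 12]: it enumerates all sequences of a fixed length instead of Σ*, and the energy table (`PAIR_ENERGY`, `MISMATCH`) and the value of `RT` are illustrative assumptions.

```python
import math
import random
from itertools import product

RT = 0.6          # assumed thermal energy, same arbitrary unit as the energies
MISMATCH = 4.0    # assumed penalty for a non-canonical pair
PAIR_ENERGY = {"GC": -3.0, "CG": -3.0, "AU": -2.0,
               "UA": -2.0, "GU": -1.0, "UG": -1.0}  # toy values


def energy(w, pairs):
    """E_w(S): sum of toy pair energies over the base pairs of S."""
    return sum(PAIR_ENERGY.get(w[i] + w[j], MISMATCH) for i, j in pairs)


def boltzmann_distribution(n, pairs):
    """Enumerate all length-n sequences with their Boltzmann probabilities."""
    seqs = ["".join(t) for t in product("ACGU", repeat=n)]
    factors = [math.exp(-energy(w, pairs) / RT) for w in seqs]  # B_w(S)
    z = sum(factors)                                            # Z(S)
    return seqs, [b / z for b in factors], z


# Hairpin "((.))": pairs (0, 4) and (1, 3), 0-based positions.
seqs, probs, z = boltzmann_distribution(5, [(0, 4), (1, 3)])
sample = random.Random(0).choices(seqs, weights=probs, k=5)
```

Brute-force enumeration only works for very short sequences; it is meant to make the definitions concrete, not to be a sampling algorithm.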
IncaRNAtion [Reinharz et al, Bioinformatics 2013]
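Explore sequence space, structure fixed (only 1 case applies)
Subproblem: interval [i, j], annotated with nucleotides a and b
[Position i unpaired]: ∑_{a′} over the subproblem on [i + 1, j]
[Position i paired with k]: ∑_{a′,b′} over the subproblems on [i + 1, k − 1] and [k + 1, j]
[Paired ends + Stacking pairs]: ∑_{a′,b′} over the subproblem on [i + 1, j − 1]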
"Global" Stochastic Backtrack
Z = weight of nt1
Z = weight of nt2 ∧ nt17
Z = weight of nt3 ∧ nt16
Z = weight of nt4 ∧ nt8
Z = weight of nt9 ∧ nt15
Z = weight of nt10 ∧ nt14
Sequence (order of assignment): nt1; nt2, nt17; nt3, nt16; nt4, nt8; nt5, nt6, nt7; nt9, nt15; nt10, nt14
Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
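The backtrack above can be sketched as follows: a partition function over sequences is computed for the fixed structure, then nucleotides are drawn from the outside in, each choice weighted by the partition function of what remains. This is a simplified sketch, not IncaRNAtion itself: the energies (`PAIR_E`, `stack_e`), `RT`, and the restriction to sequences compatible with the structure are assumptions made for illustration.

```python
import math
import random
from functools import lru_cache

RT = 0.6  # assumed thermal energy, same arbitrary unit as the energies
# Toy energies: illustrative values, not Turner parameters.
PAIR_E = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
          ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}
STRONG = {("G", "C"), ("C", "G")}


def stack_e(outer, inner):
    """Toy stacking bonus: two directly nested G-C pairs stabilise each other."""
    return -1.0 if outer in STRONG and inner in STRONG else 0.0


def boltz(e):
    return math.exp(-e / RT)


def partners(db):
    """Map each paired position of a dot-bracket string to its partner."""
    stack, p = [], {}
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            p[i], p[j] = j, i
    return p


def make_sampler(db):
    """Return (Z, sample) over the sequences compatible with structure db."""
    p = partners(db)

    def stacked(i, j):  # is pair (i+1, j-1) directly nested in pair (i, j)?
        return i + 1 < j - 1 and p.get(i + 1) == j - 1

    def stacked_weights(i, j, outer):
        return {q: boltz(PAIR_E[q] + stack_e(outer, q)) * inside(i + 1, j - 1, q)
                for q in PAIR_E}

    def pair_weights(k):
        return {q: boltz(PAIR_E[q]) * inside(k, p[k], q) for q in PAIR_E}

    @lru_cache(maxsize=None)
    def inside(i, j, outer):
        """Partition function of the region enclosed by pair (i, j)."""
        if stacked(i, j):
            return sum(stacked_weights(i, j, outer).values())
        return level(i + 1, j - 1)

    @lru_cache(maxsize=None)
    def level(i, j):
        """Partition function of db[i..j]: unpaired positions and helices."""
        z, k = 1.0, i
        while k <= j:
            if db[k] == ".":
                z, k = z * 4.0, k + 1
            else:
                z, k = z * sum(pair_weights(k).values()), p[k] + 1
        return z

    def sample(rng):
        seq = [""] * len(db)

        def draw(weights):
            return rng.choices(list(weights), weights=list(weights.values()))[0]

        def fill_inside(i, j, outer):
            if stacked(i, j):
                q = draw(stacked_weights(i, j, outer))
                seq[i + 1], seq[j - 1] = q
                fill_inside(i + 1, j - 1, q)
            else:
                fill_level(i + 1, j - 1)

        def fill_level(i, j):
            k = i
            while k <= j:
                if db[k] == ".":
                    seq[k] = rng.choice("ACGU")
                    k += 1
                else:
                    q = draw(pair_weights(k))
                    seq[k], seq[p[k]] = q
                    fill_inside(k, p[k], q)
                    k = p[k] + 1

        fill_level(0, len(db) - 1)
        return "".join(seq)

    return level(0, len(db) - 1), sample
```

A call such as `z, sample = make_sampler("((.))."); sample(random.Random(1))` returns one sequence drawn with probability proportional to its Boltzmann factor under this toy model.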
GC-content bias
[Histogram: quantity of sampled sequences (0 to 1000) against GC content (0.0 to 1.0)]
Target 15% GC
Average GC content is 64%
RFAM average GC content: 48.8%
Weighted DP Recursions
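Subproblem: interval [i, j], annotated with nucleotides a and b
[Position i unpaired]: ∑_{a′} x^{#GC(a′)} × (subproblem on [i + 1, j])
[Position i paired with k]: ∑_{a′,b′} x^{#GC(a′b′)} × (subproblems on [i + 1, k − 1] and [k + 1, j])
[Paired ends + Stacking pairs]: ∑_{a′,b′} x^{#GC(a′b′)} × (subproblem on [i + 1, j − 1])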
Incarnation NT distribution: Bisection scheme
[Histogram: quantity of sampled sequences (0 to 1000) against GC content (0.0 to 1.0); target 15% GC]
x#GC = 0.2500: Tot round 1 is 0
x#GC = 0.1250: Tot round 2 is 141
x#GC = 0.0625: Tot round 3 is 547
x#GC = 0.0938: Tot round 4 is 823
x#GC = 0.0781: Tot round 5 is 1165
[Waldispühl and Ponty, RECOMB, 2011]
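The idea of tuning the weight x by bisection can be sketched on a simplified model. This is an illustration under stated assumptions, not the published scheme: positions are treated as independent unpaired nucleotides or independent base pairs, the energies in `PAIR_E` and `RT` are toy values, and the expected GC content is computed exactly instead of being estimated from sampled sequences.

```python
import math

RT = 0.6  # assumed thermal energy
PAIR_E = {"GC": -3.0, "CG": -3.0, "AU": -2.0,
          "UA": -2.0, "GU": -1.0, "UG": -1.0}  # toy pair energies


def gc_count(letters):
    return sum(c in "GC" for c in letters)


def expected_gc(x, n_unpaired, n_pairs):
    """Expected GC fraction when every G or C multiplies the weight by x."""
    unpaired = {c: x ** gc_count(c) for c in "ACGU"}
    paired = {q: math.exp(-e / RT) * x ** gc_count(q) for q, e in PAIR_E.items()}

    def mean(weights):
        return sum(w * gc_count(k) for k, w in weights.items()) / sum(weights.values())

    total_gc = n_unpaired * mean(unpaired) + n_pairs * mean(paired)
    return total_gc / (n_unpaired + 2 * n_pairs)


def bisect_weight(target, n_unpaired, n_pairs, lo=0.0, hi=1.0,
                  tol=1e-3, max_rounds=60):
    """Halve [lo, hi] until the expected GC content is within tol of target."""
    x = (lo + hi) / 2
    for _ in range(max_rounds):
        x = (lo + hi) / 2
        gc = expected_gc(x, n_unpaired, n_pairs)
        if abs(gc - target) <= tol:
            break
        if gc > target:   # too GC-rich: lower the weight
            hi = x
        else:             # too GC-poor: raise the weight
            lo = x
    return x


x = bisect_weight(0.15, n_unpaired=6, n_pairs=8)
```

Bisection applies because the expected GC content increases with x; the search interval [0, 1] assumes the unweighted distribution (x = 1) is more GC-rich than the target.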
Limits of the approach
[Figure: seven secondary-structure drawings, positions 1 to 30]
Heuristic: Strong affinity is neither sufficient, nor necessary, but . . .
Strong empirical correlation affinity/success of design [Levin et al, NAR 2012]
Linear time-complexity [Reinharz Ponty Waldispühl, ISMB/ECCB’13]
Composition control [Bodini Ponty, AofA’10] [Reinharz et al, ISMB/ECCB’13]
Complementary with local search approaches [Reinharz et al, ISMB/ECCB’13]
Local vs Global vs "Glocal"
The success of glocal strategies
Sampling + Optimize creates highly probable design sequences
II. Constrained design
Avoiding/forcing motifs
Existing approaches for negative design
Based on local search. . .
RNAInverse - TBI Vienna
Info-RNA - Backofen@Freiburg
RNA-SSD - Condon@UBC
NUPack - Pierce@Caltech
. . . bio-inspired algorithms. . .
RNAFBinv - Barash@Ben Gurion
FRNAKenstein - Hein@Oxford
AntaRNA - Backofen@Freiburg
. . . exact approaches. . .
RNAIFold - Clote@Boston College
CO4 - Will@Leipzig
Few algorithms support avoided/mandatory motifs. . .
. . . none guarantees reasonable runtime.
Typical reasons:
Deep local minima (Rugged landscape)
Mandatory motifs ⇒ Late dead ends (Branch and Bound)
Forbidden motifs ⇒ Search space disconnection (Local Search)
Problem with local approaches: An example
Simplified vocabulary {A,U} + Forbidden motifs F = {AU,UA}
[Figure: graph over 5-letter sequences, with nodes UUUUU, AAAAA, AAUAA, AUAAA, UAAAA, UUUUA, UUUAU, UUAUU, UAUUU, UAAUU, AUUAU, AAUAU, AAUUU, AAAUA, AAUUA, AAAUU, AUUUU]
⇒ F may disconnect search space (holds for any move set!)
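The disconnection can be checked by enumeration. The sketch below is only an illustration: it uses single point mutations as the move set (one possible choice), keeps the sequences over {A,U} that avoid every forbidden motif, and groups them into connected components.

```python
from itertools import product

FORBIDDEN = ("AU", "UA")


def is_valid(seq):
    return not any(motif in seq for motif in FORBIDDEN)


def neighbours(seq):
    """Sequences at Hamming distance 1 (single point mutations)."""
    for i, c in enumerate(seq):
        for d in "AU":
            if d != c:
                yield seq[:i] + d + seq[i + 1:]


def valid_components(n):
    """Connected components of the valid length-n sequences under point mutations."""
    words = ("".join(t) for t in product("AU", repeat=n))
    valid = {w for w in words if is_valid(w)}
    seen, components = set(), []
    for start in sorted(valid):
        if start in seen:
            continue
        comp, todo = set(), [start]
        while todo:
            s = todo.pop()
            if s not in comp:
                comp.add(s)
                todo.extend(t for t in neighbours(s) if t in valid)
        seen |= comp
        components.append(sorted(comp))
    return components


print(valid_components(5))
```

For length 5, only AAAAA and UUUUU avoid both motifs, and no sequence of valid point mutations links them, so a local search started at one can never reach the other.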
Idea
Use formal language constructs to constrain global sampling
Forced motifs + Avoided motifs
→ Regular language L_C ∈ Reg
Structure compatibility + Positional constraints + Energy Model
→ Weighted Context-Free Language L_S ∈ CFL
Folklore theorem (constructive): Reg ∩ (W)CFL ⊆ (W)CFL
Build weighted context-free grammar G for L_C ∩ L_S + Random generation
⇒ Global sampling under constraints
Building the Finite State Automaton
To force multiple words, keep track of generated words:
Create disjunctive automata for each M′ ⊆ M
Reroute accepting states
Accepting state = no forced word remaining (ε in A_∅)
Forbidden words can be added to sub-automata
#States: O(2^{|M|} · (∑_i |f_i| + ∑_j |m_j|))
Example: M = {AGC, GG}
[Figure: automaton A_M with sub-automata A_{GG}, A_{AGC} and A_∅; states labelled ε, A, AG, AGC, G, GG; transitions over A, C, G, U]
Building the Finite State Automaton
Example: M = {AGC, GG}; F = {AA}
[Figure: the automaton for M = {AGC, GG} with the forbidden word AA added; an extra state ⊥ is entered on A]
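An automaton of this kind can be sketched as follows. It is written in the same spirit as the slide but is not necessarily the same construction: each state pairs the set of forced words still missing with the longest suffix of the text that is a prefix of some motif, and `None` plays the role of the rejecting state. The function names are illustrative.

```python
from itertools import product

ALPHABET = "ACGU"


def build_automaton(forced, forbidden):
    """Return (start, delta) for a DFA over ALPHABET; None is the rejecting state."""
    motifs = list(forced) + list(forbidden)
    prefixes = {m[:k] for m in motifs for k in range(len(m) + 1)} | {""}

    def step(state, c):
        missing, suffix = state
        text = suffix + c
        if any(text.endswith(f) for f in forbidden):
            return None                      # a forbidden word was just completed
        missing = frozenset(m for m in missing if not text.endswith(m))
        while text not in prefixes:          # keep the longest useful suffix
            text = text[1:]
        return missing, text

    start = (frozenset(forced), "")
    delta, todo = {}, [start]
    while todo:                              # explore reachable states only
        s = todo.pop()
        if s not in delta:
            delta[s] = {c: step(s, c) for c in ALPHABET}
            todo.extend(t for t in delta[s].values() if t is not None)
    return start, delta


def count_accepted(n, forced, forbidden):
    """Number of length-n words with every forced word and no forbidden word."""
    start, delta = build_automaton(forced, forbidden)
    counts = {start: 1}
    for _ in range(n):
        nxt = {}
        for s, k in counts.items():
            for t in delta[s].values():
                if t is not None:
                    nxt[t] = nxt.get(t, 0) + k
        counts = nxt
    # Accepting states: no forced word remaining.
    return sum(k for (missing, _), k in counts.items() if not missing)
```

The number of reachable states is at most 2^{|M|} times the number of motif prefixes, in line with the bound on #States given above.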
Building the grammar
Input: Secondary Structure S + Positional constraints
A Create Parse Tree for secondary structure
B Translate Parse Tree into single-word grammar
C Expand grammar to instantiate compatible base/base-pairs
D Restrict to bases/base-pairs allowed at each position
Structure: . ( ( ( . ) ) ( . . ) ) (positions 1 to 12)
A. Parse tree nodes: S1, S2, S3, S4, S5, S8, S9, S10
B. Single-word grammar:
S1 → . S2
S2 → ( S3 )
S3 → ( S4 ) S8
S4 → ( S5 )
S5 → .
S8 → ( S9 )
S9 → . S10
S10 → .
C. Expanded grammar:
V1 → A V2 | C V2 | G V2 | U V2
V2 → A V3 U | C V3 G | G V3 C | G V3 U | U V3 A | U V3 G
V3 → A V4 U V8 | C V4 G V8 | G V4 C V8 | G V4 U V8 | U V4 A V8 | U V4 G V8
V4 → A V5 U | C V5 G | G V5 C | G V5 U | U V5 A | U V5 G
V5 → A | C | G | U
V8 → A V9 U | C V9 G | G V9 C | G V9 U | U V9 A | U V9 G
V9 → A V10 | C V10 | G V10 | U V10
V10 → A | C | G | U
D. Restricted grammar (productions struck out on the slide are removed):
V1 → A V2 | C V2 | G V2 | U V2
V2 → A V3 U | U V3 A
V3 → A V4 U V8 | U V4 A V8
V4 → A V5 U | U V5 A
V5 → A | C | G | U
V8 → A V9 U | U V9 A
V9 → A V10 | C V10 | G V10 | U V10
V10 → A | C | G | U
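Steps A and B can be sketched in a few lines. The function below is an illustrative helper (its name and output format are assumptions): it reads a dot-bracket string and emits one rule per unpaired position or opening bracket, numbering non-terminals by position as in the example above.

```python
def structure_grammar(db):
    """Single-word grammar of a dot-bracket structure, as {non-terminal: body}."""
    stack, partner = [], {}
    for i, c in enumerate(db, start=1):      # 1-based positions
        if c == "(":
            stack.append(i)
        elif c == ")":
            partner[stack.pop()] = i

    rules = {}

    def build(i, j):
        """Create the rules for positions i..j and return the head non-terminal."""
        if i > j:
            return ""                        # empty region: no non-terminal
        name = f"S{i}"
        if db[i - 1] == ".":
            rules[name] = "." + build(i + 1, j)
        else:
            k = partner[i]
            rules[name] = "(" + build(i + 1, k - 1) + ")" + build(k + 1, j)
        return name

    build(1, len(db))
    return rules


for head, body in sorted(structure_grammar(".(((.))(..))").items(),
                         key=lambda item: int(item[0][1:])):
    print(head, "→", body)
```

Steps C and D then replace each "." by the allowed nucleotides and each "( )" by the allowed base pairs at those positions.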
Random generation
Combine CFG and automaton → CFG (multiplying #Rules by |Q|^3)
GenRGenS [Ponty Termier Denise, Bioinformatics 2006]:
Precomputes #words for each non-terminal
Random generation w.r.t. weighted distribution
Energy models:
Uniform distribution
Nussinov energy model
Stacking-pairs model (Turner 2004)
Based on refined, yet similar, grammar
Overall complexity: |S| · 2^{3|M|} · (∑_i |f_i| + ∑_j |m_j|)^3
Linear in |S|
Exponential in |M|, but NP-hard problem
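The counting step behind such a generator can be sketched by threading automaton states through the structure grammar. This is a simplified illustration, not GenRGenS: it handles forbidden motifs only, uses the uniform model, and `count_designs` and `make_step` are hypothetical names.

```python
from functools import lru_cache

ALPHABET = "ACGU"
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]


def make_step(forbidden):
    """Transition function of a DFA rejecting (None) any forbidden motif."""
    prefixes = {f[:k] for f in forbidden for k in range(len(f) + 1)} | {""}

    def step(state, c):
        text = state + c
        if any(text.endswith(f) for f in forbidden):
            return None
        while text not in prefixes:
            text = text[1:]
        return text

    return step


def count_designs(db, forbidden):
    """Count sequences compatible with db that contain no forbidden motif."""
    step = make_step(forbidden)
    partner, stack = {}, []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            partner[stack.pop()] = i

    @lru_cache(maxsize=None)
    def run(i, j, q):
        """(end state, #words) pairs for words on db[i..j] read from state q."""
        if i > j:
            return ((q, 1),)
        out = {}
        if db[i] == ".":
            for a in ALPHABET:
                q1 = step(q, a)
                if q1 is not None:
                    for q2, n in run(i + 1, j, q1):
                        out[q2] = out.get(q2, 0) + n
        else:
            k = partner[i]
            for a, b in PAIRS:
                q1 = step(q, a)
                if q1 is None:
                    continue
                for q2, n in run(i + 1, k - 1, q1):   # inside the pair
                    q3 = step(q2, b)
                    if q3 is None:
                        continue
                    for q4, m in run(k + 1, j, q3):   # after the pair
                        out[q4] = out.get(q4, 0) + n * m
        return tuple(out.items())

    return sum(n for _, n in run(0, len(db) - 1, ""))
```

Each subproblem is indexed by a region of the structure and an automaton state, and the pair rule chains three state transitions, which is where a cubic factor in the number of states comes from.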
III. Positive design for multiple structures
Motivation: Kinetics and riboswitches
[Figure: energy landscape (∆G) with local minima (LM)]
Design objectives
Positive structural design
Optimize affinity of designed sequences towards target structure
Or simply ensure their compatibility with one or several structures
Examples: Most stable sequence for given fold. . .
Negative structural design
Limit affinity of designed sequences towards alternative structures
Examples: Lowest free-energy, High Boltzmann probability/Low entropy. . .
Additional constraints:
Forbid motif list to appear anywhere in design
Force motif list to appear each at least once
Limit available alternatives at certain positions
Control overall composition (GC-content)
Counting compatible RNAs: Watson-Crick + Single structure
Compatible Base Pairs = Only Watson-Crick base pairs
[Figure: compatibility graph on the nucleotides A, C, G, U; target structure over positions a b c d e f g h i j k l m n o p q r s t u v]
Compatible sequence: G A G C U C A G C G C G A A C G C U C U C A
Incompatible sequence: A A A A G A C U G G A C U U G G C C U U U C
Question: How many compatible sequences?
Answer: 4^{#BPs} × 4^{#Unpaired} → 268 435 456
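The single-structure count is a direct product, as this small sketch shows. The dot-bracket string used below is a hypothetical 22-nt structure with 8 base pairs and 6 unpaired positions, chosen only to match the counts implied by the answer; it is not the structure drawn on the slide.

```python
def count_single(db, wobble=False):
    """Number of sequences compatible with one dot-bracket structure."""
    n_pairs = db.count("(")
    n_unpaired = db.count(".")
    per_pair = 6 if wobble else 4   # AU, UA, GC, CG (+ GU, UG with wobble)
    return 4 ** n_unpaired * per_pair ** n_pairs


hypothetical = "((((....))))((((..))))"   # 8 base pairs, 6 unpaired positions
print(count_single(hypothetical))               # Watson-Crick only
print(count_single(hypothetical, wobble=True))  # Watson-Crick + wobble
```

With Watson-Crick pairs only this gives 4^8 × 4^6 = 268 435 456; allowing wobble pairs gives 4^6 × 6^8 = 6 879 707 136.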
Counting compatible RNAs: Watson-Crick + Two structures
Compatible Base Pairs = Only Watson-Crick base pairs
[Figure: compatibility graph on A, C, G, U; two structures over positions a to v, and their dependency graph]
Dependency graph: Cycles + Paths
Question: How many compatible sequences?
Answer: ≠ ∅! (both base-pairs and dependency graphs bipartite)
4^{#CCs} → 65 536
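The Watson-Crick count can be sketched for any number of structures: build the dependency graph, test that it is bipartite, and raise 4 to the number of connected components. The helper names and the small example structures are illustrative assumptions.

```python
def pairs_of(db):
    """Base pairs (0-based) of a dot-bracket string."""
    stack, pairs = [], []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return pairs


def count_wc_designs(structures):
    """#sequences compatible with all structures, Watson-Crick pairs only."""
    n = len(structures[0])
    adj = {v: set() for v in range(n)}
    for db in structures:
        for i, j in pairs_of(db):
            adj[i].add(j)
            adj[j].add(i)
    colour, components = {}, 0
    for v in range(n):
        if v in colour:
            continue
        components += 1
        colour[v] = 0
        todo = [v]
        while todo:                      # 2-colour the component
            u = todo.pop()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    todo.append(w)
                elif colour[w] == colour[u]:
                    return 0             # odd cycle: no compatible sequence
    return 4 ** components               # one free nucleotide per component


print(count_wc_designs(["((..))", "(..)()"]))
print(count_wc_designs(["().", ".()", "(.)"]))
```

With two structures every cycle alternates between the two structures and has even length, so the graph is always bipartite; with three or more, odd cycles such as the triangle in the second call can make the set of designs empty.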
Counting compatible RNAs: Watson-Crick + > 2 structs
Compatible Base Pairs = Only Watson-Crick base pairs
[Figure: compatibility graph on A, C, G, U; structures over positions a to v, and their dependency graph]
Dependency graph: Cycles, Paths, Trees. . .
Question: How many compatible sequences?
Answer: Non-bipartite → ∅; Bipartite → 4^{#CCs} = 64
Counting compatible RNAs: WC/Wobble + Single struct.
Compatible Base Pairs = Include Wobble base pairs
[Figure: compatibility graph on A, C, G, U; target structure over positions a to v]
Question: How many compatible sequences?
Answer: 4^{#Unpaired} × 6^{#BPs} → 6 879 707 136
Counting compatible RNAs: WC/Wobble + Two structures
Compatible Base Pairs = Include Wobble base pairs
[Figure: compatibility graph on A, C, G, U; two structures over positions a to v, and their dependency graph]
Dependency graph: Cycles + Paths
Question: How many compatible sequences?
Answer: ≠ ∅! (base-pairs and dependency graphs always bipartite)
#Designs(G) = ∏_{cc ∈ CC(G)} #Designs(cc)
Counting compatible designs for paths and cycles
Theorem (#Compatible designs for paths and cycles)
The numbers of compatible designs for paths and cycles of length n are:
p(n) = 2F_{n+2} and c(n) = 2F_n + 4F_{n−1}
where F_n: nth Fibonacci number, F_0 = 0, F_1 = 1 and F_n = F_{n−1} + F_{n−2}.
For paths: A simple DFA generates compatible sequences
[Figure: DFA with start state ε; by symmetry it reduces to two states, • (entered on A,C) and ◦ (entered on G,U)]
Remark: A ↔ C / G ↔ U symmetry
m_•(n) = m_◦(n − 1)
m_◦(n) = m_◦(n − 1) + m_•(n − 1) = m_◦(n − 1) + m_◦(n − 2) = F(n + 2)
(Since m_◦(0) = 1 and m_◦(1) = 2)
p(n) := m_ε(n) = 2 m_•(n − 1) + 2 m_◦(n − 1) = 2(F(n) + F(n + 1)) = 2F(n + 2)
Counting compatible designs for paths and cycles
For cycles: A barely more involved DFA generates compatible sequences
[Figure: DFA with start state ε and states •, ◦1, •1, ◦2, •2; transitions on A,C and G,U]
Remark: A ↔ C / G ↔ U symmetry
m_{◦2}(n) = F(n + 2)
m_{◦1}(n) = F(n + 1)
(Since m_{◦1}(0) = 1 and m_{◦1}(1) = 1)
c(n) := m_ε(n) = 2 m_{◦1}(n − 2) + 2 m_{◦2}(n − 1) = 2(F(n − 1) + F(n + 1)) = 2F(n) + 4F(n − 1)
Counting compatible designs for paths and cycles
Theorem (#Compatible designs for general 2-structures graphs)
G: dependency graph associated with 2 RNA structures (max deg = 2).
The number #Designs(G) of compatible designs for G is given by
#Designs(G) = ∏_{p ∈ P(G)} 2F_{|p|+2} × ∏_{c ∈ C(G)} (2F_{|c|} + 4F_{|c|−1})
where G decomposes into paths P(G) and cycles C(G).
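The closed forms can be checked against exhaustive enumeration on small cases. This is a verification sketch with illustrative function names; cycles are tested for even lengths only, since cycles in a two-structure dependency graph alternate between the two structures and are therefore even.

```python
from itertools import product

COMPATIBLE = {"AU", "UA", "GC", "CG", "GU", "UG"}   # Watson-Crick + wobble


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def designs_path(n):
    return 2 * fib(n + 2)                 # p(n)


def designs_cycle(n):
    return 2 * fib(n) + 4 * fib(n - 1)    # c(n)


def brute_force(n, cyclic):
    """Count labellings of a path or cycle on n vertices with compatible edges."""
    edges = [(i, i + 1) for i in range(n - 1)]
    if cyclic:
        edges.append((n - 1, 0))
    return sum(all(s[i] + s[j] in COMPATIBLE for i, j in edges)
               for s in product("ACGU", repeat=n))


def designs_two_structures(path_sizes, cycle_sizes):
    """#Designs(G) for a graph decomposed into paths and cycles."""
    total = 1
    for n in path_sizes:
        total *= designs_path(n)
    for n in cycle_sizes:
        total *= designs_cycle(n)
    return total
```

For instance, a graph made of an isolated vertex, a 2-vertex path and a 4-cycle has `designs_two_structures([1, 2], [4])` = 4 × 6 × 14 designs.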
Counting compatible sequences: WC/Wobble + Two structures
Compatible Base Pairs = Include Wobble base pairs
[Figure: compatibility graph on A, C, G, U; two structures over positions a to v, and their dependency graph]
Dependency graph: Cycles + Paths
Question: How many compatible sequences?
Answer: ≠ ∅! (base-pairs and dependency graphs always bipartite)
#Designs(G) = ∏_{cc ∈ CC(G)} #Designs(cc) = 2 322 432
Counting compatible sequences: WC/Wobble + > 2 structures
Compatible Base Pairs = Include Wobble base pairs
[Figure: compatibility graph on A, C, G, U; structures over positions a to v, and their dependency graph]
Dependency graph: Cycles, Paths, Trees. . .
Question: How many compatible sequences?
Answer: Non-bipartite → ∅; Bipartite → ∏_{cc ∈ CC(G)} 2 × #IS(cc)
Bijection between Independent Sets and Valid Designs
[Figure: compatibility graph on A, C, G, U; dependency graph on positions a to i, filled in step by step with a compatible design]
Remark: No adjacent black letters in compatible designs
Up to trivial symmetry⋆ (e.g. top-left position ∈ {G,A}): Designs⋆(cc) ⊆ IndependentSets(cc)
Also, IS (black) + top-left vertex ∈ {G,A} ⇒ Unique compatible design
⇒ Bijection between Designs⋆(cc) and IndependentSets(cc).
Valid designs and independent sets
Theorem (#Valid designs for bipartite connected dependency graphs)
Let G be a bipartite connected dependency graph. Then:
#Designs(G) = 2 × #Designs⋆(G) = 2 × #IS(G)
For a bipartite dependency graph G we get:
#Designs(G) = ∏_{cc ∈ CC(G)} 2 × #IS(cc) = 2^{|CC(G)|} × #IS(G)
But #IS(G) is #P-hard on bipartite graphs [Bubley & Dyer '01]
(+ Any G is a dependency graph)
Algorithm A ∈ P for #Designs(G) → Algorithm A′ ∈ P for #BIS...
Theorem
#Designs is #P-hard.
No polynomial algorithm for #Designs(G) unless #P = FP (⇒ P = NP)
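The counting identity can be checked by brute force on small inputs. The following is a minimal sketch with hypothetical function names, assuming positions are numbered 0..n-1 and that compatible pairs are G-C, A-U and G-U. It is exponential in n and only meant as a sanity check, consistent with the hardness result above.

```python
from itertools import product

PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}

def count_designs(n, edges):
    """Number of sequences over {A,C,G,U} compatible with every edge (base pair)."""
    return sum(all(s[u] + s[v] in PAIRS for u, v in edges)
               for s in product("ACGU", repeat=n))

def count_independent_sets(n, edges):
    """Number of vertex subsets (encoded as bitmasks) containing no edge."""
    return sum(all(not ((m >> u) & 1 and (m >> v) & 1) for u, v in edges)
               for m in range(1 << n))

def count_components(n, edges):
    """Number of connected components, via a small union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

# A path on 3 positions, a single base pair, and one unpaired position.
n, edges = 6, [(0, 1), (1, 2), (3, 4)]
lhs = count_designs(n, edges)
rhs = 2 ** count_components(n, edges) * count_independent_sets(n, edges)
```

Here the three components contribute 2 × 5, 2 × 3 and 2 × 2 designs, so both sides equal 240.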
Consequences
Corollary (#Approximability for ≤ 5 structures) [Weitz '06]
For any G built from ≤ 5 pseudoknotted structures, #Designs(G) can be approximated within any ratio in polynomial time (PTAS).
Corollary (#BIS-hardness for > 5 structures) [Cai, Galanis, Goldberg, Jerrum, McQuillan '16]
Beyond 5 pseudoknotted structures, approximating #Designs becomes as hard as approximating #BIS without any constraint.
Why pseudoknotted? Because any bipartite graph of max degree ∆ can be decomposed into ∆ matchings in polynomial time (König's edge-colouring theorem, which sharpens Vizing's bound for bipartite graphs).
Lastly, connection between counting and sampling [Jerrum, Valiant, Vazirani '86].
Conjecture (#BIS-hardness of sampling)
Generating compatible sequences (almost) uniformly for general input is #BIS-hard.
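The decomposition into ∆ matchings can be sketched with the classical alternating-path edge-colouring argument for bipartite graphs. This is a textbook construction, not code from the talk; the function name and the `(left, right)` edge encoding are assumptions, and edges are assumed distinct. Each colour class pairs every position at most once, so it can be read as one (possibly pseudoknotted) structure.

```python
from collections import Counter, defaultdict

def bipartite_matchings(edges):
    """Split distinct edges (u, v), u on the left and v on the right, into Delta matchings."""
    deg = Counter()
    for u, v in edges:
        deg["L", u] += 1
        deg["R", v] += 1
    delta = max(deg.values(), default=0)
    # nbr[x][c] = vertex joined to x by the edge currently coloured c
    nbr = defaultdict(dict)
    for u, v in edges:
        x, y = ("L", u), ("R", v)
        a = next(c for c in range(delta) if c not in nbr[x])  # colour free at x
        b = next(c for c in range(delta) if c not in nbr[y])  # colour free at y
        if a != b:
            # Collect the path from y whose edges alternate colours a, b, a, ...
            hops = []
            p, c = y, a
            while c in nbr[p]:
                q = nbr[p][c]
                hops.append((p, q, c))
                p, c = q, (b if c == a else a)
            # Swap a and b along that path; bipartiteness keeps it away from x.
            for p, q, c in hops:
                del nbr[p][c], nbr[q][c]
            for p, q, c in hops:
                new = b if c == a else a
                nbr[p][new] = q
                nbr[q][new] = p
        # Colour a is now free at both endpoints.
        nbr[x][a] = y
        nbr[y][a] = x
    matchings = [[] for _ in range(delta)]
    for (side, u), by_colour in nbr.items():
        if side == "L":
            for c, (_, v) in by_colour.items():
                matchings[c].append((u, v))
    return matchings
```

For the complete bipartite graph on 3 + 3 vertices, the call returns three perfect matchings covering all nine edges.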
Perspectives: FPT and Boltzmann sampling algorithms
[Figure: RNARedPrint pipeline. i) Input Structures (R1, R2, R3 over positions a to v); ii) Merged Base-Pairs; iii) Compatibility Graph; iv) Tree Decomposition; Partition Function and Stochastic Backtrack produce sampled sequences; v) Weight Optimization (Adaptive Sampling), with weight updates driven by the energies E1, E2, E3 (kcal.mol−1) and GC% (0 to 100) of the samples; vi) Final Designs]
FPT algorithm for counting based on tree decomposition
Multidimensional Boltzmann sampling to control energies, GC...
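The idea of weighted (Boltzmann) sampling with tuned weights can be illustrated in one dimension, GC content only. The sketch below is a brute-force stand-in and not RNARedPrint's algorithm: it enumerates all valid designs instead of using a tree decomposition, and it replaces the adaptive weight update with a plain bisection. All names are hypothetical.

```python
import random
from itertools import product

PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}

def valid_designs(n, edges):
    """All sequences compatible with every base pair (exponential, small n only)."""
    return ["".join(s) for s in product("ACGU", repeat=n)
            if all(s[u] + s[v] in PAIRS for u, v in edges)]

def gc_count(seq):
    return sum(c in "GC" for c in seq)

def expected_gc(designs, w):
    """Expected GC fraction when P(s) is proportional to w ** gc_count(s)."""
    weights = [w ** gc_count(s) for s in designs]
    z = sum(weights)  # partition function
    return sum(wt * gc_count(s) for wt, s in zip(weights, designs)) / (z * len(designs[0]))

def tune_weight(designs, target, lo=1e-3, hi=1e3, iters=60):
    """Geometric bisection on w; the expected GC fraction is non-decreasing in w."""
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if expected_gc(designs, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

def sample(designs, w, k, rng=random):
    """Draw k designs from the weighted distribution."""
    return rng.choices(designs, weights=[w ** gc_count(s) for s in designs], k=k)
```

With `w = 1` the distribution is uniform and, by the symmetry exchanging C with A and G with U, the expected GC fraction is 0.5; `tune_weight(designs, 0.7)` then returns a weight above 1 that shifts the samples towards GC-rich designs. Controlling several energies at once would need one weight per feature, which this sketch does not attempt.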
Counting compatible sequences: Watson-Crick + > 2 structures
[Figure: nucleotides C, U, G, A with their compatible pairings; input structures over positions a to v]
Compatible Base Pairs = Include Wobble base pairs
Dependency graph: Cycles, Paths, Trees...
[Figure: dependency graph on positions a to v]
Question: How many compatible sequences?
Answer: Bipartite → ∏_{cc ∈ CC(G)} 2 × #IS(cc) = 496 672
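When every component of the dependency graph is a tree (paths included), #IS can be computed in linear time by a standard include/exclude dynamic programme, and the product formula then gives the number of compatible sequences. The sketch below covers only that acyclic case, with hypothetical function names; it does not reproduce the slide's value, since the example graph cannot be recovered from the transcript.

```python
def count_is_forest(n, edges):
    """Return (#independent sets, #components) of a forest on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    total, components = 1, 0
    for root in range(n):
        if seen[root]:
            continue
        components += 1
        # Iterative DFS: parents always appear before their children in `order`.
        order, parent = [], {root: None}
        stack = [root]
        seen[root] = True
        while stack:
            u = stack.pop()
            order.append(u)
            for v in adj[u]:
                if v == parent[u]:
                    continue
                if seen[v]:
                    raise ValueError("cycle detected: input is not a forest")
                seen[v] = True
                parent[v] = u
                stack.append(v)
        # excl[u] / incl[u]: independent sets of u's subtree without / with u.
        excl = {u: 1 for u in order}
        incl = {u: 1 for u in order}
        for u in reversed(order):  # children are processed before parents
            p = parent[u]
            if p is not None:
                excl[p] *= excl[u] + incl[u]
                incl[p] *= excl[u]
        total *= excl[root] + incl[root]
    return total, components

def count_designs_forest(n, edges):
    """#Designs = 2^{#components} x #IS, valid here because forests are bipartite."""
    n_is, components = count_is_forest(n, edges)
    return 2 ** components * n_is
```

For a star with one centre and three leaves, the independent sets number 2^3 + 1 = 9, so `count_designs_forest(4, [(0, 1), (0, 2), (0, 3)])` gives 18 compatible sequences.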
Conclusions
RNA is cool!
RNA design is one of the current challenges of RNA bioinformatics, with far-reaching consequences for drug design, synthetic biology...
Practical use-cases require expressive and modular constraints
Future methods: kinetics, interactions, multiple structures, pseudoknots...
RNA inverse folding is the combinatorial core of design. It remains largely unsolved, and opens new lines of research in Comp. Sci.
Collaborators
McGill University: Vladimir Reinharz, Jérôme Waldispühl
MIT: Bonnie Berger, Srinivas Devadas, Alex Levin, Mieszko Lis, Charles O'Donnell
LRI – Univ. Paris Sud: Alain Denise, Vincent Le Gallic
Wuhan University: Yi Zhang, Yu Zhou
LIGM – Marne la Vallée: Stéphane Vialette
LIX – Ecole Polytechnique: Alice Héliou, Mireille Regnier
Simon Fraser University: Jozef Hales, Jan Manuch (UBC), Ladislav Stacho
Cédric Chauve, Julien Courtiel
TBI Vienna: Ronnie Lorenz, Andrea Tanzer
Thanks!
Poster submission & Registration open soon... (+ ISCB travel fellowships for students)
References I
R. Nussinov and A.B. Jacobson.
Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci U S A, 77:6903–13, 1980.