
Advanced Course: Shapes

Robert Giegerich

Motivation

Lost in Folding Space

Abstraction comes to rescue

Abstract shapes

Defining shape abstractions

Properties of the shape space

RNAshapes

Simple shape analysis

Complete probabilistic shape analysis

Shape Probabilities

Application: Shape based indexing

Application: Shape based matching

RNA Abstract Shape Analysis

Robert Giegerich

Faculty of Technology & Center of Biotechnology, Bielefeld University

[email protected]

EMBO Practical Course on Computational RNA Biology, Cargese, April 2010

Robert Giegerich Advanced Course: Shapes


Where do we stand ...

1 Thermodynamic model (X. Flamm)

2 MFE folding, optimal structure, fallacies (G. Steger)

3 representative structural alternatives

4 structure prediction from multiple sequences (D. Mathews)

5 structure comparison (D. Mathews)

6 search by structure (I. Meyer, P. Gardner)

7 . . .


1 Motivation
    Lost in Folding Space
    Abstraction comes to rescue

2 Abstract shapes
    Defining shape abstractions
    Properties of the shape space

3 RNAshapes
    Simple shape analysis
    Complete probabilistic shape analysis
    Shape Probabilities

4 Application: Shape based indexing

5 Application: Shape based matching


Better than optimal . . . (1)

Can we get better/more information from thermodynamic folding than the MFE structure?

How accurate is the MFE structure anyway?


2004 Mfold evaluation by Gutell Lab

Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR: Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics. 2004 Aug 5;5:105.

Compares MFE foldings to structures derived by comparative analysis and proven by experimental techniques.

Findings:

base pair accuracy of about 20% - 71%

no improvement from recently updated thermodynamic parameters

note: did not check for good near-optimal solutions


Base pair accuracy – what does it mean?

[Figure: drawings of a reference structure and alternative structures. Only the dot-bracket strings and captions below are recoverable from the slide.]

((((...)))) ((((...))))...((((...))))...((((...))))...((((....))))

((((...)))) ..............((((((((((((((((........))))))))))))))))

((((...)))) ............((((((((((((((((........))))))))))))))))..

A reference structure and two structures at the same distance 16 (4 out of 20 BP correct in each case).

Two structures at distance 16, but with the same "shape": [ [ ] [ ] ]
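To make the point about base pair accuracy concrete, here is a minimal Python sketch. It assumes that "accuracy" means the fraction of reference base pairs that reappear in the prediction; the three dot-bracket strings are hypothetical toy structures, not the ones drawn on the slide.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def fraction_correct(predicted, reference):
    """Fraction of reference base pairs that also occur in the prediction."""
    ref = base_pairs(reference)
    return len(base_pairs(predicted) & ref) / len(ref)


# Hypothetical toy structures (assumption: invented for this illustration).
reference = "((((....))))...((((....))))"  # two adjacent hairpins
pred_a = "((((....)))).((((....)))).."     # two hairpins, the second one shifted
pred_b = "((((....))))..............."     # a single hairpin

print(fraction_correct(pred_a, reference))
print(fraction_correct(pred_b, reference))
```

Both predictions recover the same number of reference pairs, although the first has the same overall arrangement of stems as the reference and the second does not. A base pair score alone cannot tell these two cases apart.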


Accuracy of MFE folding . . .

RNA folding struggles with

adequacy of thermodynamic parameters . . . ?

uncovered structural motifs – pseudoknots, kissing hairpins!

dynamics of interaction with other molecules . . . ?

RNA transcript processing . . . ?

folding kinetics (co-transcriptional folding) . . . ?

...

physical properties of the folding space . . . !


The problem to be solved

We want more comprehensive information about an RNA molecule’s foldings than just its MFE structure.


Lost in folding space (1)

The folding space of a given sequence is LARGE:

number of foldings is exponential in sequence length

number of near-optimal foldings is exponential in energy window

Structure asymptotics:

$S(n) \approx 1.104366 \cdot n^{-3/2} \cdot 2.618034^{n}$

Number of secondary structures for ALL sequences of length n.

A typical tRNA of 74 nt has about 4 million feasible structures.

Consider the 111 “best” structures, each with 27 - 28 bp:
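Before turning to that list, the growth estimate can be checked numerically. The sketch below assumes that S(n) counts all secondary structures on n positions where any two positions may pair and a hairpin loop needs at least one unpaired base. This is the counting model whose asymptotics have the constants 1.104366 and 2.618034, but the recurrence itself is not shown on the slide. The function names are illustrative.

```python
def count_structures(n):
    """Exact number of secondary structures on n positions.

    Assumption: any position may pair with any other, pairs are nested,
    and each hairpin loop contains at least one unpaired position.
    """
    s = [1]  # s[0] = 1: the empty structure
    for m in range(1, n + 1):
        total = s[m - 1]  # last position unpaired
        # last position m pairs with position k (1-based), k <= m - 2
        for k in range(1, m - 1):
            total += s[k - 1] * s[m - k - 1]
        s.append(total)
    return s[n]


def asymptotic_estimate(n):
    """Asymptotic formula from the slide."""
    return 1.104366 * n ** -1.5 * 2.618034 ** n


for n in (10, 50, 100, 200):
    exact = count_structures(n)
    print(n, exact, asymptotic_estimate(n) / exact)
```

The printed ratio approaches 1 as n grows. Note that this count ignores the sequence entirely, so it is far larger than the number of structures that are actually feasible for one concrete sequence such as the tRNA.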


gggcccauagcucagugguagagugccuccuuugcaaggaggaugcccuggguucgaaucccagugggucca

((((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))).

((((((((.((...)).))((.((((((((((...))))))).))).))))))))((.(((....)))))..

((((((((.((...)).))((.((((((((((...))))))).))).))))))))((..(((...)))))..

((.(((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))..

.(((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))..

((((((((.......((((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((((((((((((((.(((...((.(((((((...))))))))))))))))))).........)))))))).

((((((((.((...))(((...((.(((((((...)))))))))))).(((((....)).))))))))))).

((((((((.((...))(((...((.(((((((...)))))))))))).(((((....))).)))))))))).

((((((((.((...))((....((((((((((...))))))).)))))(((((....)).))))))))))).

((((((((.((...))((....((((((((((...))))))).)))))(((((....))).)))))))))).

((((((((.((...))(((.((((((((((((...))))))).))((...)))))..)))...)))))))).

((((((((.((...))(((.((((.(((((((...)))))))))(((...)))))..)))...)))))))).

((((((((.((...))(((((.((((((((((...))))))).))).))).((....))))..)))))))).

((((((((.((...))((.((.((((((((((...))))))).))).)).(((....))))).)))))))).

((((((((.((...))(((((.((((((((((...))))))).))).)))((......)))).)))))))).

((((((((.((...))(((((.((((((((((...))))))).))).))).((....)).)).)))))))).

((((((((.((...))(((((.((.(((((((...)))))))))...)))(((....))))).)))))))).

((((((((.((...))(((((..(((((((((...))))))).))..)))(((....))))).)))))))).

((((((((.((...))(((((.(((..((((((....))))))))).)))(((....))))).)))))))).


((((((((.((...))(((((.(((..((((((...)).))))))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((..((((((...))).)))))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((((.....)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((.((....)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((..((...)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((.((...)).)))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((((((.(((....)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((((((..(((...)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((((((.(((...))).))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((.((...)))).)).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((.((((...)))).)).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((.((((((...))))))..))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((.((((((...)))).)).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((.(((((((...)))))))..)).)))(((....))))).)))))))).

((((((((.((...))((((..((((((((((...))))))).)))..))(((....))))).)))))))).

((((((((.(((....)))((.((((((((((...))))))).))).))((((....))))..)))))))).

((((((((.(((....)))((.((((((((((...))))))).))).))((((....)).)).)))))))).

((((((((.((((.((.((...))))((((((...)))))))).))..(((((....)).))))))))))).


((((((((.((((.((.((...))))((((((...)))))))).))..(((((....))).)))))))))).

(((((((...((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((((((...((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).

(((((((...((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

(((((((...(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).

(((((((..((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))).

(((((((..((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).

(((((((..((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).

(((((((..((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).

(((((((((.((....))))..((((((((((...))))))).))).((((((....)).))))))))))).

(((((((((.((....))))..((((((((((...))))))).))).((((((....))).)))))))))).

(((((((((.((....))))..((((((((((...))))))).))).((((((....)))).))))))))).

(((((((((..((...))))..((((((((((...))))))).))).((((((....)).))))))))))).

(((((((((..((...))))..((((((((((...))))))).))).((((((....))).)))))))))).

(((((((((..((...))))..((((((((((...))))))).))).((((((....)))).))))))))).

(((((((((((...))..))..((((((((((...))))))).))).((((((....)).))))))))))).

(((((((((((...))..))..((((((((((...))))))).))).((((((....))).)))))))))).

(((((((((((...))..))..((((((((((...))))))).))).((((((....)))).))))))))).


(((((((((..((.((.((...))))((((((...))))))))))..((((((....)).))))))))))).

(((((((((..((.((.((...))))((((((...))))))))))..((((((....))).)))))))))).

(((((((((..((.((.((...))))((((((...))))))))))..((((((....)))).))))))))).

(((((((((((...))((((...))))((((((....))))))))..((((((....)).))))))))))).

(((((((((((...))((((...))))((((((....))))))))..((((((....))).)))))))))).

(((((((((((...))((((...))))((((((....))))))))..((((((....)))).))))))))).

(((((((((((...))((((...))))((((((...)).))))))..((((((....)).))))))))))).

(((((((((((...))((((...))))((((((...)).))))))..((((((....))).)))))))))).

(((((((((((...))((((...))))((((((...)).))))))..((((((....)))).))))))))).

(((((((((((...))((((...))))((((((...))).)))))..((((((....)).))))))))))).

(((((((((((...))((((...))))((((((...))).)))))..((((((....))).)))))))))).

(((((((((((...))((((...))))((((((...))).)))))..((((((....)))).))))))))).

(((((((..((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).

(((((((..((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).

(((((((..((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).

((((((...(((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

((((((...(((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).

((((((...(((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

((((((...((((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).


((((((...((....((((((.((((((((((...))))))).))).)))(((....)))))))))))))). [ [][]]

(((((((((((((((.(((...((.(((((((...)))))))))))))))))))...))......)))))). [ ]

((((((..(((((((.(((...((.(((((((...)))))))))))))))))))...((....)))))))).

(((((..((.((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((((..((.((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).

(((((..((.((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

(((((..((.(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).

(((((..((((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))). [[][][]]

(((((..((((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).

(((((..((((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).

(((((..((((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).

(((((..((((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).

(((((..((((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).

(((((..((((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).

(((((.((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)).))))).

((((..(((((((((.(((...((.(((((((...)))))))))))))))))))...))((....)))))).

((((..(((((((((.(((...((.(((((((...)))))))))))))))))))...)).((...)))))).

((((.(((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))).)))).

(((.((.((.((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((.((.((.((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).


(((.((.((.((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

(((.((.((.(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).

(((.((.((((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))).

(((.((.((((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).

(((.((.((((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).

(((.((.((((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).

(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).

(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).

(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).

(((.((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)).))))).

(((((.((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))).))).

((((((((.((...)).))((.((((((((((...))))))).))).)))))(((..((....)))))))).

(((...(((((((((.(((...((.(((((((...)))))))))))))))))))...))(((...)))))).

(((.((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))).))).

((.(((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))))).)).

.(((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))))).)).


Lost in folding space (2)

What we observe from the simple tRNA example:

LARGE number of close-to-optimal foldings

FEW structural classes holding many similar foldings

Can we condense the folding space to good representatives of these classes?


Better than optimal . . . (2)

Alternatives to a single MFE structure prediction:

BP probabilities and dotplots (McCaskill)

sampling of near-optimal structures (Mfold)

complete enumeration within a threshold (RNAsubopt)

stochastic sampling and clustering a posteriori (Sfold)

classified folding by abstract shape (RNAshapes)


Classification by abstract shape

[Figure: two secondary structure drawings. One is annotated with its structural elements: Multiple Loop, Stacking Region, Hairpin Loop, Internal Loop, Bulge Loop (left), Bulge Loop (right).]

What is a shape: LIKE this ... or NOT like this ...?


Levels of abstraction

Level 0: Full structure

Level 1: All types of loops

Level 3: All helix interruptions

Level 4: Multi- and internal loops, no bulges

Level 5: Stem arrangement only


String representation of shapes

CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG
((((((...(((..(((...))))))...(((..((.....))..)))))))))..

Shape Type 5: [[][]]
Shape Type 4: [[][[]]]
Shape Type 3: [[[]][[]]]
Shape Type 2: [[ []][ [] ]]
Shape Type 1: [ [ [ ]] [ [ ] ]]

[Figure: 2D drawing of this structure, positions 1 to 56.]


Formalizing the notion of (abstract) shape

Shape abstraction retains nesting and adjacency of stems

Shape abstraction disregards all sizes (of stems, loops, ...)

Shape abstraction may retain or disregard presence and type of bulges and internal loops, i.e. helix interruptions

RNAshapes provides shape abstraction levels 1 through 5
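The rules above can be sketched in Python for the three most abstract levels. This is an illustrative reading of the slides, not the RNAshapes implementation: it assumes that level 5 ignores every helix interruption, level 4 ignores bulges but keeps internal loops, and level 3 keeps both. Levels 1 and 2, which also record unpaired regions, are left out. The function name `shape` is hypothetical.

```python
def shape(dot_bracket, level=5):
    """Shape string of a dot-bracket structure for abstraction level 3, 4 or 5."""
    if level not in (3, 4, 5):
        raise ValueError("this sketch only covers levels 3, 4 and 5")
    partner, stack = {}, []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            partner[stack.pop()] = pos

    def child_pairs(i, j):
        """Base pairs directly enclosed between positions i and j (exclusive)."""
        kids, k = [], i + 1
        while k < j:
            if k in partner:
                kids.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        return kids

    def helix(i, j):
        """Shape of the substructure closed by the base pair (i, j)."""
        while True:
            kids = child_pairs(i, j)
            if len(kids) != 1:
                break  # hairpin loop or multiloop ends the stem
            p, q = kids[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            stacked = left == 0 and right == 0
            bulge = (left == 0) != (right == 0)
            if stacked or level == 5 or (level == 4 and bulge):
                i, j = p, q  # same stem: sizes and this interruption are dropped
            else:
                break  # the interruption is retained as an extra nesting level
        return "[" + "".join(helix(p, q) for p, q in kids) + "]"

    return "".join(helix(p, q) for p, q in child_pairs(-1, len(dot_bracket)))


# The 56 nt example structure from the "String representation of shapes" slide.
structure = "((((((...(((..(((...))))))...(((..((.....))..))))))))).."
for lvl in (5, 4, 3):
    print(lvl, shape(structure, lvl))
```

The design choice here is to walk each stem inward from its closing pair and decide at every single-child step whether the interruption is visible at the chosen level. All lengths are discarded as a side effect, which matches the rule that shape abstraction disregards sizes.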


Shape abstraction mathematics

General:

tree-like domains of structures $F$ and shapes $P$

tree homomorphism $\pi : F \to P$

For each sequence $s$:

folding space of sequence $s$: $F(s)$

shape space of sequence $s$: $P(s) = \pi(F(s))$

shape class of $p$ in $F(s)$: $f(s, p) = \{x \mid x \in F(s),\ \pi(x) = p\}$

shape representative structure: shrep = class member of minimal free energy, formally $\mathrm{shrep}(s, p) = \arg\min_{x \in f(s, p)} E(x)$, where $E(x)$ denotes the free energy of $x$
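These definitions translate almost directly into code. The following Python sketch treats a folding space as a plain collection of dot-bracket strings, uses a compact level 5 abstraction as a stand-in for the homomorphism π, and picks the minimum-energy member of each class as its shrep. The structures and energies are made up for illustration, and all function names are hypothetical.

```python
from collections import defaultdict


def level5_shape(dot_bracket):
    """Toy stand-in for pi: keeps only nesting and adjacency of stems."""
    stack = [[]]  # one list of child shapes per currently open base pair
    for ch in dot_bracket:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            kids = stack.pop()
            # A pair with exactly one child pair continues the same stem.
            node = kids[0] if len(kids) == 1 else "[" + "".join(kids) + "]"
            stack[-1].append(node)
    return "".join(stack[0])


def shape_classes(foldings, pi):
    """Partition a folding space into classes of structures with equal shape."""
    classes = defaultdict(list)
    for x in foldings:
        classes[pi(x)].append(x)
    return dict(classes)


def shreps(foldings, pi, energy):
    """For each shape p, return the class member of minimal free energy."""
    return {p: min(members, key=energy)
            for p, members in shape_classes(foldings, pi).items()}


# Hypothetical folding space with invented energies in kcal/mol.
toy_energy = {
    "((((....)))).....": -4.0,
    ".(((....)))......": -2.5,
    "((...))..((....))": -1.0,
    "((...))...((...))": -1.8,
}
representatives = shreps(toy_energy, level5_shape, toy_energy.get)
for p, x in representatives.items():
    print(p, x, toy_energy[x])
```

Because every structure maps to exactly one shape, the classes are disjoint and each shape has a single well-defined representative (up to ties in energy).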


Structures and shapes as trees and strings

[Figure: the structure ((((.(((....)))((...(...))))))) drawn as a Level 0 tree (node labels sr, ml, bl, hl above the bases) and as abstract shape trees (node labels ML, HE), together with the corresponding shape strings.]

Level 5 abstract shape: [ [ ] [ ] ]

Level 3 abstract shape: [ [ ] [ [ ] ] ]

Level 1: [ _ [_] [ _ [_] ] ]


Shape algorithmics

Implementation of shape analysis:

shape abstractions are tree homomorphisms

integrate well with DP algorithms

allows for a priori rather than a posteriori analysis

compute shapes in parallel with energy

perform analyses on per-shape basis

Any RNA folding program can implement shape abstraction.

Currently: use RNAshapes.
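The idea of an a priori, per-shape analysis inside a dynamic programming algorithm can be illustrated with a deliberately simplified sketch. It assumes base pair maximization as the scoring scheme instead of the thermodynamic energy model, a minimum hairpin loop of three bases, and level 5 shapes only. It is not the algorithm used by RNAshapes; all names are hypothetical.

```python
from functools import lru_cache

# Assumed pairing rules and minimum hairpin loop size for this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def is_single_component(shape):
    """True if the shape string consists of exactly one bracketed component."""
    depth = 0
    for pos, ch in enumerate(shape):
        depth += 1 if ch == "[" else -1
        if depth == 0:
            return pos == len(shape) - 1
    return False  # empty shape


def best_per_shape(seq):
    """Map each level 5 shape of seq to its maximal number of base pairs."""

    @lru_cache(maxsize=None)
    def struct(i, j):
        """Best score per shape for the subsequence seq[i:j]."""
        if i >= j:
            return {"": 0}
        best = dict(struct(i + 1, j))  # case 1: position i stays unpaired
        for k in range(i + MIN_LOOP + 1, j):  # case 2: i pairs with k
            if (seq[i], seq[k]) not in PAIRS:
                continue
            for inner, s_in in struct(i + 1, k).items():
                # Shape is computed together with the score: a pair around a
                # single component extends that stem, anything else nests.
                closed = inner if is_single_component(inner) else "[" + inner + "]"
                for rest, s_rest in struct(k + 1, j).items():
                    shape, score = closed + rest, 1 + s_in + s_rest
                    if score > best.get(shape, -1):
                        best[shape] = score
        return best

    return struct(0, len(seq))


result = best_per_shape("GGGAAACCCAGGGAAACCC")
for shape, pairs in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{shape or '(unfolded)'}: {pairs} base pairs")
```

Each table entry holds one optimum per shape rather than a single optimum, so the best structure of every shape class falls out of a single pass and no enumeration with later clustering is needed. The price is that table entries grow with the number of shapes.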


Properties of shapes and shreps

Good properties:

shape classes are disjoint

shreps are interesting representatives

shapes have sequence-independent representation

shapes are meaningful across different sequences (of different length)

shapes and shreps can be computed efficiently

Bad properties:

shapes are too abstract

shapes are not abstract enough


Simple shape analysis with RNAshapes

The three top shreps of our tRNA example:

Shape     GGGCCCAUAGCUCAGUGGUAGAGUGCCUCCUUUGCAAGGAGGAUGCCCUGGGUUCGAAUCCCAGUGGGUCCA
[]        (((((((((((((((.((((.....(((((((...))))))).))))))))))).........)))))))).  -35.9 kcal/mol
[[][]]    ((((((((.....((.((((.....(((((((...))))))).))))))(((.......))).)))))))).  -32.2 kcal/mol
[[][][]]  ((((((...((((.......)))).(((((((...))))))).....(((((.......))))).)))))).  -31.7 kcal/mol
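A shrep is the minimum free energy structure within its shape class. The following Python sketch selects the best k shreps from an explicit candidate list. It is an illustration only: RNAshapes obtains shreps inside the dynamic programming recurrences without enumerating candidates, and the function name, the toy structures and their energies are all hypothetical.

```python
def shreps(candidates, k=3):
    """Return the k lowest-energy shape representatives.

    candidates: iterable of (shape, structure, energy in kcal/mol).
    """
    best = {}
    for shp, structure, energy in candidates:
        # keep only the energetically best structure per shape class
        if shp not in best or energy < best[shp][1]:
            best[shp] = (structure, energy)
    ranked = sorted(best.items(), key=lambda item: item[1][1])
    return [(shp, s, e) for shp, (s, e) in ranked[:k]]


# Hypothetical toy candidates (not real folding energies).
TOY = [
    ("[]",   "((((......))))", -5.0),
    ("[]",   ".(((......))).", -4.1),
    ("[][]", "((...))((...))", -3.0),
    ("[][]", "((...)).(...).", -1.2),
]

for shp, structure, energy in shreps(TOY, k=2):
    print(shp, structure, energy)
```

Because shape classes are disjoint, each structure competes in exactly one class, and the output never lists two near-identical structures of the same shape.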


Shape []

[Figure: secondary-structure drawing of the tRNA shrep with shape [].]


Shape [[][]]

[Figure: secondary-structure drawing of the tRNA shrep with shape [[][]].]


Shape [[][][]]

[Figure: secondary-structure drawing of the tRNA shrep with shape [[][][]].]


Shape Space Statistics

Condensation of the folding space:

Structure asymptotics:

$S(n) \approx 1.104366 \cdot n^{-3/2} \cdot 2.618034^{n}$

Level-k shape asymptotics:

$P_1(n) \approx 0.98542 \cdot n^{-3/2} \cdot 2.40591^{n}$

$P_5(n) \approx 2.44251 \cdot n^{-3/2} \cdot 1.32218^{n}$

Empirically, numbers are much smaller for a concrete sequence

See some statistics within 5% kcal/mol of MFE:
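To get a feeling for how strongly the folding space is condensed, the three asymptotic formulas can be evaluated numerically. The Python sketch below works on the log10 scale so that large n does not overflow floating point. The function names are hypothetical, and the constants are simply those from the slide.

```python
import math


def log10_count(c, base, n):
    """log10 of c * n^(-3/2) * base^n."""
    return math.log10(c) - 1.5 * math.log10(n) + n * math.log10(base)


def log10_structures(n):
    return log10_count(1.104366, 2.618034, n)


def log10_shapes_level1(n):
    return log10_count(0.98542, 2.40591, n)


def log10_shapes_level5(n):
    return log10_count(2.44251, 1.32218, n)


for n in (50, 100, 200):
    print(n,
          round(log10_structures(n), 1),
          round(log10_shapes_level1(n), 1),
          round(log10_shapes_level5(n), 1))
```

For n = 100 this gives roughly 39 for structures against roughly 9.5 for level-5 shapes on the log10 scale. These are asymptotic bounds over all possible structures, so the counts for a concrete sequence are far smaller.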


Numbers of shapes versus structures

[Plot: number of structures and number of shapes (y-axis, 0 to 400) against sequence length in nt (x-axis, 0 to 300); two curves, Shapes and Structures.]


Shapes versus structures, logarithmic scale

[Plot: number of structures and number of shapes on a logarithmic y-axis (0.01 to 1e+18) against sequence length N in nt (x-axis, 0 to 120). Fitted curves: $0.0391 \cdot 1.3968912^{N}$ (structures) and $0.2064 \cdot 1.1067094^{N}$ (shapes).]


Homogeneity in shape classes

The “Boltzmann Ensemble” on Ice


Best k shreps

[Figure: RNAshapes returning the best k shreps, here for the shapes [], [[][]] and [[][][]]. Credit: Björn Voß.]


Complete probabilistic shape analysis

“How much would you trust a structure with a probability of $0.1 \cdot 10^{-12}$, even when it is optimal?”

Chip Lawrence, Benasque 2003 and ISMB 2007


From energy to probability

According to Boltzmann statistics, sequence $s$ has structure $x$ with probability

$$\mathrm{Prob}(x) = \frac{e^{-E_x/RT}}{Q}$$

where $E_x$ is the folding energy, $T$ is the temperature, $R$ is the universal gas constant, and $Q$ is the “partition function”,

$$Q = \sum_{x \in F(s)} e^{-E_x/RT}$$

Accumulated shape probabilities

$$\mathrm{Prob}(p) = \sum_{\pi(x) = p} \mathrm{Prob}(x) \quad \text{for all } p \in P(s)$$
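The two sums above translate directly into code once the folding space is given as an explicit list. The Python sketch below accumulates Boltzmann weights per shape. The ensemble is a hypothetical toy list, not a real folding space, the function name is made up for illustration, and RT is taken at 37 °C with R = 0.0019872 kcal/(mol K).

```python
import math
from collections import defaultdict

RT = 0.0019872 * 310.15  # kcal/mol at 37 degrees Celsius


def shape_probabilities(ensemble):
    """Accumulate Boltzmann probabilities per shape.

    ensemble: iterable of (shape, energy in kcal/mol), one entry per structure.
    """
    ensemble = list(ensemble)
    # Shifting all energies by the minimum rescales Q but leaves the
    # probabilities unchanged, and keeps exp() in a safe numeric range.
    e_min = min(energy for _, energy in ensemble)
    weights = [(shp, math.exp(-(energy - e_min) / RT)) for shp, energy in ensemble]
    q = sum(w for _, w in weights)          # partition function (shifted)
    probs = defaultdict(float)
    for shp, w in weights:
        probs[shp] += w / q                 # Prob(p) = sum over x with pi(x) = p
    return dict(probs)


# Hypothetical toy ensemble: the MFE structure has shape [],
# but shape [][] holds several near-optimal structures.
TOY = [
    ("[]", -22.9),
    ("[][][]", -22.5),
    ("[][]", -22.3),
    ("[][]", -22.2),
    ("[][]", -22.1),
    ("[][]", -22.0),
]

for shp, p in sorted(shape_probabilities(TOY).items(), key=lambda kv: -kv[1]):
    print(shp, round(p, 3))
```

In this toy ensemble the shape of the minimum free energy structure does not come out as the most probable shape, because probability mass accumulates over all members of a shape class.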


New information from shape probabilities

Overtaking: Shape probabilities may contradict energy ranking

Shape     Energy            Probability   Rank by probability
[]        -22.90 kcal/mol   0.2370279     2nd
[][][]    -22.50 kcal/mol   0.0999191     3rd
[][]      -22.30 kcal/mol   0.5511424     1st

Björn Voß


A propos “complete”

Probabilistic shape analysis is computationally expensive

probabilities give full information about folding space, but

we cannot compute only the k most likely shapes

computation feasible up to 400 nts ...

but check out RapidShapes by Stefan Janssen


Requirements

Complete probabilistic shape analysis

requires a non-ambiguous grammar with correct dangles at all places

applies “classified” dynamic programming

takes time $O(1.1^n \cdot n^3)$ where $n = |s|$
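The following Python sketch illustrates what “classified” dynamic programming means: every table entry holds one value per shape instead of a single value. To stay short it makes several simplifying assumptions that are not in the lecture: it counts structures instead of summing Boltzmann weights, it uses the simple non-ambiguous grammar S → ε | .S | (S)S without any dangle handling, it allows only Watson-Crick and GU pairs, and it requires at least three unpaired bases in a hairpin loop. All names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimal hairpin loop size


def components(shape):
    """Number of top-level bracket groups in a shape string."""
    depth = count = 0
    for ch in shape:
        if ch == "[":
            if depth == 0:
                count += 1
            depth += 1
        else:
            depth -= 1
    return count


def close(inner):
    """Level-5 shape of a base pair enclosing a region of shape `inner`."""
    n = components(inner)
    if n == 0:
        return "[]"              # hairpin loop
    if n == 1:
        return inner             # stack, bulge or internal loop: absorbed
    return "[" + inner + "]"     # multiloop


def shape_counts(seq):
    """Map each level-5 shape to the number of structures of seq having it."""

    @lru_cache(maxsize=None)
    def table(i, j):
        # all structures on seq[i:j], classified by shape
        if i >= j:
            return {"": 1}
        result = defaultdict(int)
        for shp, cnt in table(i + 1, j).items():      # base i unpaired
            result[shp] += cnt
        for k in range(i + MIN_LOOP + 1, j):          # base i pairs with base k
            if (seq[i], seq[k]) in PAIRS:
                for a, ca in table(i + 1, k).items():
                    for b, cb in table(k + 1, j).items():
                        result[close(a) + b] += ca * cb
        return dict(result)

    return table(0, len(seq))


print(shape_counts("GGGAAACCC"))
```

Because the grammar is non-ambiguous, each structure is counted exactly once, which is the property that the probabilistic analysis depends on. The number of shapes stored per table entry is what adds the exponential factor to the usual cubic running time.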


Results from complete probabilistic analysis

Some observations:

Sequence          Shape 1          Prob.        Shape 2            Prob.
lin-4 precursor   []               0.99999994
tRNA-ala          []               0.989744     [[]]               0.008994
typical mRNA      [][[][]]         0.432154     [[[][]][]]         0.149831
HIV-1 Leader      [][[][[][]]]]    0.6164       [][[[][[][]]][]]   0.3492


The RNAshapes package

Modes of operation:

Computation of low-energy shape representative structures

Computation of accumulated shape probabilities

Computation of consensus shapes

No heuristics involved

Available at http://bibiserv.techfak.uni-bielefeld.de/RNAshapes/


Application: shape based indexing

Assume we have an ncRNA candidate in some novel organism (⇒ lecture by C. Sharma) and want to know whether it resembles something known:

main resource: Rfam database with 600 structural RNA families

families represented by curated structural alignments (cf. Rfam lecture)

search via covariance models (cf. probabilistic models lecture)

search effort $O(n^4)$ per model


Filter techniques

Filter techniques are used to skip unsuccessful searches

1 BLAST filter

2 Ravenna HMM filter

3 shape index based filtering – RNAsifter by Stefan Janssen

Details on (1) and (2) in the Rfam Database lecture


Shape index construction


Shape index based search

[Figure: the query hg17_ct_RNAzset190_s5031 is represented by its shapes at several abstraction levels and looked up in the shape index. Each index level lists a few example shapes, such as [], [[][[][]]] and [[][]][][] at the most abstract level, followed by the number of further entries: 12,156, 53,116, 59,337, 93,840 and 112,489 more shapes.]
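The principle of filtering with a shape index can be sketched as a dictionary lookup: families are indexed by the shapes observed for their members, and a query is passed on to the expensive covariance model search only for families that share a shape with it. This Python sketch shows the idea only and is not RNAsifter's actual data structure; the family names and shape sets are hypothetical.

```python
from collections import defaultdict


def build_index(family_shapes):
    """Invert {family: shapes} into {shape: set of families}."""
    index = defaultdict(set)
    for family, shapes in family_shapes.items():
        for shp in shapes:
            index[shp].add(family)
    return index


def candidate_families(index, query_shapes):
    """Families sharing at least one shape with the query."""
    hits = set()
    for shp in query_shapes:
        hits |= index.get(shp, set())
    return hits


# Hypothetical families and their shapes.
FAMILY_SHAPES = {
    "famA": {"[]"},
    "famB": {"[[][][]]", "[[][]]"},
    "famC": {"[][]", "[[][]]"},
}

index = build_index(FAMILY_SHAPES)
print(sorted(candidate_families(index, {"[[][]]", "[][][]"})))
```

Only the families returned by the lookup would then be searched with their covariance models, which is where the saving comes from.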


Filtered search performance

[Plot: filter performance curves; axis ticks run from 0.00 to 0.30 and from 0.40 to 1.00, axis labels not recoverable. Legend: k-best-shape-index, 1-SS_cons-shape-index, 1-consensus-shape-index, 1-hybrid-shape-index, 1-union-shape-index, 1-RNAalifold-shape-index, k-RNAlishapes-shape-index, cmsearch --hmmfilter.]


Average run times

[Plot: average run times; axis ticks run from 0 to 800 and from 0 to 900, axis labels not recoverable. Legend: RNAsifter, cmsearch, HMM-filter, BLAST-filter.]


Shape based matching

Search by structure ...

Assume you have a (single) transcript with a well-defined structure

How to search for structural homologues in related organisms?

Create a specialized folding program via Locomotif at http://bibiserv.cebitec.uni-bielefeld.de/locomotif


References on abstract shape analysis

Abstract Shapes of RNA. Giegerich R, Voss B, Rehmsmeier M. Nucleic Acids Research 2004, Vol. 32, No 15, 1 - 9.

Complete Probabilistic Analysis of RNA Abstract Shapes. Voss, Giegerich, Rehmsmeier. BMC Biology, 2006, Feb 15;4(1):5

RNAshapes: an integrated RNA analysis package based on abstract shapes. Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R. Bioinformatics 2006, Feb 15;22(4):500-3.

Shape based indexing for faster search of RNA family databases. Janssen S, Reeder J, Giegerich R, BMC Bioinformatics, 2008

Locomotif

Rapidshapes


The End

Thanks for your attention.
