
Page 1: Bioinformatic approaches to functionally characterise RNAs

Research program

Bioinformatic approaches to functionally characterise RNAs

Paul Gardner

June 2, 2011

Paul Gardner Bioinformatic approaches to functionally characterise RNAs


The data deluge and bioinformatics

- There is a deluge of data being generated by new techniques in sequencing and structure determination.

- Bioinformatic analysis is the only practical way to analyse and annotate data on this scale, and it now drives many biological discoveries.

[Figure: three panels, "The growth of Genbank" (number of nucleotides, billions, by year), "The growth of UniProt" (number of proteins, millions, by year) and "The growth of PDB" (number of structures, thousands, by year).]


Bioinformatics is important

Table 1: The most cited articles for each OECD country.*

Citations | Country | Reference
32399 | de | Thompson et al. (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment... NAR
30616 | us | Altschul et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. NAR
20099 | ch | Towbin et al. (1979) Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets... PNAS
18479 | fr | Thompson et al. (1997) The CLUSTAL X windows interface... NAR
17011 | uk | Bland & Altman (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet
16451 | jp | Iijima (1991) Helical microtubules of graphitic carbon. Nature
9773 | at | Kresse & Furthmüller (1996) Efficient iterative schemes for ab initio total-energy calculations... Physical Review B
8593 | ie, il | Lander et al. (2001) Initial sequencing and analysis of the human genome. Nature
8113 | no | Pedersen (1994) Randomised trial of cholesterol lowering in 4444 patients with coronary heart disease... Lancet
7976 | pt | Perdew et al. (1992) Atoms, molecules, solids, and surfaces: Applications of the generalized... Physical Review B
7202 | it | Berendsen et al. (1984) Molecular dynamics with coupling to an external bath. The Journal of Chemical Physics
6972 | be | Murshudov et al. (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D.
6902 | se | Huelsenbeck & Ronquist (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics
5850 | pl | Cornell et al. (1995) A second generation force field for the simulation of proteins, nucleic acids... J Am Chem Soc
4681 | fi | Simons & Ikonen (1997) Functional rafts in cell membranes. Nature
4588 | au, es | Perlmutter et al. (1999) Measurements of Ω and Λ from 42 high-redshift Supernovae. Astrophysical Journal
3950 | dk | Nielsen et al. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of... Protein Eng.
3805 | nz | Ihaka & Gentleman (1996) R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics
3351 | kr, mx | Eidelman et al. (2004) Review of particle physics. Physics Letters...
3244 | si | Wilk et al. (2001) High-κ gate dielectrics: Current status and materials properties considerations. Journal of Applied Physics
2953 | gr | Polymeropoulos et al. (1997) Mutation in the α-synuclein gene identified in families with Parkinson’s disease. Science
2881 | sk | Miertus et al. (1981) Electrostatic interaction of a solute with a continuum... Chemical Physics
2393 | hu | Morice et al. (2002) A randomized comparison of a sirolimus-eluting stent... New England Journal of Medicine
1977 | is | Sever et al. (2003) Prevention of coronary and stroke events with atorvastatin... Lancet
1790 | ca | Sedlak & Lindsay (1968) Estimation of total, protein-bound, and nonprotein sulfhydryl groups... Analytical Biochemistry
1366 | tr | Ozgur et al. (2005) A comprehensive review of ZnO materials and devices. Journal of Applied Physics
291 | lu | James et al. (2006) Reconstructing the early evolution of Fungi using a six-gene phylogeny. Nature

*This data was collected from Scopus in April 2011. NB: the US data is the union of multiple searches.


“All science is either physics or stamp collecting” – Ernest Rutherford


Stamp collectors


Rfam

- My aim is to build a periodic table for classifying RNA, enabling researchers to predict function.

[Figure: numbers of sequences and families (log scales) for each RNA type in Rfam, grouped as cis-regulatory elements, genes and introns (e.g. tRNA, rRNA, miRNA, snRNA, CD-box snoRNA, HACA-box snoRNA, scaRNA, sRNA, CRISPR, ribozyme, riboswitch, IRES, frameshift element, thermoregulator, leader, splicing, antisense), illustrated with example consensus secondary structures.]


Rfam: families of ncRNAs

[Figure: the Rfam pipeline. Curated SEED alignments (1,446 families; 27,045 regions) and DESC annotation files are searched against RFAMSEQ 10.0 (169,604,735,232 nucleotides; 55,655,739 sequences) using WU-BLAST and the Infernal programs cmsearch and cmalign, producing FULL alignments (3,192,596 regions). Outputs support genome annotation (DAS, GFF; ENSEMBL, UCSC, ncRNA.org), benchmarks and training, and RNA Biology.]

http://rfam.sanger.ac.uk
http://rfam.janelia.org
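Rfam SEED and FULL alignments are distributed in Stockholm format. In that format a `#=GC SS_cons` line gives the consensus secondary structure as a bracket string. The minimal Python sketch below shows how that annotation maps to base-paired columns. It makes two assumptions: only the `()`, `<>`, `[]` and `{}` bracket types are read as pairs, and every other character, including letter pairs sometimes used for pseudoknots, is treated as unpaired. The function name and the example structure are hypothetical.

```python
from typing import Dict, List, Tuple

# Bracket types treated as base-pair delimiters (an assumption for this sketch).
OPENERS = {"(": ")", "<": ">", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def parse_ss_cons(ss: str) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) column pairs encoded by a consensus bracket string."""
    stacks: Dict[str, List[int]] = {open_: [] for open_ in OPENERS}
    pairs: List[Tuple[int, int]] = []
    for col, char in enumerate(ss):
        if char in OPENERS:
            stacks[char].append(col)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at column {col}")
            pairs.append((stack.pop(), col))
        # Any other character ('.', ',', ':', '_', '-', '~', letters) is unpaired here.
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at column {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical consensus structure with two stem-loops.
    print(parse_ss_cons("<<<__>>>,((..))"))
    # [(0, 7), (1, 6), (2, 5), (9, 14), (10, 13)]
```

Each bracket type gets its own stack. As a result, crossing pairs written with different bracket types, such as `<[>]`, are still recovered.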


Why RNA?

- One of the major scientific realisations of this century is the importance of RNAs in genetic regulation, e.g.:

  - the 2006 Nobel Prize in Physiology or Medicine, awarded to Andrew Fire and Craig Mello for the discovery of RNAi.

- RNA is also involved in chromosome inactivation, initiation of DNA replication, transposon suppression, environmental sensing, ...

[Figure: "Non-protein-coding RNA related publications", number of publications ('000s) by year, 1950 to 2010.]


RNA and disease: progressive hearing loss

[Figure: consensus secondary structure of the mir-96 precursor, coloured by sequence conservation (0 to 1).]

Mencía et al. (2009) Mutations in the seed region of human miR-96 are responsible for nonsyndromic progressive hearing loss. Nat. Genet.

Lewis et al. (2009) An ENU-induced mutation of miR-96 associated with progressive hearing loss in mice. Nat. Genet.
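The miR-96 structure is coloured by sequence conservation on a 0 to 1 scale, but the slide does not say which measure is used. One common choice is information content, as in sequence logos: 2 bits minus the Shannon entropy of the column. Dividing by 2 gives a score of 1 for an invariant column and 0 for a uniform mix of A, C, G and U. The sketch below is a minimal version under stated assumptions: gaps are ignored, no small-sample correction is applied, and the alignment is hypothetical.

```python
import math
from collections import Counter
from typing import List

NUCLEOTIDES = "ACGU"


def column_conservation(alignment: List[str]) -> List[float]:
    """Scaled information content per column: 1 = invariant, 0 = uniform mix.

    Gap and non-ACGU characters are ignored, and no small-sample correction
    is applied (both simplifying assumptions).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    scores: List[float] = []
    for col in range(length):
        counts = Counter(seq[col].upper().replace("T", "U") for seq in alignment)
        residues = [counts[nt] for nt in NUCLEOTIDES if counts[nt] > 0]
        total = sum(residues)
        if total == 0:  # all-gap column
            scores.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in residues)
        scores.append((2.0 - entropy) / 2.0)  # 2 bits = maximum for 4 letters
    return scores


if __name__ == "__main__":
    # Hypothetical four-sequence alignment.
    aln = ["ACGU", "ACGA", "ACCA", "AC-U"]
    print([round(s, 2) for s in column_conservation(aln)])  # [1.0, 1.0, 0.54, 0.5]
```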


RNA and disease: Prader-Willi syndrome

[Figure: consensus secondary structure of the HBII-85 (SNORD116) C/D box snoRNA, coloured by sequence conservation (0 to 1).]

Cavaillé et al. (2000) Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. PNAS

Skryabin et al. (2007) Deletion of the MBII-85 snoRNA gene cluster in mice results in postnatal growth retardation. PLoS Genet.

Ding et al. (2008) SnoRNA Snord116 (Pwcr1/MBII-85) deletion causes growth deficiency and hyperphagia in mice. PLoS ONE

Sahoo et al. (2008) Prader-Willi phenotype caused by paternal deficiency for the HBII-85 C/D box small nucleolar RNA cluster. Nat. Genet.

de Smith et al. (2009) A deletion of the HBII-85 class of small nucleolar RNAs (snoRNAs) is associated with hyperphagia, obesity and hypogonadism. Hum. Mol. Genet.


RNA is important in agriculture

- The Texel sheep, myostatin and miR-1

[Figure: consensus secondary structure of the mir-1 precursor, coloured by sequence conservation (0 to 1).]

Clop et al. (2006) A mutation creating a potential illegitimate microRNA target site in the myostatin gene affects muscularity in sheep. Nat. Genet.
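Animal miRNAs recognise most of their targets through the seed, nucleotides 2 to 8 of the mature miRNA. This is why a seed mutation (as in miR-96) or a newly created seed-complementary site (as in the Texel myostatin 3' UTR) can matter so much. The sketch below checks whether a single point mutation creates a 7mer-m8 site, that is, a perfect Watson-Crick match to seed positions 2 to 8. It is a minimal illustration: the miRNA and UTR sequences are hypothetical, and real target prediction also weighs site type, context and conservation.

```python
from typing import List

# Watson-Crick complements for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna: str) -> str:
    """Nucleotides 2-8 of a mature miRNA (1-based), i.e. the seed region."""
    return mirna[1:8]


def seed_site(mirna: str) -> str:
    """5'->3' target sequence perfectly complementary to the seed (7mer-m8 site)."""
    return seed(mirna).translate(COMPLEMENT)[::-1]


def find_sites(utr: str, mirna: str) -> List[int]:
    """0-based start positions of 7mer-m8 sites in a 3' UTR sequence."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


# Hypothetical sequences: the mutant UTR differs from the wild type at one base.
MIRNA = "UCCGAUAGCAUUGACUAGCAAU"
WILD_TYPE_UTR = "GAUUACUAUCAGUUUAA"
MUTANT_UTR = "GAUUACUAUCGGUUUAA"

if __name__ == "__main__":
    print(seed_site(MIRNA))                  # CUAUCGG
    print(find_sites(WILD_TYPE_UTR, MIRNA))  # []
    print(find_sites(MUTANT_UTR, MIRNA))     # [5]
```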


ncRNAs and human health

- Genetic diseases
  - RNase MRP variation and cartilage-hair hypoplasia
  - Mitochondrial tRNA variation: Leigh syndrome, MELAS syndrome, MERRF syndrome, cardiomyopathy, ophthalmoplegia, ...

- Cancer
  - Y RNA
  - Telomerase RNA
  - microRNAs

- Alzheimer’s disease
  - BACE1-AS, 38A

- Viral infection
  - Human miRNAs required for infection (e.g. miR-122 and HCV)
  - Viral miRNAs required for infection (e.g. HIV TAR miRNA)
  - Many structured regulatory elements (e.g. IRE, IRES)


RMfam: a motif alignment library

- Tetra loops
- T-loop
- K-turns
- Intrinsic terminators
- Group II intron domain V/U6 stem-loop
- Shine-Dalgarno
- Sm binding site
- ...

[Figure: consensus sequences and structures of example RMfam motifs.]
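Several of these motifs can be written as IUPAC ambiguity strings, where R = A/G, Y = C/U, and so on. RMfam itself is a library of motif alignments, so the sketch below works at the sequence level only. It converts an IUPAC string into a regular expression and reports every match, including overlapping ones. Because it ignores base pairing entirely, it cannot capture structural motifs such as K-turns or terminators. The motif `RGGAGG` is a hypothetical Shine-Dalgarno-like pattern.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def iupac_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile an IUPAC motif; the lookahead lets overlapping hits be reported."""
    body = "".join(IUPAC[c] for c in motif.upper().replace("T", "U"))
    return re.compile(f"(?=({body}))")


def scan(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched sequence) for every motif occurrence."""
    pattern = iupac_to_regex(motif)
    seq = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical Shine-Dalgarno-like motif upstream of an AUG start codon.
    print(scan("AUAAGGAGGUAAUG", "RGGAGG"))  # [(3, 'AGGAGG')]
```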


Bacterial intrinsic terminators

[Figure: intrinsic terminator motif, and the percentage of genes with a predicted Rho-independent terminator (% RIT genes) in native and permuted genomes of B. subtilis, B. thetaiotaomicron, E. coli, D. radiodurans, S. enterica, C. difficile, U. parvum, F. nucleatum, C. pneumoniae, S. griseus, L. interrogans, F. nodosum, P. marinus, T. yellowstonii, M. infernorum, H. pylori and M. tuberculosis, as predicted by RNIE (genome), RNIE (gene) and TransTermHP.]

Gardner et al. (2011) RNIE: genome-wide prediction of bacterial intrinsic terminators. NAR.
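Intrinsic (Rho-independent) terminators typically consist of a GC-rich stem-loop followed by a run of U residues. RNIE predicts them with structure-aware models. The sketch below is a much cruder heuristic, for illustration only. It looks for a perfectly paired stem (Watson-Crick or G-U pairs) with a short loop, then counts U residues immediately downstream. All parameter defaults are assumptions chosen for illustration:

- stem of 5 to 9 bp
- loop of 3 to 8 nt
- at least 5 U in the next 8 nt
- at least 50% GC in the stem

```python
from typing import List, NamedTuple, Tuple

# Base pairs allowed in the stem: Watson-Crick plus G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


class Hit(NamedTuple):
    start: int   # 0-based start of the 5' half of the stem
    stem: int    # stem length in base pairs
    loop: int    # loop length in nucleotides
    tail_u: int  # number of U residues in the downstream tail window


def find_terminators(
    seq: str,
    stem_len: Tuple[int, int] = (5, 9),
    loop_len: Tuple[int, int] = (3, 8),
    tail: int = 8,
    min_u: int = 5,
    min_gc: float = 0.5,
) -> List[Hit]:
    """Toy scan for GC-rich hairpins followed by a U-rich tail (not RNIE)."""
    seq = seq.upper().replace("T", "U")
    hits: List[Hit] = []
    for start in range(len(seq)):
        for s in range(stem_len[0], stem_len[1] + 1):
            for loop in range(loop_len[0], loop_len[1] + 1):
                end = start + 2 * s + loop  # first base after the hairpin
                if end + tail > len(seq):
                    continue
                left = seq[start:start + s]
                right = seq[start + s + loop:end]
                # Pair the 5' half with the 3' half read in reverse.
                if not all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                    continue
                gc = sum(c in "GC" for c in left + right) / (2 * s)
                tail_u = seq[end:end + tail].count("U")
                if gc >= min_gc and tail_u >= min_u:
                    hits.append(Hit(start, s, loop, tail_u))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: stem GCCCGC, loop UUCG, stem GCGGGC, then a U-tract.
    example = "AAAA" + "GCCCGC" + "UUCG" + "GCGGGC" + "UUUUUUUU" + "AAAA"
    for hit in find_terminators(example):
        print(hit)
```

The scan reports overlapping and nested candidates separately. A real predictor would score them, merge them, and calibrate its thresholds against control sequences.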


TRIT: A Mycobacterium specific terminator motif

[Figure, panels A to E: the TRIT consensus sequence and structure; sequence logos of the motif (WebLogo 3.0); a histogram of TRIT proximity to genic features (frequency against distance to the nearest gene terminus, in nucleotides); and the distribution of P-values comparing the minimum free energy of native TRIT sequences with that of shuffled sequences (MFE.native vs MFE.shuffled).]

Gardner et al. (2011) RNIE: genome-wide prediction of bacterial intrinsic terminators. NAR.
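The TRIT P-values test whether native sequences are more stably structured than shuffled versions of themselves. The sketch below shows that kind of test under two plainly stated substitutions. First, Nussinov base-pair maximisation stands in for MFE; a real analysis would use an energy-model folding program such as RNAfold from the ViennaRNA package. Second, it uses a simple mononucleotide shuffle, which does not preserve dinucleotide composition, although dinucleotides matter for stacking energies. Because more pairs here play the role of a lower MFE, the test counts shuffles with at least as many pairs as the native sequence.

```python
import random

# Pairs allowed when folding: Watson-Crick plus G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov base-pair maximisation: a crude stand-in for MFE folding."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def shuffled(seq: str, rng: random.Random) -> str:
    """Mononucleotide shuffle (does not preserve dinucleotide content)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def empirical_p_value(seq: str, n_shuffles: int = 200, seed: int = 1) -> float:
    """Fraction of shuffles at least as structured as the native sequence.

    A pseudocount of 1 keeps the estimate above zero.
    """
    rng = random.Random(seed)
    native = max_pairs(seq)
    as_extreme = sum(max_pairs(shuffled(seq, rng)) >= native for _ in range(n_shuffles))
    return (as_extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    print(max_pairs("GGGGAAAACCCC"))  # 4
    print(empirical_p_value("GGGGAAAACCCC"))
```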


RNA derived pseudogenes

- There are millions of RNA-derived pseudogenes in the human genome, and it is very difficult to discriminate these from functional copies.

- We have derived a metric that discriminates between RNA pseudogenes and functional RNAs.

[Figure: frequency distribution of CM−HMM bit-score differences for SRP RNA sequences; legend: trueCmemit, trueEmblDe, pseudoEmblDe, pseudoHmmemit, pseudoRepeatMasker.]
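The figure's axis suggests the metric is the difference between two bit scores. One is a covariance-model (CM) score, which rewards conserved base pairing; the other is a profile-HMM score, which sees sequence only. The presumable intuition is that pseudogenes lose structure before they lose sequence similarity. The slide does not say how a cut-off is set. The sketch below therefore assumes one is chosen to maximise accuracy on labelled examples, trying only observed values as candidates. The function names and scores are hypothetical.

```python
from typing import Sequence, Tuple


def best_threshold(true_diffs: Sequence[float], pseudo_diffs: Sequence[float]) -> Tuple[float, float]:
    """Choose the CM-HMM bit-score cut-off that maximises training accuracy.

    Sequences with a difference >= cut-off are called functional.
    Returns (cut-off, accuracy).
    """
    candidates = sorted(set(true_diffs) | set(pseudo_diffs))
    total = len(true_diffs) + len(pseudo_diffs)
    best_cut, best_acc = candidates[0], -1.0
    for cut in candidates:
        correct = sum(d >= cut for d in true_diffs) + sum(d < cut for d in pseudo_diffs)
        acc = correct / total
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


def classify(cm_bits: float, hmm_bits: float, cut: float) -> str:
    """Label a hit from its covariance-model and profile-HMM bit scores."""
    return "functional" if cm_bits - hmm_bits >= cut else "pseudogene"


if __name__ == "__main__":
    # Hypothetical CM-HMM differences (bits) for labelled training examples.
    true_diffs = [12.0, 20.5, 8.0]
    pseudo_diffs = [-15.0, -3.0, 2.5]
    cut, acc = best_threshold(true_diffs, pseudo_diffs)
    print(cut, acc)                   # 8.0 1.0
    print(classify(45.0, 40.0, cut))  # pseudogene
```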


New sequencing data: RNA-seq, transposon libraries, comparative genomics, ...

[Figure: RNA-seq read depth and TraDIS read counts plotted against genome coordinate (994,000 to 994,400), around dsrA and the putative transcript yodD.]

Perkins et al. (2009) A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet.

Langridge et al. (2009) Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants. Genome Res.
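This slide plots RNA-seq reads and TraDIS reads against the same genome coordinates. The sketch below computes the two basic summaries. It assumes reads have already been mapped and reduced to half-open intervals (RNA-seq) or single insertion positions (TraDIS). The first summary is per-base read depth, computed with a difference array. The second is an insertion index: unique insertion sites per base of a feature. In TraDIS analyses, a feature with unusually few insertions per base is a candidate essential region. The feature names and coordinates are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def coverage(reads: Sequence[Tuple[int, int]], start: int, end: int) -> List[int]:
    """Per-base read depth over [start, end) from half-open read intervals."""
    delta = [0] * (end - start + 1)
    for r_start, r_end in reads:
        lo, hi = max(r_start, start), min(r_end, end)  # clip to the window
        if lo < hi:
            delta[lo - start] += 1
            delta[hi - start] -= 1
    depth: List[int] = []
    running = 0
    for change in delta[:-1]:
        running += change
        depth.append(running)
    return depth


def insertion_index(sites: Sequence[int], features: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Unique transposon insertion sites per base for each feature [start, end)."""
    unique = set(sites)
    return {
        name: sum(f_start <= s < f_end for s in unique) / (f_end - f_start)
        for name, (f_start, f_end) in features.items()
    }


if __name__ == "__main__":
    print(coverage([(0, 5), (3, 8)], 0, 10))  # [1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
    # Hypothetical insertion positions and feature coordinates.
    sites = [994010, 994010, 994050, 994300]
    features = {"feature_a": (994000, 994100), "feature_b": (994200, 994400)}
    print(insertion_index(sites, features))  # {'feature_a': 0.02, 'feature_b': 0.005}
```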


Thanks!

PPG is supported by a Rutherford Discovery Fellowship from Government funding, administered by the Royal Society of New Zealand.
