
PREditor: Predictive RNA Editor for Plant Mitochondrial Genes

Jeff Mower

What is RNA Editing?

– A process that alters the RNA sequence

– Nucleotide insertion, deletion, or conversion

– Does not include RNA maturation processes

RNA Editing in Plants

– Occurs in mitochondria and chloroplasts

– C to U and U to C conversions

– Mechanism is not known

RNA Editing in Plants

– In seed plants (conifers, flowering plants, etc.)
  • Widespread in mitochondrion
  • Rare in chloroplast
  • Predominantly C to U

– In non-seed plants (mosses, ferns, etc.)
  • Frequent in mitochondrion and chloroplast
  • Both C to U and U to C are common

[Figure: example transcript before and after C to U editing]

Unedited RNA:  ACG GUA AUG CGU AAU UCU GUC GGC UGC CAA UGA
Protein:               M   R   N   S   V   G   C   Q   *

Edited RNA:    AUG GUA AUG UGU AAU UUU GUC GGU UGC UAA UGA
Protein:       M   V   M   C   N   F   V   G   C   *

– ACG to AUG: Creation of new start codon
– CGU to UGU, UCU to UUU: Alteration of protein sequence
– GGC to GGU: No effect on protein sequence
– CAA to UAA: Creation of new stop codon
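
The four editing outcomes in the example can be checked by translating the transcript before and after editing. The sketch below is illustrative only: the two sequences are the coding portion of the slide's example as read from its protein translations, and `translate` is a hypothetical helper that reads in frame from the first AUG codon using the standard genetic code.

```python
# Standard genetic code in RNA letters, built in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna):
    """Translate in frame 0, from the first AUG codon to the first stop codon."""
    protein = []
    started = False
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if not started:
            if codon != "AUG":
                continue  # still upstream of the start codon
            started = True
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Example sequences (assumed reading of the slide figure).
unedited = "ACG GUA AUG CGU AAU UCU GUC GGC UGC CAA UGA".replace(" ", "")
edited = "AUG GUA AUG UGU AAU UUU GUC GGU UGC UAA UGA".replace(" ", "")

# Positions (0-based) where the two transcripts differ.
edit_sites = [i for i, (u, e) in enumerate(zip(unedited, edited)) if u != e]

print(translate(unedited))  # protein from the unedited transcript
print(translate(edited))    # protein from the edited transcript
print(edit_sites)
```

Every differing position is a C in the unedited transcript and a U in the edited one, which matches the C to U editing that predominates in seed plants.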

Identifying Edit Sites

1. Determine experimentally
  • Need to isolate and reverse transcribe RNA
  • Need multiple reads (editing is not always complete)

2. Predict based on sequence context
  • Upstream and downstream regions are important
  • Unambiguous motifs have not been identified

3. Predict based on protein conservation
  • Proteins are more conserved after editing
  • Editing tends to “correct” amino acid sequences

PREditor Methodology

– Aligned sequence database (ASD) Construction
  • 363 DNA sequences, each from an organism where RNA editing information is known or where RNA editing does not occur
  • Proteins were translated using the editing information
  • Homologous proteins were aligned (42 different alignments; all known mitochondrial proteins are covered)

PREditor Methodology

– Input Sequence Manipulation
  • Accept a protein-coding DNA sequence as input
  • Translate input sequence
  • Align translation to homologous proteins in ASD

PREditor Methodology

– The Underlying Principle
  • ASD sequences translated from edited RNA
  • Input sequence translated from unedited DNA
  • “Where can RNA editing in the input sequence increase conservation to the ASD sequences?”

          Position 1      Position 2      Position 3      Position 4
ASD1      A               T               L               G
ASD2      E               M               L               G
ASD3      A               M               L               G
Input     E (GAA)         T (ACG)         P (CCU)         G (GGC)

Unedited  E (GAA) 0.33    T (ACG) 0.33    P (CCU) 0.0     G (GGC) 1.0
Edited    N/A             M (AUG) 0.67    S (UCU) 0.0     G (GGU) 1.0
                                          L (CUU) 1.0
                                          F (UUU) 0.0
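
The scores in the worked example are consistent with a simple rule: for each input codon, list every codon reachable by C to U conversion, translate it, and score the resulting amino acid by the fraction of ASD sequences that carry it at that position. The sketch below reproduces the example under that assumption. The function names are hypothetical and this is not PREditor's actual code.

```python
from itertools import product

# Standard genetic code in RNA letters, built in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edited_variants(codon):
    """All codons reachable by converting one or more C's to U."""
    c_positions = [i for i, base in enumerate(codon) if base == "C"]
    variants = []
    for mask in product([False, True], repeat=len(c_positions)):
        if not any(mask):
            continue  # skip the unedited codon itself
        bases = list(codon)
        for pos, flip in zip(c_positions, mask):
            if flip:
                bases[pos] = "U"
        variants.append("".join(bases))
    return variants


def conservation(aa, column):
    """Fraction of aligned ASD residues identical to the candidate residue."""
    return sum(1 for residue in column if residue == aa) / len(column)


def score_input(codons, alignment):
    """Score the unedited codon and each edited variant at every position."""
    rows = []
    for pos, codon in enumerate(codons):
        column = [seq[pos] for seq in alignment]
        for candidate in [codon] + edited_variants(codon):
            aa = CODON_TABLE[candidate]
            rows.append((pos + 1, candidate, aa, round(conservation(aa, column), 2)))
    return rows


# Values taken from the worked example.
asd = ["ATLG", "EMLG", "AMLG"]
input_codons = ["GAA", "ACG", "CCU", "GGC"]

for row in score_input(input_codons, asd):
    print(row)
```

At position 4 the unedited GGC and the edited GGU both encode G and score the same, so conservation alone cannot distinguish them.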

Performance Analysis

– Remove one protein sequence from the database

– Use the unedited DNA sequence as input

– Calculate statistics
  • Accuracy = (TP + TN) / (TP + FP + TN + FN)
  • Sensitivity = TP / (TP + FN)
  • Specificity = TN / (TN + FP)

– Repeat for each sequence
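
The leave-one-out procedure can be sketched as a loop that holds out each sequence, predicts edit sites from the rest of the database, and tallies every C as TP, FP, TN, or FN. Everything in this sketch is hypothetical: the database layout, the `leave_one_out` function, and `toy_predict`, which is only a stand-in so the loop runs and is not PREditor's prediction method.

```python
def leave_one_out(database, predict):
    """Tally per-C confusion counts, holding out one sequence at a time.

    database: dict mapping name -> (unedited DNA, set of true edited positions)
    predict:  callable(dna, reference_db) -> set of predicted edited positions
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for name, (dna, true_sites) in database.items():
        # Everything except the held-out sequence serves as the reference.
        reference = {k: v for k, v in database.items() if k != name}
        predicted = predict(dna, reference)
        for pos, base in enumerate(dna):
            if base != "C":
                continue  # only C's are candidate edit sites
            is_true = pos in true_sites
            is_predicted = pos in predicted
            if is_true and is_predicted:
                counts["TP"] += 1
            elif is_predicted:
                counts["FP"] += 1
            elif is_true:
                counts["FN"] += 1
            else:
                counts["TN"] += 1
    return counts


def toy_predict(dna, reference_db):
    """Placeholder predictor: flags every C at the second codon position."""
    return {i for i, base in enumerate(dna) if base == "C" and i % 3 == 1}


# Made-up toy data, for illustration only.
toy_db = {
    "seqA": ("ACGCCT", {1}),
    "seqB": ("GGCTCA", {4}),
}

print(leave_one_out(toy_db, toy_predict))
```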

Performance Analysis

– Total # C’s = 58,982

– True edited sites = 3,548 (6.0%)
  • TP = 2,922
  • FN = 626

– True non-edited sites = 55,434
  • TN = 54,829
  • FP = 605

Performance Analysis

– Sensitivity = 82.4%
  • Proportion of true edited sites that were predicted correctly
  • Increases to 94.6% if you ignore missed silent edited sites

– Specificity = 98.9%
  • Proportion of true non-edited sites that were predicted correctly

– Accuracy = 97.9%
  • Proportion of all sites that were predicted correctly
  • Increases to 98.7% if you ignore missed silent edited sites
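
The reported percentages follow directly from the counts on the preceding slide. The short check below recomputes them. For the "ignore missed silent edited sites" figures it drops the 458 silent false negatives reported under Limitations, which is an assumption about how those adjusted values were obtained. The `rates` helper is a hypothetical name.

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as percentages."""
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Counts reported in the talk.
TP, FN, TN, FP = 2922, 626, 54829, 605
SILENT_FN = 458  # false negatives at silent sites

all_sites = rates(TP, FN, TN, FP)
# Assumed adjustment: remove the silent false negatives from the tally.
non_silent = rates(TP, FN - SILENT_FN, TN, FP)

print([round(x, 1) for x in all_sites])
print([round(x, 1) for x in non_silent])
```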

Limitations

– Cannot predict editing at silent sites
  • 458 of 626 FN are at silent sites

– Not a major problem in practice
  • Silent editing sites do not affect the protein sequence
  • Many silent sites are only occasionally edited
  • Only 13% of editing sites are silent (expect ~38%)
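
One way to arrive at a figure close to the expected ~38% is to count, over the standard genetic code, how many single C to U changes leave the amino acid unchanged. Treating every codon as equally likely is an assumption made here for illustration. The talk does not say how its estimate was derived.

```python
# Standard genetic code in RNA letters, built in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

total = 0   # every C in every codon is one potential edit site
silent = 0  # edits that do not change the encoded amino acid
for codon, aa in CODON_TABLE.items():
    for pos, base in enumerate(codon):
        if base != "C":
            continue
        edited = codon[:pos] + "U" + codon[pos + 1:]
        total += 1
        if CODON_TABLE[edited] == aa:
            silent += 1

print(silent, total, round(100 * silent / total, 1))
```

Under this uniform assumption, 18 of the 48 possible C to U changes are silent (37.5%): all 16 at the third codon position plus CUA to UUA and CUG to UUG at the first.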

Limitations

– Aligned sequence database is skewed

[Figure label: Origin of RNA editing]

Lineage        Sequences in ASD
Angiosperms    266 (74%)
Gymnosperms    2 (1%)
Ferns          0
Horsetails     0
Hornworts      0
Mosses         0
Liverworts     32 (9%)
Charophytes    59 (16%)
Chlorophytes   ―

Limitations

– The skewed database effect

ASD1   Z
ASD2   Z
ASD3   Z
ASD4   Z
ASD5   X

Input  X or Z?

Ongoing Work

– Reducing the skewed database effect
  • Weighted sequences and phylogenetics

ASD1   Z
ASD2   Z
ASD3   Z
ASD4   Z
ASD5   X

Input  X!
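
A weighted conservation score is one way sequence weighting could change the outcome in this example. The sketch below is purely illustrative: the weights are invented to show the effect, `weighted_conservation` is a hypothetical name, and the talk does not specify the weighting scheme or how the phylogeny would set the weights.

```python
def weighted_conservation(aa, column, weights):
    """Weighted fraction of ASD residues that match the candidate residue."""
    matching = sum(w for residue, w in zip(column, weights) if residue == aa)
    return matching / sum(weights)


# Column from the slide: four sequences with Z, one with X.
column = ["Z", "Z", "Z", "Z", "X"]

# Equal weights: the over-represented group dominates.
equal = [1.0] * 5
# Hypothetical weights: the four similar sequences are down-weighted and the
# fifth is up-weighted, e.g. because it is phylogenetically closer to the input.
hypothetical = [0.5, 0.5, 0.5, 0.5, 4.0]

for label, weights in [("equal", equal), ("weighted", hypothetical)]:
    scores = {aa: weighted_conservation(aa, column, weights) for aa in "XZ"}
    print(label, scores)
```

With equal weights Z scores higher than X. With these made-up weights X scores higher, matching the slide's "Input X!".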

Ongoing Work

– Making the online resource more appealing

– Making the online resource more user-friendly

Future Directions

– Increase diversity in the ASD

– Analyze sequence context using the ASD

– Apply methodology to editing in chloroplasts

– Apply methodology to U to C editing

Thanks

– Jeff Palmer

– Sun Kim

– Danny Rice and other members of the Palmer lab
