1 ribosomal frameshifting in escherichia

16
7210–7225 Nucleic Acids Research, 2014, Vol. 42, No. 11 doi: 10.1093/nar/gku386 Analysis of tetra- and hepta-nucleotides motifs promoting -1 ribosomal frameshifting in Escherichia coli Virag Sharma 1 , Marie-Franc ¸ oise Pr ` ere 2 , Isabelle Canal 2 , Andrew E. Firth 3 , John F. Atkins 1,4 , Pavel V. Baranov 1 and Olivier Fayet 2,* 1 School of Biochemistry and Cell biology, University College Cork, Cork, Ireland, 2 Laboratoire de Microbiologie et en´ etique mol ´ eculaire, UMR5100, Centre National de la Recherche Scientifique, Universit´ e Paul Sabatier-Toulouse III, 118 route de Narbonne, Toulouse 31062-cedex, France, 3 Department of Pathology, University of Cambridge, Cambridge CB2 1QP, UK and 4 Department of Human Genetics, University of Utah, 15N 2030E, Rm7410, Salt Lake City, UT 84112-5330, USA Received February 28, 2014; Revised April 03, 2014; Accepted April 22, 2014 ABSTRACT Programmed ribosomal -1 frameshifting is a non- standard decoding process occurring when ribo- somes encounter a signal embedded in the mRNA of certain eukaryotic and prokaryotic genes. This signal has a mandatory component, the frameshift motif: it is either a Z ZZN tetramer or a X XXZ ZZN heptamer (where ZZZ and XXX are three identical nucleotides) allowing cognate or near-cognate repairing to the -1 frame of the A site or A and P sites tRNAs. Depending on the signal, the frameshifting frequency can vary over a wide range, from less than 1% to more than 50%. The present study combines experimental and bioinformatics approaches to carry out (i) a system- atic analysis of the frameshift propensity of all pos- sible motifs (16 Z ZZN tetramers and 64 X XXZ ZZN heptamers) in Escherichia coli and (ii) the identifica- tion of genes potentially using this mode of expres- sion amongst 36 Enterobacteriaceae genomes. While motif efficiency varies widely, a major distinctive rule of bacterial -1 frameshifting is that the most efficient motifs are those allowing cognate re-pairing of the A site tRNA from ZZN to ZZZ. The outcome of the ge- nomic search is a set of 69 gene clusters, 59 of which constitute new candidates for functional utilization of -1 frameshifting. INTRODUCTION Programmed ribosomal -1 frameshifting (PRF-1) has been recognized more than 25 years ago as a mode of transla- tional control of specific genes, first in retroviruses (1–3) and later in bacterial genes (4,5). Since then the number of demonstrated or suspected cases, has greatly increased, generally through homology searches, or by taking advan- tage of the many sequenced genomes to look for genes containing potential frameshift signals (6,7). For example, the Recode database (8) has 245 entries for -1 frameshift- ing originating from eukaryotic viruses (192 cases), from transposable elements (31 cases, 6 bacterial and 25 eukary- otic), from bacteriophages (12 cases) and from chromoso- mal genes (10 cases). Overall, 219 entries come from eukary- otic genes. This may give the impression that -1 frameshift- ing is less common in prokaryotes, but it is not necessar- ily true. Analysis of the ISFinder database (9), dedicated to bacterial transposable elements called insertion sequences (IS), showed that more than 500 IS elements very likely use -1 frameshifting to synthesize the proteins necessary for their mobility (10). Another bioinformatics study car- ried out on 973 bacterial genomes revealed more than 5000 genes that probably use -1 frameshifting (11). These genes can be grouped into a limited number of clusters most of which correspond to IS elements. Although, like their eu- karyotic counterparts, most bacterial genes likely using pro- grammed -1 frameshifting are found in mobile elements, such as IS transposons or bacteriophages (12), utilization of -1 frameshifting may not be limited to them. Another study revealed a set of 146 prokaryotic gene families with various potential programmed frameshifts, several of which were experimentally tested (13,14). Most of these families cor- respond to non-mobile genes encoding proteins of known functions and proteins with conserved domains performing yet unknown functions. Sequences of genes from these clus- ters are available from GenTack database (13). The execution of frameshifting at a significant level re- quires a relatively simple signal which is embedded within the coding part of certain mRNAs (Figure 1). The two com- ponents of this signal were revealed by the earlier stud- * To whom correspondence should be addressed. Tel: +33 5 61 33 58 75; Fax: +33 5 61 33 58 86; Email: [email protected] C The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from https://academic.oup.com/nar/article-abstract/42/11/7210/1451769 by guest on 02 April 2018

Upload: phungdieu

Post on 02-Feb-2017

231 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 ribosomal frameshifting in Escherichia

7210ndash7225 Nucleic Acids Research 2014 Vol 42 No 11doi 101093nargku386

Analysis of tetra- and hepta-nucleotides motifspromoting -1 ribosomal frameshifting in EscherichiacoliVirag Sharma1 Marie-Francoise Prere2 Isabelle Canal2 Andrew E Firth3 John F Atkins14Pavel V Baranov1 and Olivier Fayet2

1School of Biochemistry and Cell biology University College Cork Cork Ireland 2Laboratoire de Microbiologie etGenetique moleculaire UMR5100 Centre National de la Recherche Scientifique Universite Paul Sabatier-ToulouseIII 118 route de Narbonne Toulouse 31062-cedex France 3Department of Pathology University of CambridgeCambridge CB2 1QP UK and 4Department of Human Genetics University of Utah 15N 2030E Rm7410 Salt LakeCity UT 84112-5330 USA

Received February 28 2014 Revised April 03 2014 Accepted April 22 2014

ABSTRACT

Programmed ribosomal -1 frameshifting is a non-standard decoding process occurring when ribo-somes encounter a signal embedded in the mRNA ofcertain eukaryotic and prokaryotic genes This signalhas a mandatory component the frameshift motif itis either a Z ZZN tetramer or a X XXZ ZZN heptamer(where ZZZ and XXX are three identical nucleotides)allowing cognate or near-cognate repairing to the -1frame of the A site or A and P sites tRNAs Dependingon the signal the frameshifting frequency can varyover a wide range from less than 1 to more than50 The present study combines experimental andbioinformatics approaches to carry out (i) a system-atic analysis of the frameshift propensity of all pos-sible motifs (16 Z ZZN tetramers and 64 X XXZ ZZNheptamers) in Escherichia coli and (ii) the identifica-tion of genes potentially using this mode of expres-sion amongst 36 Enterobacteriaceae genomes Whilemotif efficiency varies widely a major distinctive ruleof bacterial -1 frameshifting is that the most efficientmotifs are those allowing cognate re-pairing of the Asite tRNA from ZZN to ZZZ The outcome of the ge-nomic search is a set of 69 gene clusters 59 of whichconstitute new candidates for functional utilization of-1 frameshifting

INTRODUCTION

Programmed ribosomal -1 frameshifting (PRF-1) has beenrecognized more than 25 years ago as a mode of transla-tional control of specific genes first in retroviruses (1ndash3)and later in bacterial genes (45) Since then the number

of demonstrated or suspected cases has greatly increasedgenerally through homology searches or by taking advan-tage of the many sequenced genomes to look for genescontaining potential frameshift signals (67) For examplethe Recode database (8) has 245 entries for -1 frameshift-ing originating from eukaryotic viruses (192 cases) fromtransposable elements (31 cases 6 bacterial and 25 eukary-otic) from bacteriophages (12 cases) and from chromoso-mal genes (10 cases) Overall 219 entries come from eukary-otic genes This may give the impression that -1 frameshift-ing is less common in prokaryotes but it is not necessar-ily true Analysis of the ISFinder database (9) dedicated tobacterial transposable elements called insertion sequences(IS) showed that more than 500 IS elements very likelyuse -1 frameshifting to synthesize the proteins necessaryfor their mobility (10) Another bioinformatics study car-ried out on 973 bacterial genomes revealed more than 5000genes that probably use -1 frameshifting (11) These genescan be grouped into a limited number of clusters most ofwhich correspond to IS elements Although like their eu-karyotic counterparts most bacterial genes likely using pro-grammed -1 frameshifting are found in mobile elementssuch as IS transposons or bacteriophages (12) utilization of-1 frameshifting may not be limited to them Another studyrevealed a set of 146 prokaryotic gene families with variouspotential programmed frameshifts several of which wereexperimentally tested (1314) Most of these families cor-respond to non-mobile genes encoding proteins of knownfunctions and proteins with conserved domains performingyet unknown functions Sequences of genes from these clus-ters are available from GenTack database (13)

The execution of frameshifting at a significant level re-quires a relatively simple signal which is embedded withinthe coding part of certain mRNAs (Figure 1) The two com-ponents of this signal were revealed by the earlier stud-

To whom correspondence should be addressed Tel +33 5 61 33 58 75 Fax +33 5 61 33 58 86 Email OlivierFayetibcgbiotoulfr

Ccopy The Author(s) 2014 Published by Oxford University Press on behalf of Nucleic Acids ResearchThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby30) whichpermits unrestricted reuse distribution and reproduction in any medium provided the original work is properly cited

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7211

nnn _nnZ_ZZN_nnn_nnn 1 2 3 4

nnX_XXZ_ZZN_nnn_nnn 1 2 3 4 5 6 7

E P Aopt ional

upstream

st imulator

opt ional

downstream

st imulator

64 motifsShine-Dalgarnosequence

~8-14 nt before P-si te codon

pseudoknot or stem-loop~5-8 nt af ter A-si te codon

16 motifs

Figure 1 Overall organization of known bacterial -1 frameshift signalsCodons in frame with the upstream initiation codon (frame 0) are sepa-rated with underscores their position relative to the ribosomal E P and Asites and their tRNAs at the onset of frameshifting is indicated

ies on retroviruses (1ndash3) The obligatory component is ashort sequence of 7 nucleotides X XXZ ZZN called theframeshift motif or lsquoslipperyrsquo motif where XXX and ZZZare triplets of identical bases and N is any nucleotide (un-derscoring separates codons in frame with the initiationcodon ie frame 0 and dots separate codons in the newframe ie frame -1) Thus there are 64 possible sequencescorresponding to the above definition It was also shownthat an even shorter sequence a Z ZZN tetramer (whereZZZ are three identical bases thus leading to 16 possiblemotifs) could also direct programmed -1 frameshifting (15ndash21) Thelsquoslipperinessrsquo of both types of motifs likely resultsfrom their capacity to allow cognate or near cognate re-pairing in the -1 frame of one or two tRNAs (222) Onan X XXZ ZZN heptamer the XXZ- and ZZN-decodingtRNAs respectively in the P and A sites of the ribosomewould break the codonndashanticodon interaction and re-pairon the XXX and ZZZ codons in the -1 frame The sec-ond component of frameshift signals is a stimulatory ele-ment which by itself cannot induce frameshifting It canbe an RNA secondary structure such as simple or branchedhairpin-type stem-loop (HP) or a pseudoknot (PK) (23ndash25)As illustrated in panels C and D of Figure 2 it is formedby local folding of the mRNA and generally starts 5ndash8 nu-cleotides downstream of the slippery sequence (172326)The stimulatory effect of a structure may be linked to its ca-pacity to block ribosomes transiently when the ZZN codonoccupies the A site and thus give more time to tRNAs for re-pairing (27ndash29) In addition a structure may exert a pullingeffect on the mRNA and favour its realignment within theribosome to bring the XXX and ZZZ -1 frame codons inthe P and A sites (3031) It is present in all well-studiedeukaryotes cases often as a PK but not always found inprokaryotes PRF-1 regions (10) Prokaryotic signals some-times possess another type of stimulatory element upstreamof the frameshift motif a ShinendashDalgarno (SD)-like se-quence normally involved in translation initiation throughpairing with the CCUCC sequence at the 3prime end of 16S ri-bosomal RNA (32ndash34) In PRF-1 signals the same interac-tion occurs but within an elongating ribosome and resultsin a translational pausing (35) which may provide a longer

time window for tRNA re-pairing The other possible effectof a stimulatory SD is linked to its distance from the motifwhich could generate a tension between the mRNA and theribosome that could be resolved by realigning the mRNA(32) Thus the two types of stimulators could act in concertby generating pausing for both and by pushing (SD) orpulling (structure) on the mRNA In addition frameshift-ing frequency in eukaryotes and prokaryotes is modulatedby the immediate context on both sides of the slippery mo-tif (36ndash38) It is not yet clear by which mechanism(s) thismodulation operates but it could result in part from an Esite tRNA effect (38) from intra-mRNA interactions (36)and possibly from mRNAndashribosome interactions within themessage entry tunnel (39)

The slippery motif being the key element in frameshift-ing has been the object of a particular attention An earlystudy dealt with the motif present in the Rous sarcoma virussignal (2) Mutating it to a limited set of different motifslead to the proposal of basic rules governing X XXZ ZZNheptamers frameshifting efficiency for eukaryotes in shortsubstantial frameshifting is attained if XXX = [AAA GGGor UUU] ZZZ = [AAA or UUU] and N = [A C or U]A subsequent nearly systematic analysis confirmed theserules 44 motifs were tested and the 20 remaining motifswere not included because of their expected inefficiency (seeupper panel of Supplementary Figure S1A) (17) Thus onlya subset of the 64 possible X XXZ ZZN heptamers canelicit frameshifting at a significant level in higher eukary-otes

The X XXZ ZZN heptamers were also analysed inprokaryotes using the Escherichia coli bacterium but notas thoroughly as in eukaryotes (193640ndash42) It turned outthat the most proficient motifs the X XXA AAG hep-tamers were the least efficient ones in eukaryotes Con-versely the best eukaryotic motifs proved inefficient in Ecoli (eg A AAA AAC or G GGA AAA) thus indicat-ing major differences in the response of the respective trans-lational machineries to these signals However the limitedscope of these studies in terms of number of motifs testeddid not allow the establishment of precise rules concerningslippery heptamers efficiency in bacteria The first aim ofthe work presented here is to determine these rules by car-rying out a complete functional analysis in E coli of bothtypes of potential frameshift motifs the Z ZZN tetramersand X XXZ ZZN heptamers The second objective is toinvestigate by bioinformatics approaches the prevalence ofthe X XXZ ZZN motifs in 37 enterobacterial genomesmostly from E coli isolates in order to determine whetheror not they have been selected against in coding sequencesbecause of their frameshifting proclivity Our third aim is toidentify genes possibly utilizing -1 programmed frameshift-ing by analysing in the same set of genomes those contain-ing a subset of 21 heptamers which were chosen on the basisof their -1 frameshifting efficiency andor their significantunderrepresentation

MATERIALS AND METHODS

Bacterial strain growth conditions and transposition assay

The E coli K-12 strain JS238 [MC1061 araD (araleu) galU galK hsdS rpsL (lacIOPZYA)X74 malPlacIq

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7212 Nucleic Acids Research 2014 Vol 42 No 11

AUUAUCCUGgcc

agcuUUGAAAUGGAGAAUG-

stop 0

-1frameshiftmotif

SD

0

-AAAUACUXXXZZZNGCUACC

G

UU-AC-GU-AC-GG-CC-GG-C

A-UG-CC

AAU A

A U

G-CC-GUGC-GU-AU-A

A

G-C

A-U

C-G

U-A

G-C

U-A

UC

CA

C

Stem Loop

IS911stimulators

C

frameshift

motif

GCGA

AG C

AC U

U AUUG

AA

AAA

CA

U U

-AAAUACUXXXZZZNGCUAC

U

IS3stimulator

D

agcuuUGAAAUCCUCAAUG-0

-1 GAccggg

G-CC-GC-GU-AG-CA-U

C-GU-AU-AC-GA-UU-AA-UC-GA-UG-C

Pseudoknot

B

agcuuUGAAAUCCUCAAUG-

-AAAUACUXXXZZZNGCUACC-frameshift

motif

-GCGCUCUUGUAAUAGAgggcc

0

-1

no-stimulator

A

lacZ

pOFX310

PTac

ori[pBR322]

bla

-AUGAAAAAACGUAAUUUaagcuugggccc-HindIII ApaI

0 Z0

Figure 2 Reporter plasmid and sequence of the three contexts in which the frameshifting propensity of the X XXZ ZZN heptamers was assessed PlasmidpOFX310 (panel A) was used to clone between a HindIII and an ApaI site the three frameshift windows shown in panels BndashD The no-stimulator construct(panel B) was derived from the IS911 construct (panel C) (48) by deletion of most of the stem-loop and mutation to CCUC of the SD-like GGAG sequenceThe IS3 construct (panel D) was engineered by replacing the IS911 stem-loop with the PK from IS3 (4) and by mutating to CCUC the stimulatory SD

srlCTn10 recA1] was used for all experiments Bacterialcultures were carried out in Luria-Bertani (LB) medium(43) to which Ampicillin (40 mgl) plus oxacillin (200 mgl)were added when necessary

Plasmid constructions for assessing -1 frameshifting

All frameshift cassettes were cloned into the pOFX310reporter (Figure 2A) derived from the pAN127 plas-mid (40) by changing the translation initiation regionof the lacZ gene between the XbaI and HindIII totctagCTCGAGATTTATTGGAATAACATATG AAAAAA CGT AAT TTa agc tt (the XbaI and HindIII sitesare in lowercase the SD sequence GGA and the ATGstart codon in frame 0 are both underlined) Overlap-ping oligonucleotides were inserted between the HindIIIand ApaI sites of the vector to reconstitute the vari-ous frameshift regions in front of the lacZ gene so thatexpression of -galactosidase requires a -1 ribosomalframeshifting event within the cloned cassette For eachtype of frameshift region (ie no stimulator IS911 stim-ulators or IS3 stimulators see Figure 2BndashD) an in-frame

construct was made to serve as 100 reference for calcu-lation of frameshifting frequencies from -galactosidaseactivities A non-shifty derivative was constructed for eachmotif to assess the background level of frameshifting Therationale was to keep the same tRNA in the A site (Z ZZNmotifs) or in both A and P sites (X XXZ ZZN motifs)For the Z ZZN tetramers only the first nucleotide wasmutated to G YYN or C RRN (with Y = [U C] and R =[A G]) For the heptamers the first and fourth nucleotidesof the motifs were changed to give G YYC UUNC RRC UUN G YYU CCN C RRU CCNG YYG AAN C RRG AAN G YYA GGN orC RRA GGN

Measurement of frameshifting frequency by -galactosidaseassay

Transcription of lacZ relies on a strong isopropyl--D-thiogalactopyranoside-inducible pTac promoter Its ex-pression was monitored by a standard colorimetric as-say (43) on cultures prepared either in the absence of in-ducer for constructs with a sufficiently high level of -

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7213

galactosidase activity (ie above 02 frameshifting) orafter isopropyl -D-1 thiogalactopyranoside induction forthe ones with a low activity (ie those with less than 11frameshifting) For each strain 5 tubes containing 05 mlof Luria-Bertani medium (supplemented with ampicillinand oxacillin) were inoculated with independent clonesand incubated overnight at 37C These cultures were ei-ther diluted 1250 in the same medium and incubated270 min at 37C (no-induction conditions) or diluted150 in the same medium plus 2 mM of isopropyl--D-thiogalactopyranoside and incubated 240 min at 37C (in-duction conditions) The dosage conditions were as previ-ously described (19) Note that both methods gave identical frameshifting values in their overlap range ie between02 and 11 We also verified the accuracy of the reportedvalues above or below the overlap range by applying to alimited set of plasmid constructions a refined assay in whichnon-induced cultures were first concentrated and lysed bysonication (data not shown)

Generation and randomization of a non-redundant lsquomostly Ecoli genomersquo (nrMEG)

The Refseq accessions of the genomes that were used toconstruct our nrMEG together with their organismstraininformation are given in Supplementary Table S1 All theprotein coding gene sequences were extracted from theffn files (National Center for Biotechnological Informationwebsite ftpftpncbinlmnihgovgenomesBacteria) ofthese 37 accessions and merged together to make a com-bined genome of 169 302 sequences These sequences wereclustered using the BLASTCLUST program using a 95sequence identity threshold at the level of nucleotide se-quence One representative sequence per cluster was ran-domly chosen to constitute an nrMEG of 22 703 sequencesEach sequence from the nrMEG was randomized 1000times using the Dicodonshuffle randomization procedure(44) to yield 1000 randomized nrMEGs The DicodonShuf-fle algorithm preserves the dinucleotide composition theencoded protein sequence and the codon usage of each gene

Analysis of XXXZZZN frequencies in protein coding se-quences

A customized perl script was used to count the occur-rences of a pattern in all three possible reading frames(ie X XXZ ZZN XX XZZ ZN and XXX ZZZ N) inboth the real and randomized nrMEGs Violin plots (45)generated with the vioplot package from the R softwarelibrary (httpwwwr-projectorg) were used to visualizethe occurrences of the 64 X XXZ ZZN patterns Z-scoreswere computed as follows z-score = (x minus xmean)xsd wherex is the frequency of occurrence of a pattern in the inte-grated genome xmean is the mean of the distribution of thesame pattern across 1000 randomized genomes and xsd isthe standard deviation of the distribution of the same pat-tern across 1000 randomized genomes Z-scores for the 64XXXZZZN in all three frames are shown in Supplemen-tary Table S3 while all violin plots are available online athttp laptiuccieheptameric patterns clusters

Clustering of genes containing selected X XXZ ZZN pat-terns

All the annotated protein coding genes from 36 of the 37genomes listed in Supplementary Table S1 (AC 000091 waslater excluded because of its removal from Refseq) werescreened for the presence of a motif from a set of selected21 X XXZ ZZN patterns (see Results section) These se-quences were clustered based on similarity between the en-coded protein sequences using the BLASTCLUST program(sequence identity threshold = 45) A total of 658 clusterswhich had at least 20 sequences and where the heptamericpattern was perfectly conserved were taken up for furtheranalysis The coordinates of the conserved heptameric pat-terns were also recorded for each cluster

However these clusters contain only sequences of pro-tein coding genes from those 36 genomes which were ini-tially selected to constitute the nrMEG These clusters wereenriched with additional homologous sequences from thegenomes not included in the nrMEG in an attempt to ob-tain a better phylogenetic signal The enrichment was car-ried out using a tblastn search against all bacterial se-quences in the nr database as described previously (11)The lsquonewlyrsquo obtained homologous sequences for each clus-ter were aligned by translating them into protein sequencesaligning these protein sequences and then back translatingthe aligned protein sequences to their corresponding nu-cleotide sequences The coordinates of the conserved hep-tameric pattern for each cluster (which were recorded in theprevious step) were recalculated to account for gaps intro-duced during alignment of additional sequences

Identification of clusters with conserved heptameric pattern

For the 658 clusters identified in the previous step we em-ployed an additional filtering procedure to identify thoseclusters where the heptameric pattern is conserved For eachcluster the total number of sequences was referred to asNall The number of sequences where the heptameric pat-tern was the same as the parent pattern was referred to asN1 The number of sequences where the pattern is not thesame but is one of the 64 X XXY YYZ patterns was re-ferred to as N2 Finally it has been previously observed(eg in dnaX) that the position of the frameshift site maynot be perfectly conserved To account for that possibilitya 36-nt window starting from the coordinate which is 15nt upstream of the conserved heptameric coordinate wasalso screened for the presence of any X XXY YYZ pat-tern the number of sequences in that category was referredto as N3 For each cluster the three values were summed up(ie N1+N2+N3 = Nsum) and the ratio NsumNall was calcu-lated Clusters where NsumNall gt 09 were labelled as lsquocon-servedrsquo In an ideal situation the frameshift pattern shouldbe absolutely conserved but this threshold was relaxed soas to allow for the possibility of sequencing errors or recentmutations in the sequences from a cluster The end resultwas a set of 69 clusters (features of these clusters are pre-sented in Supplementary Tables S4ndashS8 and the completesequence of their genes and other features are available athttplaptiuccieheptameric patterns clusters)

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7214 Nucleic Acids Research 2014 Vol 42 No 11

Synonymous-site conservation analyses

The degree of conservation at synonymous sites was cal-culated as previously described (46) for a 15-codon win-dow The detailed results of this analysis are available on-line at http laptiuccieheptameric patterns clusters anda summary is included in the RSSV column (for reducedvariability at synonymous sites) of Supplementary TablesS4 and S5 However a statistically significant conservationat synonymous sites can only be observed if there is suffi-cient sequence divergence in the alignment To numericallyquantify sequence divergence we also calculated a statisticcalled alndiv which corresponds to an estimate of the meannumber of phylogenetically independent nucleotide substi-tutions per alignment column (see Supplementary Tables S4and S5)

Search of potential stimulatory elements flanking the pattern

Two types of potential frameshift stimulators weresearched The first type was an SD-like sequence (eitherGAGG GGAG AGGA GGNGG or AGGKG with K= [TG]) located 6ndash17 nt before the second base of themotif these sequences and the spacing interval were chosenbecause all were experimentally proved to be stimulatory(32) A segment of 30 nt ending with the first base of themotif was scanned using a script for Perl (version 5101from ActiveState) in all the sequences of each cluster forthe above potential SD sequences A cluster qualified ashaving a lsquoconserved SDrsquo if at least 50 of its sequences hadan SD

The second type of stimulator was an RNA structuredownstream of the motif A preliminary study of 271 IS3family members (see Supplementary Figure S2) led us tochoose the following empirical rules the structure is (i) asimple or branched hairpin of a length ranging from 17to 140 nt (ii) that starts 4ndash10 nt after the last base of themotif (iii) with a G-C(or C-G) base-pair followed by at leastthree consecutive WatsonndashCrick or GndashU or UndashG base-pairsand (iv) has a Gunfold37C ge 76 kcalmolminus1 the G37Cvalue was determined using the default parameters of theRNAfold program from version 185 of the Vienna RNApackage (47) We limited our search to hairpin structuresand and did not explore whether some of the structurescould also form PKs at this preliminary stage For each se-quence of all clusters a 197 nt segment starting at the fourthbase after the motif was extracted and analysed with a cus-tom Perl script Each segment was first deleted from the 3primeend one base at a time and down to 17 nt Each set of nesteddeletions was passed to the RNAfold program and the po-tential structures were sorted out to retain those conformingto the rules For a given cluster structures were grouped intypes according to hairpin size and distance from the mo-tif The frequency of each type of structure was calculatedand only those present in at least 50 of the sequences ofthe cluster were retained 53 of the 69 clusters had sucha conserved structure The Ghpntminus1 parameter (ie theGunfold value divided by the number of nucleotides in thehairpin structure) was calculated For clusters with severaltypes of structure the lsquobestrsquo type was the one with the high-est value for the [(Ghpntminus1) x (frequency)] product A

summary of these analyses is reported in SupplementaryTables S4 S5 and S8

For further comparison 20 ISs from the IS3 family wereselected because they contain a frameshift motif followedby a known (or likely) stimulatory structure of a size rang-ing from 17 to 131 nt The Ghp and Ghpntminus1 param-eters were calculated for each structure Selective pressureto maintain a hairpin should likely result in a region morestructured than neighbouring regions of the same size iehaving a value higher than average for the Gntminus1 param-eter To assess that a 197 nt segment starting 4 nt down-stream of the frameshift motif was extracted for each ofthe 20 ISs as well as for one typical sequence of each ofthe 53 clusters with a conserved structure For each 197 ntsegment a Perl script generated a subset of sequences bymoving (1 nt at a time) a sliding window of the size of thecorresponding conserved hairpin and passed it to RNAfoldThe average Gunfoldnt-1 (Gavntminus1) were calculated foreach subset as well as the Gntminus1 which is the difference[(Ghpntminus1) minus (Gavntminus1)] As expected the 20 ISs pos-sessing a stimulatory structure all display a higher than av-erage Gntminus1 (Supplementary Table S8)

RESULTS

In vivo determination of -1 frameshifting frequency

As illustrated in Figure 2 the 16 Z ZZN motifs and the64 X XXZ ZZN motifs were cloned either without flank-ing stimulators or with a strong downstream stimulator de-rived from the IS3 PK (1819) or for the heptamers onlywith the moderately efficient combination of upstream anddownstream stimulators from IS911 (3248) For both typesof motifs a non-shifty derivative was similarly cloned forthat the first base of each tetramer or the first and fourthbases of each heptamer were mutated The shifty and non-shifty cassettes were inserted in front of the lacZ gene car-ried by plasmid pOFX310 so that translation of full-length-galactosidase occurs only when ribosomes move to the -1frame before encountering the 0 frame stop codon (Figure2)

Frameshift propensity of the Z ZZN motifs

The graphs showing the variation of -1 frameshifting fre-quency as a function of the sequence of the motif arepresented in Figure 3 for the Z ZZN tetramers (see alsoSupplementary Table S2) The motif-containing constructswithout PK (save G GGG see below) were on the aver-age marginally above their no-motif counterpart (0147 plusmn0004 versus 0112 plusmn 0015) which suggests that the mo-tifs are by themselves barely or not at all shifty Addition ofthe IS3 PK led to substantial increase in frameshifting fre-quency for 10 motifs Only six were at least four times abovebackground (ie above 0064 plusmn 0014) with frequenciesranging from 026 to 56 For them the hierarchy was[A AAG gtgt U UUC gt U UUU gt C CCU = C CCCgtA AAA] Four motifs (U UUA U UUG C CCA andC CCG) were 18- to 36-fold above background

A few oddities were revealed The G GGG C GGAand C GGG constructs (with and without PK) were found

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7215

motif amp PKIS3

no-motif amp PKIS3

motif amp no-stimulator

no-motif amp no-stimulator

U_UUU C A G

C_CC

U C A G

G_GGU C A G

A_AAU C A G

001

01

1

10

100

-1

fra

me

sh

iftin

g

Figure 3 Frameshift efficiency of the Z ZZN tetramers The IS3frameshift region cloned in plasmid pOFX310 was the one used in a previ-ous study [see Figure 1 in (19)] It differs slightly from the one used for theheptamer analysis (Figure 2 panel D) The nucleotides upstream (6 nt) anddownstream (5 nt) of the motif are those found in IS3 The sequence fromthe HindIII site to the start of the PK is agcuuCCUCCAZZZNGCCGCndashndash The no-stimulator construct was derived by deleting the 3prime half of thePK right after the UGA stop codon in the 0 frame to give the follow-ing sequence agcuuCCUCCAZZZNGCCGCGACAUACUUCGCGAAGGCCUGAACUUGAAgggcc The four frameshifting values for eachmotif correspond to a construct with a motif and the IS3 PK (open circles)a construct without motif and with the IS3 PK (open lozenges) a con-struct with motif and without stimulator (black inverted triangles) and aconstruct without motif and stimulator (open triangles) Each frameshift-ing value is the mean of five independent determinations (the plusmn standarddeviation intervals were omitted because they are not bigger than the sizeof the symbols in most cases) The no-motif constructs were derived bychanging each motif to either G YYN or C RRN

above background probably not as a result of frameshift-ing but because these sequences together with the followingG could act as SD sequences and direct low level initiationon the -1 frame AUA codon present 7 nt downstream (seelegend of Figure 3) The other oddity C AAG was nearly10 times above background (093) but only when the PKstimulator was present this was likely due to -1 frameshift-ing caused by the high shiftiness of the lysyl-tRNAUUU(4149) combined to the high efficiency of the PK stimula-tor

Frameshift propensity of the X XXZ ZZN motifs

The results obtained for the X XXZ ZZN heptamersand their non-shifty derivatives are presented in Figure4 The average background values given by the no-motifconstructs was 0037 plusmn 0010 for those with the IS911stimulators (open lozenges) and 0055 plusmn 0031 for thosewithout stimulator (open triangles) The estimated back-ground value for the IS3 constructs was of 0069 plusmn 0007(data not shown) The frameshifting frequency amongthe motif-containing constructs varied over a large rangeie from 0018 for G GGG GGU without stimulatorto 54 for C CCA AAG associated with the IS3 PKAs previously demonstrated in E coli the most efficient

motifs in the presence of the IS911 or IS3 stimulatorswere C CCA AAG G GGA AAG and A AAA AAG(194041) The ratio between the motif and no-motifframeshifting frequency values was used as a classifier ofmotif efficiency motifs displaying a ratio above 2 were cat-egorized as frameshift-prone Among the constructs with-out stimulator 39 motifs (61) met this criterion (ratiofrom 21 to 10) When the IS911 stimulators were added55 motifs (86) showed a ratio ranging from 21 to 112Swapping the moderate IS911 stimulators for the more effi-cient IS3 PK increased further the motif to no-motif ratio(from 21 to 1188) and raised the number of positive mo-tifs to 61 (953) the 3 motifs below the threshold wereA AAC CCA G GGC CCA and C CCG GGC

In spite of divergences as to its timing in the elongationcycle the general view concerning -1 frameshifting on slip-pery heptamers is that it occurs after proper decoding ofthe ZZN 0-frame codon when the P and A ribosomal sitesare occupied by the XXZ and ZZN codons and their cog-nate tRNAs (22230405051) Simple rules emerge whenthe effect of the ZZN codon is considered (Figure 4) TheUUN and AAN codons are on the average more frameshift-prone than CCN and GGN Whatever Z is the two ho-mogeneous ZZN codons (meaning all purine or all pyrim-idine bases) are better shifters than the two correspondingheterogeneous ones To explain further all the variations inframeshifting frequency it is also necessary to take into ac-count the nature of the X nucleotide Motifs are by andlarge more frameshift-prone when the X and Z nucleotidesare homogeneous ie all purines or all pyrimidines No-table exceptions to the latter rule are Y YYA AAR andR RRU UUY (with Y = [U C] R = [A G]) Here thehigh shiftiness of the AAR and UUY codons probablycounteracts the negative effect of the YYA and RRU het-erogenous codons

Distribution of frameshift motifs in IS elements

From the above experimental study we concluded that amajority of heptamers (and nearly half of the tetramers)were capable of eliciting -1 frameshifting at substantiallevels (at least twice the background level) To determinethe range of motifs used in genes utilizing frameshift-ing for their expression we carried out an analysis ofIS mobile genetics elements known or suspected to usethis mode of translational control We focused on themembers of the IS1 and IS3 families available in theISFinder database in October 2012 (9) As shown inFigure 5 both tetramers and heptamers are found butwith a marked preference for heptamers (87 against 403)Among the five tetramers the three most shift-prone mo-tifs A AAG and U UU[UC] predominate and the lessefficient A AAA motif is also well represented Only 16different heptamers are found with 72 of them being ei-ther A AAA AAG or A AAA AAA The next most fre-quent are A AAA AAC and G GGA AAC both of lowefficiency followed by the more efficient G GGA AAGU UUU UUC and G GGA AAA To conclude it ap-pears that genes known or suspected to use PRF-1 to ex-press a biologically important protein do not necessarily uti-lize high efficiency motifs However this conclusion is based

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7216 Nucleic Acids Research 2014 Vol 42 No 11

U_UUU_UUU C A G

C_CCU_UUU C A G

G_GGU_UUU C A G

A_AAU_UUU C A G

001

01

1

10

100

A X_XXU_UUN motifs

-

1 fra

me

sh

iftin

g

U_UUA_AAU C A G

G_GGA_AAU C A G

A_AAA_AAU C A G

C_CCA_AAU C A G

C X_XXA_AAN motifs

no stimulator no stimulator no motifIS3 stimulator IS911 stimulators no motif IS911 stimulators

D X_XXG_GGN motifs

U_UUG_GGU C A G

G_GGG_GGU C A G

A_AAG_GGU C A G

C_CCG_GGU C A G

U_UUC_CCU C A G

G_GGC_CCU C A G

A_AAC_CCU C A G

C_CCC_CCU C A G

001

01

1

10

100

B X_XXC_CCN motifs

-

1 fra

meshifting

Figure 4 Frameshift efficiency of the X XXZ ZZN heptamers There are five frameshifting values for each motif corresponding to constructs with a motifand the IS3 PK (open circles) with a motif and the IS911 stimulators (black lozenges) without a motif and with the IS911 stimulators (open lozenges)with a motif and without stimulator (inverted grey triangles) and without both motif and stimulator (open triangles) (Figure 2) Each frameshifting valueis the mean of five independent determinations (the plusmnstandard deviation intervals were not added because they are not bigger than the size of the symbols)The no-motif constructs were obtained by changing the first and fourth nucleotides of each motif as detailed in Materials and Methods

on one category of genes where two overlapping genes codefor the proteins required for transposition of two types of ISelements There the purpose of frameshifting is to providethe lsquorightrsquo amount of a fusion protein which has the trans-posase function (1852) this amount is what keeps transpo-sition of the IS at a level without negative effect on the bac-terial host If the lsquorightrsquo amount is a low amount then theuse of low-efficiency motifs with or without flanking stim-ulators is a way to achieve this goal as illustrated by the IS1element (53)

Distribution of frameshift motifs in E coli genes

Our objective was to statistically assess the prevalence ofeach X XXZ ZZN motif in the genome of various E colistrains The rationale was that if a given motif induces byitself frameshifting at a significant level (ie at a biologi-cally detrimental level) then it should be counterselectedand therefore be underrepresented in E coli genes

(i) Generation of a non-redundant nrMEG In a recentstudy 61 bacterial genomes including strains of E coliShigella and Salmonella were compared to identifygene families that are conserved across all the genomes

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 2: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7211

nnn _nnZ_ZZN_nnn_nnn 1 2 3 4

nnX_XXZ_ZZN_nnn_nnn 1 2 3 4 5 6 7

E P Aopt ional

upstream

st imulator

opt ional

downstream

st imulator

64 motifsShine-Dalgarnosequence

~8-14 nt before P-si te codon

pseudoknot or stem-loop~5-8 nt af ter A-si te codon

16 motifs

Figure 1 Overall organization of known bacterial -1 frameshift signalsCodons in frame with the upstream initiation codon (frame 0) are sepa-rated with underscores their position relative to the ribosomal E P and Asites and their tRNAs at the onset of frameshifting is indicated

ies on retroviruses (1ndash3) The obligatory component is ashort sequence of 7 nucleotides X XXZ ZZN called theframeshift motif or lsquoslipperyrsquo motif where XXX and ZZZare triplets of identical bases and N is any nucleotide (un-derscoring separates codons in frame with the initiationcodon ie frame 0 and dots separate codons in the newframe ie frame -1) Thus there are 64 possible sequencescorresponding to the above definition It was also shownthat an even shorter sequence a Z ZZN tetramer (whereZZZ are three identical bases thus leading to 16 possiblemotifs) could also direct programmed -1 frameshifting (15ndash21) Thelsquoslipperinessrsquo of both types of motifs likely resultsfrom their capacity to allow cognate or near cognate re-pairing in the -1 frame of one or two tRNAs (222) Onan X XXZ ZZN heptamer the XXZ- and ZZN-decodingtRNAs respectively in the P and A sites of the ribosomewould break the codonndashanticodon interaction and re-pairon the XXX and ZZZ codons in the -1 frame The sec-ond component of frameshift signals is a stimulatory ele-ment which by itself cannot induce frameshifting It canbe an RNA secondary structure such as simple or branchedhairpin-type stem-loop (HP) or a pseudoknot (PK) (23ndash25)As illustrated in panels C and D of Figure 2 it is formedby local folding of the mRNA and generally starts 5ndash8 nu-cleotides downstream of the slippery sequence (172326)The stimulatory effect of a structure may be linked to its ca-pacity to block ribosomes transiently when the ZZN codonoccupies the A site and thus give more time to tRNAs for re-pairing (27ndash29) In addition a structure may exert a pullingeffect on the mRNA and favour its realignment within theribosome to bring the XXX and ZZZ -1 frame codons inthe P and A sites (3031) It is present in all well-studiedeukaryotes cases often as a PK but not always found inprokaryotes PRF-1 regions (10) Prokaryotic signals some-times possess another type of stimulatory element upstreamof the frameshift motif a ShinendashDalgarno (SD)-like se-quence normally involved in translation initiation throughpairing with the CCUCC sequence at the 3prime end of 16S ri-bosomal RNA (32ndash34) In PRF-1 signals the same interac-tion occurs but within an elongating ribosome and resultsin a translational pausing (35) which may provide a longer

time window for tRNA re-pairing The other possible effectof a stimulatory SD is linked to its distance from the motifwhich could generate a tension between the mRNA and theribosome that could be resolved by realigning the mRNA(32) Thus the two types of stimulators could act in concertby generating pausing for both and by pushing (SD) orpulling (structure) on the mRNA In addition frameshift-ing frequency in eukaryotes and prokaryotes is modulatedby the immediate context on both sides of the slippery mo-tif (36ndash38) It is not yet clear by which mechanism(s) thismodulation operates but it could result in part from an Esite tRNA effect (38) from intra-mRNA interactions (36)and possibly from mRNAndashribosome interactions within themessage entry tunnel (39)

The slippery motif being the key element in frameshift-ing has been the object of a particular attention An earlystudy dealt with the motif present in the Rous sarcoma virussignal (2) Mutating it to a limited set of different motifslead to the proposal of basic rules governing X XXZ ZZNheptamers frameshifting efficiency for eukaryotes in shortsubstantial frameshifting is attained if XXX = [AAA GGGor UUU] ZZZ = [AAA or UUU] and N = [A C or U]A subsequent nearly systematic analysis confirmed theserules 44 motifs were tested and the 20 remaining motifswere not included because of their expected inefficiency (seeupper panel of Supplementary Figure S1A) (17) Thus onlya subset of the 64 possible X XXZ ZZN heptamers canelicit frameshifting at a significant level in higher eukary-otes

The X XXZ ZZN heptamers were also analysed inprokaryotes using the Escherichia coli bacterium but notas thoroughly as in eukaryotes (193640ndash42) It turned outthat the most proficient motifs the X XXA AAG hep-tamers were the least efficient ones in eukaryotes Con-versely the best eukaryotic motifs proved inefficient in Ecoli (eg A AAA AAC or G GGA AAA) thus indicat-ing major differences in the response of the respective trans-lational machineries to these signals However the limitedscope of these studies in terms of number of motifs testeddid not allow the establishment of precise rules concerningslippery heptamers efficiency in bacteria The first aim ofthe work presented here is to determine these rules by car-rying out a complete functional analysis in E coli of bothtypes of potential frameshift motifs the Z ZZN tetramersand X XXZ ZZN heptamers The second objective is toinvestigate by bioinformatics approaches the prevalence ofthe X XXZ ZZN motifs in 37 enterobacterial genomesmostly from E coli isolates in order to determine whetheror not they have been selected against in coding sequencesbecause of their frameshifting proclivity Our third aim is toidentify genes possibly utilizing -1 programmed frameshift-ing by analysing in the same set of genomes those contain-ing a subset of 21 heptamers which were chosen on the basisof their -1 frameshifting efficiency andor their significantunderrepresentation

MATERIALS AND METHODS

Bacterial strain growth conditions and transposition assay

The E coli K-12 strain JS238 [MC1061 araD (araleu) galU galK hsdS rpsL (lacIOPZYA)X74 malPlacIq

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7212 Nucleic Acids Research 2014 Vol 42 No 11

AUUAUCCUGgcc

agcuUUGAAAUGGAGAAUG-

stop 0

-1frameshiftmotif

SD

0

-AAAUACUXXXZZZNGCUACC

G

UU-AC-GU-AC-GG-CC-GG-C

A-UG-CC

AAU A

A U

G-CC-GUGC-GU-AU-A

A

G-C

A-U

C-G

U-A

G-C

U-A

UC

CA

C

Stem Loop

IS911stimulators

C

frameshift

motif

GCGA

AG C

AC U

U AUUG

AA

AAA

CA

U U

-AAAUACUXXXZZZNGCUAC

U

IS3stimulator

D

agcuuUGAAAUCCUCAAUG-0

-1 GAccggg

G-CC-GC-GU-AG-CA-U

C-GU-AU-AC-GA-UU-AA-UC-GA-UG-C

Pseudoknot

B

agcuuUGAAAUCCUCAAUG-

-AAAUACUXXXZZZNGCUACC-frameshift

motif

-GCGCUCUUGUAAUAGAgggcc

0

-1

no-stimulator

A

lacZ

pOFX310

PTac

ori[pBR322]

bla

-AUGAAAAAACGUAAUUUaagcuugggccc-HindIII ApaI

0 Z0

Figure 2 Reporter plasmid and sequence of the three contexts in which the frameshifting propensity of the X XXZ ZZN heptamers was assessed PlasmidpOFX310 (panel A) was used to clone between a HindIII and an ApaI site the three frameshift windows shown in panels BndashD The no-stimulator construct(panel B) was derived from the IS911 construct (panel C) (48) by deletion of most of the stem-loop and mutation to CCUC of the SD-like GGAG sequenceThe IS3 construct (panel D) was engineered by replacing the IS911 stem-loop with the PK from IS3 (4) and by mutating to CCUC the stimulatory SD

srlCTn10 recA1] was used for all experiments Bacterialcultures were carried out in Luria-Bertani (LB) medium(43) to which Ampicillin (40 mgl) plus oxacillin (200 mgl)were added when necessary

Plasmid constructions for assessing -1 frameshifting

All frameshift cassettes were cloned into the pOFX310reporter (Figure 2A) derived from the pAN127 plas-mid (40) by changing the translation initiation regionof the lacZ gene between the XbaI and HindIII totctagCTCGAGATTTATTGGAATAACATATG AAAAAA CGT AAT TTa agc tt (the XbaI and HindIII sitesare in lowercase the SD sequence GGA and the ATGstart codon in frame 0 are both underlined) Overlap-ping oligonucleotides were inserted between the HindIIIand ApaI sites of the vector to reconstitute the vari-ous frameshift regions in front of the lacZ gene so thatexpression of -galactosidase requires a -1 ribosomalframeshifting event within the cloned cassette For eachtype of frameshift region (ie no stimulator IS911 stim-ulators or IS3 stimulators see Figure 2BndashD) an in-frame

construct was made to serve as 100 reference for calcu-lation of frameshifting frequencies from -galactosidaseactivities A non-shifty derivative was constructed for eachmotif to assess the background level of frameshifting Therationale was to keep the same tRNA in the A site (Z ZZNmotifs) or in both A and P sites (X XXZ ZZN motifs)For the Z ZZN tetramers only the first nucleotide wasmutated to G YYN or C RRN (with Y = [U C] and R =[A G]) For the heptamers the first and fourth nucleotidesof the motifs were changed to give G YYC UUNC RRC UUN G YYU CCN C RRU CCNG YYG AAN C RRG AAN G YYA GGN orC RRA GGN

Measurement of frameshifting frequency by -galactosidaseassay

Transcription of lacZ relies on a strong isopropyl--D-thiogalactopyranoside-inducible pTac promoter Its ex-pression was monitored by a standard colorimetric as-say (43) on cultures prepared either in the absence of in-ducer for constructs with a sufficiently high level of -

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7213

galactosidase activity (ie above 02 frameshifting) orafter isopropyl -D-1 thiogalactopyranoside induction forthe ones with a low activity (ie those with less than 11frameshifting) For each strain 5 tubes containing 05 mlof Luria-Bertani medium (supplemented with ampicillinand oxacillin) were inoculated with independent clonesand incubated overnight at 37C These cultures were ei-ther diluted 1250 in the same medium and incubated270 min at 37C (no-induction conditions) or diluted150 in the same medium plus 2 mM of isopropyl--D-thiogalactopyranoside and incubated 240 min at 37C (in-duction conditions) The dosage conditions were as previ-ously described (19) Note that both methods gave identical frameshifting values in their overlap range ie between02 and 11 We also verified the accuracy of the reportedvalues above or below the overlap range by applying to alimited set of plasmid constructions a refined assay in whichnon-induced cultures were first concentrated and lysed bysonication (data not shown)

Generation and randomization of a non-redundant lsquomostly Ecoli genomersquo (nrMEG)

The Refseq accessions of the genomes that were used toconstruct our nrMEG together with their organismstraininformation are given in Supplementary Table S1 All theprotein coding gene sequences were extracted from theffn files (National Center for Biotechnological Informationwebsite ftpftpncbinlmnihgovgenomesBacteria) ofthese 37 accessions and merged together to make a com-bined genome of 169 302 sequences These sequences wereclustered using the BLASTCLUST program using a 95sequence identity threshold at the level of nucleotide se-quence One representative sequence per cluster was ran-domly chosen to constitute an nrMEG of 22 703 sequencesEach sequence from the nrMEG was randomized 1000times using the Dicodonshuffle randomization procedure(44) to yield 1000 randomized nrMEGs The DicodonShuf-fle algorithm preserves the dinucleotide composition theencoded protein sequence and the codon usage of each gene

Analysis of XXXZZZN frequencies in protein coding se-quences

A customized perl script was used to count the occur-rences of a pattern in all three possible reading frames(ie X XXZ ZZN XX XZZ ZN and XXX ZZZ N) inboth the real and randomized nrMEGs Violin plots (45)generated with the vioplot package from the R softwarelibrary (httpwwwr-projectorg) were used to visualizethe occurrences of the 64 X XXZ ZZN patterns Z-scoreswere computed as follows z-score = (x minus xmean)xsd wherex is the frequency of occurrence of a pattern in the inte-grated genome xmean is the mean of the distribution of thesame pattern across 1000 randomized genomes and xsd isthe standard deviation of the distribution of the same pat-tern across 1000 randomized genomes Z-scores for the 64XXXZZZN in all three frames are shown in Supplemen-tary Table S3 while all violin plots are available online athttp laptiuccieheptameric patterns clusters

Clustering of genes containing selected X XXZ ZZN pat-terns

All the annotated protein coding genes from 36 of the 37genomes listed in Supplementary Table S1 (AC 000091 waslater excluded because of its removal from Refseq) werescreened for the presence of a motif from a set of selected21 X XXZ ZZN patterns (see Results section) These se-quences were clustered based on similarity between the en-coded protein sequences using the BLASTCLUST program(sequence identity threshold = 45) A total of 658 clusterswhich had at least 20 sequences and where the heptamericpattern was perfectly conserved were taken up for furtheranalysis The coordinates of the conserved heptameric pat-terns were also recorded for each cluster

However these clusters contain only sequences of pro-tein coding genes from those 36 genomes which were ini-tially selected to constitute the nrMEG These clusters wereenriched with additional homologous sequences from thegenomes not included in the nrMEG in an attempt to ob-tain a better phylogenetic signal The enrichment was car-ried out using a tblastn search against all bacterial se-quences in the nr database as described previously (11)The lsquonewlyrsquo obtained homologous sequences for each clus-ter were aligned by translating them into protein sequencesaligning these protein sequences and then back translatingthe aligned protein sequences to their corresponding nu-cleotide sequences The coordinates of the conserved hep-tameric pattern for each cluster (which were recorded in theprevious step) were recalculated to account for gaps intro-duced during alignment of additional sequences

Identification of clusters with conserved heptameric pattern

For the 658 clusters identified in the previous step we em-ployed an additional filtering procedure to identify thoseclusters where the heptameric pattern is conserved For eachcluster the total number of sequences was referred to asNall The number of sequences where the heptameric pat-tern was the same as the parent pattern was referred to asN1 The number of sequences where the pattern is not thesame but is one of the 64 X XXY YYZ patterns was re-ferred to as N2 Finally it has been previously observed(eg in dnaX) that the position of the frameshift site maynot be perfectly conserved To account for that possibilitya 36-nt window starting from the coordinate which is 15nt upstream of the conserved heptameric coordinate wasalso screened for the presence of any X XXY YYZ pat-tern the number of sequences in that category was referredto as N3 For each cluster the three values were summed up(ie N1+N2+N3 = Nsum) and the ratio NsumNall was calcu-lated Clusters where NsumNall gt 09 were labelled as lsquocon-servedrsquo In an ideal situation the frameshift pattern shouldbe absolutely conserved but this threshold was relaxed soas to allow for the possibility of sequencing errors or recentmutations in the sequences from a cluster The end resultwas a set of 69 clusters (features of these clusters are pre-sented in Supplementary Tables S4ndashS8 and the completesequence of their genes and other features are available athttplaptiuccieheptameric patterns clusters)

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7214 Nucleic Acids Research 2014 Vol 42 No 11

Synonymous-site conservation analyses

The degree of conservation at synonymous sites was cal-culated as previously described (46) for a 15-codon win-dow The detailed results of this analysis are available on-line at http laptiuccieheptameric patterns clusters anda summary is included in the RSSV column (for reducedvariability at synonymous sites) of Supplementary TablesS4 and S5 However a statistically significant conservationat synonymous sites can only be observed if there is suffi-cient sequence divergence in the alignment To numericallyquantify sequence divergence we also calculated a statisticcalled alndiv which corresponds to an estimate of the meannumber of phylogenetically independent nucleotide substi-tutions per alignment column (see Supplementary Tables S4and S5)

Search of potential stimulatory elements flanking the pattern

Two types of potential frameshift stimulators weresearched The first type was an SD-like sequence (eitherGAGG GGAG AGGA GGNGG or AGGKG with K= [TG]) located 6ndash17 nt before the second base of themotif these sequences and the spacing interval were chosenbecause all were experimentally proved to be stimulatory(32) A segment of 30 nt ending with the first base of themotif was scanned using a script for Perl (version 5101from ActiveState) in all the sequences of each cluster forthe above potential SD sequences A cluster qualified ashaving a lsquoconserved SDrsquo if at least 50 of its sequences hadan SD

The second type of stimulator was an RNA structuredownstream of the motif A preliminary study of 271 IS3family members (see Supplementary Figure S2) led us tochoose the following empirical rules the structure is (i) asimple or branched hairpin of a length ranging from 17to 140 nt (ii) that starts 4ndash10 nt after the last base of themotif (iii) with a G-C(or C-G) base-pair followed by at leastthree consecutive WatsonndashCrick or GndashU or UndashG base-pairsand (iv) has a Gunfold37C ge 76 kcalmolminus1 the G37Cvalue was determined using the default parameters of theRNAfold program from version 185 of the Vienna RNApackage (47) We limited our search to hairpin structuresand and did not explore whether some of the structurescould also form PKs at this preliminary stage For each se-quence of all clusters a 197 nt segment starting at the fourthbase after the motif was extracted and analysed with a cus-tom Perl script Each segment was first deleted from the 3primeend one base at a time and down to 17 nt Each set of nesteddeletions was passed to the RNAfold program and the po-tential structures were sorted out to retain those conformingto the rules For a given cluster structures were grouped intypes according to hairpin size and distance from the mo-tif The frequency of each type of structure was calculatedand only those present in at least 50 of the sequences ofthe cluster were retained 53 of the 69 clusters had sucha conserved structure The Ghpntminus1 parameter (ie theGunfold value divided by the number of nucleotides in thehairpin structure) was calculated For clusters with severaltypes of structure the lsquobestrsquo type was the one with the high-est value for the [(Ghpntminus1) x (frequency)] product A

summary of these analyses is reported in SupplementaryTables S4 S5 and S8

For further comparison 20 ISs from the IS3 family wereselected because they contain a frameshift motif followedby a known (or likely) stimulatory structure of a size rang-ing from 17 to 131 nt The Ghp and Ghpntminus1 param-eters were calculated for each structure Selective pressureto maintain a hairpin should likely result in a region morestructured than neighbouring regions of the same size iehaving a value higher than average for the Gntminus1 param-eter To assess that a 197 nt segment starting 4 nt down-stream of the frameshift motif was extracted for each ofthe 20 ISs as well as for one typical sequence of each ofthe 53 clusters with a conserved structure For each 197 ntsegment a Perl script generated a subset of sequences bymoving (1 nt at a time) a sliding window of the size of thecorresponding conserved hairpin and passed it to RNAfoldThe average Gunfoldnt-1 (Gavntminus1) were calculated foreach subset as well as the Gntminus1 which is the difference[(Ghpntminus1) minus (Gavntminus1)] As expected the 20 ISs pos-sessing a stimulatory structure all display a higher than av-erage Gntminus1 (Supplementary Table S8)

RESULTS

In vivo determination of -1 frameshifting frequency

As illustrated in Figure 2 the 16 Z ZZN motifs and the64 X XXZ ZZN motifs were cloned either without flank-ing stimulators or with a strong downstream stimulator de-rived from the IS3 PK (1819) or for the heptamers onlywith the moderately efficient combination of upstream anddownstream stimulators from IS911 (3248) For both typesof motifs a non-shifty derivative was similarly cloned forthat the first base of each tetramer or the first and fourthbases of each heptamer were mutated The shifty and non-shifty cassettes were inserted in front of the lacZ gene car-ried by plasmid pOFX310 so that translation of full-length-galactosidase occurs only when ribosomes move to the -1frame before encountering the 0 frame stop codon (Figure2)

Frameshift propensity of the Z ZZN motifs

The graphs showing the variation of -1 frameshifting fre-quency as a function of the sequence of the motif arepresented in Figure 3 for the Z ZZN tetramers (see alsoSupplementary Table S2) The motif-containing constructswithout PK (save G GGG see below) were on the aver-age marginally above their no-motif counterpart (0147 plusmn0004 versus 0112 plusmn 0015) which suggests that the mo-tifs are by themselves barely or not at all shifty Addition ofthe IS3 PK led to substantial increase in frameshifting fre-quency for 10 motifs Only six were at least four times abovebackground (ie above 0064 plusmn 0014) with frequenciesranging from 026 to 56 For them the hierarchy was[A AAG gtgt U UUC gt U UUU gt C CCU = C CCCgtA AAA] Four motifs (U UUA U UUG C CCA andC CCG) were 18- to 36-fold above background

A few oddities were revealed The G GGG C GGAand C GGG constructs (with and without PK) were found

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7215

motif amp PKIS3

no-motif amp PKIS3

motif amp no-stimulator

no-motif amp no-stimulator

U_UUU C A G

C_CC

U C A G

G_GGU C A G

A_AAU C A G

001

01

1

10

100

-1

fra

me

sh

iftin

g

Figure 3 Frameshift efficiency of the Z ZZN tetramers The IS3frameshift region cloned in plasmid pOFX310 was the one used in a previ-ous study [see Figure 1 in (19)] It differs slightly from the one used for theheptamer analysis (Figure 2 panel D) The nucleotides upstream (6 nt) anddownstream (5 nt) of the motif are those found in IS3 The sequence fromthe HindIII site to the start of the PK is agcuuCCUCCAZZZNGCCGCndashndash The no-stimulator construct was derived by deleting the 3prime half of thePK right after the UGA stop codon in the 0 frame to give the follow-ing sequence agcuuCCUCCAZZZNGCCGCGACAUACUUCGCGAAGGCCUGAACUUGAAgggcc The four frameshifting values for eachmotif correspond to a construct with a motif and the IS3 PK (open circles)a construct without motif and with the IS3 PK (open lozenges) a con-struct with motif and without stimulator (black inverted triangles) and aconstruct without motif and stimulator (open triangles) Each frameshift-ing value is the mean of five independent determinations (the plusmn standarddeviation intervals were omitted because they are not bigger than the sizeof the symbols in most cases) The no-motif constructs were derived bychanging each motif to either G YYN or C RRN

above background probably not as a result of frameshift-ing but because these sequences together with the followingG could act as SD sequences and direct low level initiationon the -1 frame AUA codon present 7 nt downstream (seelegend of Figure 3) The other oddity C AAG was nearly10 times above background (093) but only when the PKstimulator was present this was likely due to -1 frameshift-ing caused by the high shiftiness of the lysyl-tRNAUUU(4149) combined to the high efficiency of the PK stimula-tor

Frameshift propensity of the X XXZ ZZN motifs

The results obtained for the X XXZ ZZN heptamersand their non-shifty derivatives are presented in Figure4 The average background values given by the no-motifconstructs was 0037 plusmn 0010 for those with the IS911stimulators (open lozenges) and 0055 plusmn 0031 for thosewithout stimulator (open triangles) The estimated back-ground value for the IS3 constructs was of 0069 plusmn 0007(data not shown) The frameshifting frequency amongthe motif-containing constructs varied over a large rangeie from 0018 for G GGG GGU without stimulatorto 54 for C CCA AAG associated with the IS3 PKAs previously demonstrated in E coli the most efficient

motifs in the presence of the IS911 or IS3 stimulatorswere C CCA AAG G GGA AAG and A AAA AAG(194041) The ratio between the motif and no-motifframeshifting frequency values was used as a classifier ofmotif efficiency motifs displaying a ratio above 2 were cat-egorized as frameshift-prone Among the constructs with-out stimulator 39 motifs (61) met this criterion (ratiofrom 21 to 10) When the IS911 stimulators were added55 motifs (86) showed a ratio ranging from 21 to 112Swapping the moderate IS911 stimulators for the more effi-cient IS3 PK increased further the motif to no-motif ratio(from 21 to 1188) and raised the number of positive mo-tifs to 61 (953) the 3 motifs below the threshold wereA AAC CCA G GGC CCA and C CCG GGC

In spite of divergences as to its timing in the elongationcycle the general view concerning -1 frameshifting on slip-pery heptamers is that it occurs after proper decoding ofthe ZZN 0-frame codon when the P and A ribosomal sitesare occupied by the XXZ and ZZN codons and their cog-nate tRNAs (22230405051) Simple rules emerge whenthe effect of the ZZN codon is considered (Figure 4) TheUUN and AAN codons are on the average more frameshift-prone than CCN and GGN Whatever Z is the two ho-mogeneous ZZN codons (meaning all purine or all pyrim-idine bases) are better shifters than the two correspondingheterogeneous ones To explain further all the variations inframeshifting frequency it is also necessary to take into ac-count the nature of the X nucleotide Motifs are by andlarge more frameshift-prone when the X and Z nucleotidesare homogeneous ie all purines or all pyrimidines No-table exceptions to the latter rule are Y YYA AAR andR RRU UUY (with Y = [U C] R = [A G]) Here thehigh shiftiness of the AAR and UUY codons probablycounteracts the negative effect of the YYA and RRU het-erogenous codons

Distribution of frameshift motifs in IS elements

From the above experimental study we concluded that amajority of heptamers (and nearly half of the tetramers)were capable of eliciting -1 frameshifting at substantiallevels (at least twice the background level) To determinethe range of motifs used in genes utilizing frameshift-ing for their expression we carried out an analysis ofIS mobile genetics elements known or suspected to usethis mode of translational control We focused on themembers of the IS1 and IS3 families available in theISFinder database in October 2012 (9) As shown inFigure 5 both tetramers and heptamers are found butwith a marked preference for heptamers (87 against 403)Among the five tetramers the three most shift-prone mo-tifs A AAG and U UU[UC] predominate and the lessefficient A AAA motif is also well represented Only 16different heptamers are found with 72 of them being ei-ther A AAA AAG or A AAA AAA The next most fre-quent are A AAA AAC and G GGA AAC both of lowefficiency followed by the more efficient G GGA AAGU UUU UUC and G GGA AAA To conclude it ap-pears that genes known or suspected to use PRF-1 to ex-press a biologically important protein do not necessarily uti-lize high efficiency motifs However this conclusion is based

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7216 Nucleic Acids Research 2014 Vol 42 No 11

U_UUU_UUU C A G

C_CCU_UUU C A G

G_GGU_UUU C A G

A_AAU_UUU C A G

001

01

1

10

100

A X_XXU_UUN motifs

-

1 fra

me

sh

iftin

g

U_UUA_AAU C A G

G_GGA_AAU C A G

A_AAA_AAU C A G

C_CCA_AAU C A G

C X_XXA_AAN motifs

no stimulator no stimulator no motifIS3 stimulator IS911 stimulators no motif IS911 stimulators

D X_XXG_GGN motifs

U_UUG_GGU C A G

G_GGG_GGU C A G

A_AAG_GGU C A G

C_CCG_GGU C A G

U_UUC_CCU C A G

G_GGC_CCU C A G

A_AAC_CCU C A G

C_CCC_CCU C A G

001

01

1

10

100

B X_XXC_CCN motifs

-

1 fra

meshifting

Figure 4 Frameshift efficiency of the X XXZ ZZN heptamers There are five frameshifting values for each motif corresponding to constructs with a motifand the IS3 PK (open circles) with a motif and the IS911 stimulators (black lozenges) without a motif and with the IS911 stimulators (open lozenges)with a motif and without stimulator (inverted grey triangles) and without both motif and stimulator (open triangles) (Figure 2) Each frameshifting valueis the mean of five independent determinations (the plusmnstandard deviation intervals were not added because they are not bigger than the size of the symbols)The no-motif constructs were obtained by changing the first and fourth nucleotides of each motif as detailed in Materials and Methods

on one category of genes where two overlapping genes codefor the proteins required for transposition of two types of ISelements There the purpose of frameshifting is to providethe lsquorightrsquo amount of a fusion protein which has the trans-posase function (1852) this amount is what keeps transpo-sition of the IS at a level without negative effect on the bac-terial host If the lsquorightrsquo amount is a low amount then theuse of low-efficiency motifs with or without flanking stim-ulators is a way to achieve this goal as illustrated by the IS1element (53)

Distribution of frameshift motifs in E coli genes

Our objective was to statistically assess the prevalence ofeach X XXZ ZZN motif in the genome of various E colistrains The rationale was that if a given motif induces byitself frameshifting at a significant level (ie at a biologi-cally detrimental level) then it should be counterselectedand therefore be underrepresented in E coli genes

(i) Generation of a non-redundant nrMEG In a recentstudy 61 bacterial genomes including strains of E coliShigella and Salmonella were compared to identifygene families that are conserved across all the genomes

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 3: 1 ribosomal frameshifting in Escherichia

7212 Nucleic Acids Research 2014 Vol 42 No 11

AUUAUCCUGgcc

agcuUUGAAAUGGAGAAUG-

stop 0

-1frameshiftmotif

SD

0

-AAAUACUXXXZZZNGCUACC

G

UU-AC-GU-AC-GG-CC-GG-C

A-UG-CC

AAU A

A U

G-CC-GUGC-GU-AU-A

A

G-C

A-U

C-G

U-A

G-C

U-A

UC

CA

C

Stem Loop

IS911stimulators

C

frameshift

motif

GCGA

AG C

AC U

U AUUG

AA

AAA

CA

U U

-AAAUACUXXXZZZNGCUAC

U

IS3stimulator

D

agcuuUGAAAUCCUCAAUG-0

-1 GAccggg

G-CC-GC-GU-AG-CA-U

C-GU-AU-AC-GA-UU-AA-UC-GA-UG-C

Pseudoknot

B

agcuuUGAAAUCCUCAAUG-

-AAAUACUXXXZZZNGCUACC-frameshift

motif

-GCGCUCUUGUAAUAGAgggcc

0

-1

no-stimulator

A

lacZ

pOFX310

PTac

ori[pBR322]

bla

-AUGAAAAAACGUAAUUUaagcuugggccc-HindIII ApaI

0 Z0

Figure 2 Reporter plasmid and sequence of the three contexts in which the frameshifting propensity of the X XXZ ZZN heptamers was assessed PlasmidpOFX310 (panel A) was used to clone between a HindIII and an ApaI site the three frameshift windows shown in panels BndashD The no-stimulator construct(panel B) was derived from the IS911 construct (panel C) (48) by deletion of most of the stem-loop and mutation to CCUC of the SD-like GGAG sequenceThe IS3 construct (panel D) was engineered by replacing the IS911 stem-loop with the PK from IS3 (4) and by mutating to CCUC the stimulatory SD

srlCTn10 recA1] was used for all experiments Bacterialcultures were carried out in Luria-Bertani (LB) medium(43) to which Ampicillin (40 mgl) plus oxacillin (200 mgl)were added when necessary

Plasmid constructions for assessing -1 frameshifting

All frameshift cassettes were cloned into the pOFX310reporter (Figure 2A) derived from the pAN127 plas-mid (40) by changing the translation initiation regionof the lacZ gene between the XbaI and HindIII totctagCTCGAGATTTATTGGAATAACATATG AAAAAA CGT AAT TTa agc tt (the XbaI and HindIII sitesare in lowercase the SD sequence GGA and the ATGstart codon in frame 0 are both underlined) Overlap-ping oligonucleotides were inserted between the HindIIIand ApaI sites of the vector to reconstitute the vari-ous frameshift regions in front of the lacZ gene so thatexpression of -galactosidase requires a -1 ribosomalframeshifting event within the cloned cassette For eachtype of frameshift region (ie no stimulator IS911 stim-ulators or IS3 stimulators see Figure 2BndashD) an in-frame

construct was made to serve as 100 reference for calcu-lation of frameshifting frequencies from -galactosidaseactivities A non-shifty derivative was constructed for eachmotif to assess the background level of frameshifting Therationale was to keep the same tRNA in the A site (Z ZZNmotifs) or in both A and P sites (X XXZ ZZN motifs)For the Z ZZN tetramers only the first nucleotide wasmutated to G YYN or C RRN (with Y = [U C] and R =[A G]) For the heptamers the first and fourth nucleotidesof the motifs were changed to give G YYC UUNC RRC UUN G YYU CCN C RRU CCNG YYG AAN C RRG AAN G YYA GGN orC RRA GGN

Measurement of frameshifting frequency by -galactosidaseassay

Transcription of lacZ relies on a strong isopropyl--D-thiogalactopyranoside-inducible pTac promoter Its ex-pression was monitored by a standard colorimetric as-say (43) on cultures prepared either in the absence of in-ducer for constructs with a sufficiently high level of -

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7213

galactosidase activity (ie above 02 frameshifting) orafter isopropyl -D-1 thiogalactopyranoside induction forthe ones with a low activity (ie those with less than 11frameshifting) For each strain 5 tubes containing 05 mlof Luria-Bertani medium (supplemented with ampicillinand oxacillin) were inoculated with independent clonesand incubated overnight at 37C These cultures were ei-ther diluted 1250 in the same medium and incubated270 min at 37C (no-induction conditions) or diluted150 in the same medium plus 2 mM of isopropyl--D-thiogalactopyranoside and incubated 240 min at 37C (in-duction conditions) The dosage conditions were as previ-ously described (19) Note that both methods gave identical frameshifting values in their overlap range ie between02 and 11 We also verified the accuracy of the reportedvalues above or below the overlap range by applying to alimited set of plasmid constructions a refined assay in whichnon-induced cultures were first concentrated and lysed bysonication (data not shown)

Generation and randomization of a non-redundant lsquomostly Ecoli genomersquo (nrMEG)

The Refseq accessions of the genomes that were used toconstruct our nrMEG together with their organismstraininformation are given in Supplementary Table S1 All theprotein coding gene sequences were extracted from theffn files (National Center for Biotechnological Informationwebsite ftpftpncbinlmnihgovgenomesBacteria) ofthese 37 accessions and merged together to make a com-bined genome of 169 302 sequences These sequences wereclustered using the BLASTCLUST program using a 95sequence identity threshold at the level of nucleotide se-quence One representative sequence per cluster was ran-domly chosen to constitute an nrMEG of 22 703 sequencesEach sequence from the nrMEG was randomized 1000times using the Dicodonshuffle randomization procedure(44) to yield 1000 randomized nrMEGs The DicodonShuf-fle algorithm preserves the dinucleotide composition theencoded protein sequence and the codon usage of each gene

Analysis of XXXZZZN frequencies in protein coding se-quences

A customized perl script was used to count the occur-rences of a pattern in all three possible reading frames(ie X XXZ ZZN XX XZZ ZN and XXX ZZZ N) inboth the real and randomized nrMEGs Violin plots (45)generated with the vioplot package from the R softwarelibrary (httpwwwr-projectorg) were used to visualizethe occurrences of the 64 X XXZ ZZN patterns Z-scoreswere computed as follows z-score = (x minus xmean)xsd wherex is the frequency of occurrence of a pattern in the inte-grated genome xmean is the mean of the distribution of thesame pattern across 1000 randomized genomes and xsd isthe standard deviation of the distribution of the same pat-tern across 1000 randomized genomes Z-scores for the 64XXXZZZN in all three frames are shown in Supplemen-tary Table S3 while all violin plots are available online athttp laptiuccieheptameric patterns clusters

Clustering of genes containing selected X XXZ ZZN pat-terns

All the annotated protein coding genes from 36 of the 37genomes listed in Supplementary Table S1 (AC 000091 waslater excluded because of its removal from Refseq) werescreened for the presence of a motif from a set of selected21 X XXZ ZZN patterns (see Results section) These se-quences were clustered based on similarity between the en-coded protein sequences using the BLASTCLUST program(sequence identity threshold = 45) A total of 658 clusterswhich had at least 20 sequences and where the heptamericpattern was perfectly conserved were taken up for furtheranalysis The coordinates of the conserved heptameric pat-terns were also recorded for each cluster

However these clusters contain only sequences of pro-tein coding genes from those 36 genomes which were ini-tially selected to constitute the nrMEG These clusters wereenriched with additional homologous sequences from thegenomes not included in the nrMEG in an attempt to ob-tain a better phylogenetic signal The enrichment was car-ried out using a tblastn search against all bacterial se-quences in the nr database as described previously (11)The lsquonewlyrsquo obtained homologous sequences for each clus-ter were aligned by translating them into protein sequencesaligning these protein sequences and then back translatingthe aligned protein sequences to their corresponding nu-cleotide sequences The coordinates of the conserved hep-tameric pattern for each cluster (which were recorded in theprevious step) were recalculated to account for gaps intro-duced during alignment of additional sequences

Identification of clusters with conserved heptameric pattern

For the 658 clusters identified in the previous step we em-ployed an additional filtering procedure to identify thoseclusters where the heptameric pattern is conserved For eachcluster the total number of sequences was referred to asNall The number of sequences where the heptameric pat-tern was the same as the parent pattern was referred to asN1 The number of sequences where the pattern is not thesame but is one of the 64 X XXY YYZ patterns was re-ferred to as N2 Finally it has been previously observed(eg in dnaX) that the position of the frameshift site maynot be perfectly conserved To account for that possibilitya 36-nt window starting from the coordinate which is 15nt upstream of the conserved heptameric coordinate wasalso screened for the presence of any X XXY YYZ pat-tern the number of sequences in that category was referredto as N3 For each cluster the three values were summed up(ie N1+N2+N3 = Nsum) and the ratio NsumNall was calcu-lated Clusters where NsumNall gt 09 were labelled as lsquocon-servedrsquo In an ideal situation the frameshift pattern shouldbe absolutely conserved but this threshold was relaxed soas to allow for the possibility of sequencing errors or recentmutations in the sequences from a cluster The end resultwas a set of 69 clusters (features of these clusters are pre-sented in Supplementary Tables S4ndashS8 and the completesequence of their genes and other features are available athttplaptiuccieheptameric patterns clusters)

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7214 Nucleic Acids Research 2014 Vol 42 No 11

Synonymous-site conservation analyses

The degree of conservation at synonymous sites was cal-culated as previously described (46) for a 15-codon win-dow The detailed results of this analysis are available on-line at http laptiuccieheptameric patterns clusters anda summary is included in the RSSV column (for reducedvariability at synonymous sites) of Supplementary TablesS4 and S5 However a statistically significant conservationat synonymous sites can only be observed if there is suffi-cient sequence divergence in the alignment To numericallyquantify sequence divergence we also calculated a statisticcalled alndiv which corresponds to an estimate of the meannumber of phylogenetically independent nucleotide substi-tutions per alignment column (see Supplementary Tables S4and S5)

Search of potential stimulatory elements flanking the pattern

Two types of potential frameshift stimulators weresearched The first type was an SD-like sequence (eitherGAGG GGAG AGGA GGNGG or AGGKG with K= [TG]) located 6ndash17 nt before the second base of themotif these sequences and the spacing interval were chosenbecause all were experimentally proved to be stimulatory(32) A segment of 30 nt ending with the first base of themotif was scanned using a script for Perl (version 5101from ActiveState) in all the sequences of each cluster forthe above potential SD sequences A cluster qualified ashaving a lsquoconserved SDrsquo if at least 50 of its sequences hadan SD

The second type of stimulator was an RNA structuredownstream of the motif A preliminary study of 271 IS3family members (see Supplementary Figure S2) led us tochoose the following empirical rules the structure is (i) asimple or branched hairpin of a length ranging from 17to 140 nt (ii) that starts 4ndash10 nt after the last base of themotif (iii) with a G-C(or C-G) base-pair followed by at leastthree consecutive WatsonndashCrick or GndashU or UndashG base-pairsand (iv) has a Gunfold37C ge 76 kcalmolminus1 the G37Cvalue was determined using the default parameters of theRNAfold program from version 185 of the Vienna RNApackage (47) We limited our search to hairpin structuresand and did not explore whether some of the structurescould also form PKs at this preliminary stage For each se-quence of all clusters a 197 nt segment starting at the fourthbase after the motif was extracted and analysed with a cus-tom Perl script Each segment was first deleted from the 3primeend one base at a time and down to 17 nt Each set of nesteddeletions was passed to the RNAfold program and the po-tential structures were sorted out to retain those conformingto the rules For a given cluster structures were grouped intypes according to hairpin size and distance from the mo-tif The frequency of each type of structure was calculatedand only those present in at least 50 of the sequences ofthe cluster were retained 53 of the 69 clusters had sucha conserved structure The Ghpntminus1 parameter (ie theGunfold value divided by the number of nucleotides in thehairpin structure) was calculated For clusters with severaltypes of structure the lsquobestrsquo type was the one with the high-est value for the [(Ghpntminus1) x (frequency)] product A

summary of these analyses is reported in SupplementaryTables S4 S5 and S8

For further comparison 20 ISs from the IS3 family wereselected because they contain a frameshift motif followedby a known (or likely) stimulatory structure of a size rang-ing from 17 to 131 nt The Ghp and Ghpntminus1 param-eters were calculated for each structure Selective pressureto maintain a hairpin should likely result in a region morestructured than neighbouring regions of the same size iehaving a value higher than average for the Gntminus1 param-eter To assess that a 197 nt segment starting 4 nt down-stream of the frameshift motif was extracted for each ofthe 20 ISs as well as for one typical sequence of each ofthe 53 clusters with a conserved structure For each 197 ntsegment a Perl script generated a subset of sequences bymoving (1 nt at a time) a sliding window of the size of thecorresponding conserved hairpin and passed it to RNAfoldThe average Gunfoldnt-1 (Gavntminus1) were calculated foreach subset as well as the Gntminus1 which is the difference[(Ghpntminus1) minus (Gavntminus1)] As expected the 20 ISs pos-sessing a stimulatory structure all display a higher than av-erage Gntminus1 (Supplementary Table S8)

RESULTS

In vivo determination of -1 frameshifting frequency

As illustrated in Figure 2 the 16 Z ZZN motifs and the64 X XXZ ZZN motifs were cloned either without flank-ing stimulators or with a strong downstream stimulator de-rived from the IS3 PK (1819) or for the heptamers onlywith the moderately efficient combination of upstream anddownstream stimulators from IS911 (3248) For both typesof motifs a non-shifty derivative was similarly cloned forthat the first base of each tetramer or the first and fourthbases of each heptamer were mutated The shifty and non-shifty cassettes were inserted in front of the lacZ gene car-ried by plasmid pOFX310 so that translation of full-length-galactosidase occurs only when ribosomes move to the -1frame before encountering the 0 frame stop codon (Figure2)

Frameshift propensity of the Z ZZN motifs

The graphs showing the variation of -1 frameshifting fre-quency as a function of the sequence of the motif arepresented in Figure 3 for the Z ZZN tetramers (see alsoSupplementary Table S2) The motif-containing constructswithout PK (save G GGG see below) were on the aver-age marginally above their no-motif counterpart (0147 plusmn0004 versus 0112 plusmn 0015) which suggests that the mo-tifs are by themselves barely or not at all shifty Addition ofthe IS3 PK led to substantial increase in frameshifting fre-quency for 10 motifs Only six were at least four times abovebackground (ie above 0064 plusmn 0014) with frequenciesranging from 026 to 56 For them the hierarchy was[A AAG gtgt U UUC gt U UUU gt C CCU = C CCCgtA AAA] Four motifs (U UUA U UUG C CCA andC CCG) were 18- to 36-fold above background

A few oddities were revealed The G GGG C GGAand C GGG constructs (with and without PK) were found

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7215

motif amp PKIS3

no-motif amp PKIS3

motif amp no-stimulator

no-motif amp no-stimulator

U_UUU C A G

C_CC

U C A G

G_GGU C A G

A_AAU C A G

001

01

1

10

100

-1

fra

me

sh

iftin

g

Figure 3 Frameshift efficiency of the Z ZZN tetramers The IS3frameshift region cloned in plasmid pOFX310 was the one used in a previ-ous study [see Figure 1 in (19)] It differs slightly from the one used for theheptamer analysis (Figure 2 panel D) The nucleotides upstream (6 nt) anddownstream (5 nt) of the motif are those found in IS3 The sequence fromthe HindIII site to the start of the PK is agcuuCCUCCAZZZNGCCGCndashndash The no-stimulator construct was derived by deleting the 3prime half of thePK right after the UGA stop codon in the 0 frame to give the follow-ing sequence agcuuCCUCCAZZZNGCCGCGACAUACUUCGCGAAGGCCUGAACUUGAAgggcc The four frameshifting values for eachmotif correspond to a construct with a motif and the IS3 PK (open circles)a construct without motif and with the IS3 PK (open lozenges) a con-struct with motif and without stimulator (black inverted triangles) and aconstruct without motif and stimulator (open triangles) Each frameshift-ing value is the mean of five independent determinations (the plusmn standarddeviation intervals were omitted because they are not bigger than the sizeof the symbols in most cases) The no-motif constructs were derived bychanging each motif to either G YYN or C RRN

above background probably not as a result of frameshift-ing but because these sequences together with the followingG could act as SD sequences and direct low level initiationon the -1 frame AUA codon present 7 nt downstream (seelegend of Figure 3) The other oddity C AAG was nearly10 times above background (093) but only when the PKstimulator was present this was likely due to -1 frameshift-ing caused by the high shiftiness of the lysyl-tRNAUUU(4149) combined to the high efficiency of the PK stimula-tor

Frameshift propensity of the X XXZ ZZN motifs

The results obtained for the X XXZ ZZN heptamersand their non-shifty derivatives are presented in Figure4 The average background values given by the no-motifconstructs was 0037 plusmn 0010 for those with the IS911stimulators (open lozenges) and 0055 plusmn 0031 for thosewithout stimulator (open triangles) The estimated back-ground value for the IS3 constructs was of 0069 plusmn 0007(data not shown) The frameshifting frequency amongthe motif-containing constructs varied over a large rangeie from 0018 for G GGG GGU without stimulatorto 54 for C CCA AAG associated with the IS3 PKAs previously demonstrated in E coli the most efficient

motifs in the presence of the IS911 or IS3 stimulatorswere C CCA AAG G GGA AAG and A AAA AAG(194041) The ratio between the motif and no-motifframeshifting frequency values was used as a classifier ofmotif efficiency motifs displaying a ratio above 2 were cat-egorized as frameshift-prone Among the constructs with-out stimulator 39 motifs (61) met this criterion (ratiofrom 21 to 10) When the IS911 stimulators were added55 motifs (86) showed a ratio ranging from 21 to 112Swapping the moderate IS911 stimulators for the more effi-cient IS3 PK increased further the motif to no-motif ratio(from 21 to 1188) and raised the number of positive mo-tifs to 61 (953) the 3 motifs below the threshold wereA AAC CCA G GGC CCA and C CCG GGC

In spite of divergences as to its timing in the elongationcycle the general view concerning -1 frameshifting on slip-pery heptamers is that it occurs after proper decoding ofthe ZZN 0-frame codon when the P and A ribosomal sitesare occupied by the XXZ and ZZN codons and their cog-nate tRNAs (22230405051) Simple rules emerge whenthe effect of the ZZN codon is considered (Figure 4) TheUUN and AAN codons are on the average more frameshift-prone than CCN and GGN Whatever Z is the two ho-mogeneous ZZN codons (meaning all purine or all pyrim-idine bases) are better shifters than the two correspondingheterogeneous ones To explain further all the variations inframeshifting frequency it is also necessary to take into ac-count the nature of the X nucleotide Motifs are by andlarge more frameshift-prone when the X and Z nucleotidesare homogeneous ie all purines or all pyrimidines No-table exceptions to the latter rule are Y YYA AAR andR RRU UUY (with Y = [U C] R = [A G]) Here thehigh shiftiness of the AAR and UUY codons probablycounteracts the negative effect of the YYA and RRU het-erogenous codons

Distribution of frameshift motifs in IS elements

From the above experimental study we concluded that amajority of heptamers (and nearly half of the tetramers)were capable of eliciting -1 frameshifting at substantiallevels (at least twice the background level) To determinethe range of motifs used in genes utilizing frameshift-ing for their expression we carried out an analysis ofIS mobile genetics elements known or suspected to usethis mode of translational control We focused on themembers of the IS1 and IS3 families available in theISFinder database in October 2012 (9) As shown inFigure 5 both tetramers and heptamers are found butwith a marked preference for heptamers (87 against 403)Among the five tetramers the three most shift-prone mo-tifs A AAG and U UU[UC] predominate and the lessefficient A AAA motif is also well represented Only 16different heptamers are found with 72 of them being ei-ther A AAA AAG or A AAA AAA The next most fre-quent are A AAA AAC and G GGA AAC both of lowefficiency followed by the more efficient G GGA AAGU UUU UUC and G GGA AAA To conclude it ap-pears that genes known or suspected to use PRF-1 to ex-press a biologically important protein do not necessarily uti-lize high efficiency motifs However this conclusion is based

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7216 Nucleic Acids Research 2014 Vol 42 No 11

U_UUU_UUU C A G

C_CCU_UUU C A G

G_GGU_UUU C A G

A_AAU_UUU C A G

001

01

1

10

100

A X_XXU_UUN motifs

-

1 fra

me

sh

iftin

g

U_UUA_AAU C A G

G_GGA_AAU C A G

A_AAA_AAU C A G

C_CCA_AAU C A G

C X_XXA_AAN motifs

no stimulator no stimulator no motifIS3 stimulator IS911 stimulators no motif IS911 stimulators

D X_XXG_GGN motifs

U_UUG_GGU C A G

G_GGG_GGU C A G

A_AAG_GGU C A G

C_CCG_GGU C A G

U_UUC_CCU C A G

G_GGC_CCU C A G

A_AAC_CCU C A G

C_CCC_CCU C A G

001

01

1

10

100

B X_XXC_CCN motifs

-

1 fra

meshifting

Figure 4 Frameshift efficiency of the X XXZ ZZN heptamers There are five frameshifting values for each motif corresponding to constructs with a motifand the IS3 PK (open circles) with a motif and the IS911 stimulators (black lozenges) without a motif and with the IS911 stimulators (open lozenges)with a motif and without stimulator (inverted grey triangles) and without both motif and stimulator (open triangles) (Figure 2) Each frameshifting valueis the mean of five independent determinations (the plusmnstandard deviation intervals were not added because they are not bigger than the size of the symbols)The no-motif constructs were obtained by changing the first and fourth nucleotides of each motif as detailed in Materials and Methods

on one category of genes where two overlapping genes codefor the proteins required for transposition of two types of ISelements There the purpose of frameshifting is to providethe lsquorightrsquo amount of a fusion protein which has the trans-posase function (1852) this amount is what keeps transpo-sition of the IS at a level without negative effect on the bac-terial host If the lsquorightrsquo amount is a low amount then theuse of low-efficiency motifs with or without flanking stim-ulators is a way to achieve this goal as illustrated by the IS1element (53)

Distribution of frameshift motifs in E coli genes

Our objective was to statistically assess the prevalence ofeach X XXZ ZZN motif in the genome of various E colistrains The rationale was that if a given motif induces byitself frameshifting at a significant level (ie at a biologi-cally detrimental level) then it should be counterselectedand therefore be underrepresented in E coli genes

(i) Generation of a non-redundant nrMEG In a recentstudy 61 bacterial genomes including strains of E coliShigella and Salmonella were compared to identifygene families that are conserved across all the genomes

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 4: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7213

galactosidase activity (ie above 02 frameshifting) orafter isopropyl -D-1 thiogalactopyranoside induction forthe ones with a low activity (ie those with less than 11frameshifting) For each strain 5 tubes containing 05 mlof Luria-Bertani medium (supplemented with ampicillinand oxacillin) were inoculated with independent clonesand incubated overnight at 37C These cultures were ei-ther diluted 1250 in the same medium and incubated270 min at 37C (no-induction conditions) or diluted150 in the same medium plus 2 mM of isopropyl--D-thiogalactopyranoside and incubated 240 min at 37C (in-duction conditions) The dosage conditions were as previ-ously described (19) Note that both methods gave identical frameshifting values in their overlap range ie between02 and 11 We also verified the accuracy of the reportedvalues above or below the overlap range by applying to alimited set of plasmid constructions a refined assay in whichnon-induced cultures were first concentrated and lysed bysonication (data not shown)

Generation and randomization of a non-redundant lsquomostly Ecoli genomersquo (nrMEG)

The Refseq accessions of the genomes that were used toconstruct our nrMEG together with their organismstraininformation are given in Supplementary Table S1 All theprotein coding gene sequences were extracted from theffn files (National Center for Biotechnological Informationwebsite ftpftpncbinlmnihgovgenomesBacteria) ofthese 37 accessions and merged together to make a com-bined genome of 169 302 sequences These sequences wereclustered using the BLASTCLUST program using a 95sequence identity threshold at the level of nucleotide se-quence One representative sequence per cluster was ran-domly chosen to constitute an nrMEG of 22 703 sequencesEach sequence from the nrMEG was randomized 1000times using the Dicodonshuffle randomization procedure(44) to yield 1000 randomized nrMEGs The DicodonShuf-fle algorithm preserves the dinucleotide composition theencoded protein sequence and the codon usage of each gene

Analysis of XXXZZZN frequencies in protein coding se-quences

A customized perl script was used to count the occur-rences of a pattern in all three possible reading frames(ie X XXZ ZZN XX XZZ ZN and XXX ZZZ N) inboth the real and randomized nrMEGs Violin plots (45)generated with the vioplot package from the R softwarelibrary (httpwwwr-projectorg) were used to visualizethe occurrences of the 64 X XXZ ZZN patterns Z-scoreswere computed as follows z-score = (x minus xmean)xsd wherex is the frequency of occurrence of a pattern in the inte-grated genome xmean is the mean of the distribution of thesame pattern across 1000 randomized genomes and xsd isthe standard deviation of the distribution of the same pat-tern across 1000 randomized genomes Z-scores for the 64XXXZZZN in all three frames are shown in Supplemen-tary Table S3 while all violin plots are available online athttp laptiuccieheptameric patterns clusters

Clustering of genes containing selected X XXZ ZZN pat-terns

All the annotated protein coding genes from 36 of the 37genomes listed in Supplementary Table S1 (AC 000091 waslater excluded because of its removal from Refseq) werescreened for the presence of a motif from a set of selected21 X XXZ ZZN patterns (see Results section) These se-quences were clustered based on similarity between the en-coded protein sequences using the BLASTCLUST program(sequence identity threshold = 45) A total of 658 clusterswhich had at least 20 sequences and where the heptamericpattern was perfectly conserved were taken up for furtheranalysis The coordinates of the conserved heptameric pat-terns were also recorded for each cluster

However these clusters contain only sequences of pro-tein coding genes from those 36 genomes which were ini-tially selected to constitute the nrMEG These clusters wereenriched with additional homologous sequences from thegenomes not included in the nrMEG in an attempt to ob-tain a better phylogenetic signal The enrichment was car-ried out using a tblastn search against all bacterial se-quences in the nr database as described previously (11)The lsquonewlyrsquo obtained homologous sequences for each clus-ter were aligned by translating them into protein sequencesaligning these protein sequences and then back translatingthe aligned protein sequences to their corresponding nu-cleotide sequences The coordinates of the conserved hep-tameric pattern for each cluster (which were recorded in theprevious step) were recalculated to account for gaps intro-duced during alignment of additional sequences

Identification of clusters with conserved heptameric pattern

For the 658 clusters identified in the previous step we em-ployed an additional filtering procedure to identify thoseclusters where the heptameric pattern is conserved For eachcluster the total number of sequences was referred to asNall The number of sequences where the heptameric pat-tern was the same as the parent pattern was referred to asN1 The number of sequences where the pattern is not thesame but is one of the 64 X XXY YYZ patterns was re-ferred to as N2 Finally it has been previously observed(eg in dnaX) that the position of the frameshift site maynot be perfectly conserved To account for that possibilitya 36-nt window starting from the coordinate which is 15nt upstream of the conserved heptameric coordinate wasalso screened for the presence of any X XXY YYZ pat-tern the number of sequences in that category was referredto as N3 For each cluster the three values were summed up(ie N1+N2+N3 = Nsum) and the ratio NsumNall was calcu-lated Clusters where NsumNall gt 09 were labelled as lsquocon-servedrsquo In an ideal situation the frameshift pattern shouldbe absolutely conserved but this threshold was relaxed soas to allow for the possibility of sequencing errors or recentmutations in the sequences from a cluster The end resultwas a set of 69 clusters (features of these clusters are pre-sented in Supplementary Tables S4ndashS8 and the completesequence of their genes and other features are available athttplaptiuccieheptameric patterns clusters)

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7214 Nucleic Acids Research 2014 Vol 42 No 11

Synonymous-site conservation analyses

The degree of conservation at synonymous sites was cal-culated as previously described (46) for a 15-codon win-dow The detailed results of this analysis are available on-line at http laptiuccieheptameric patterns clusters anda summary is included in the RSSV column (for reducedvariability at synonymous sites) of Supplementary TablesS4 and S5 However a statistically significant conservationat synonymous sites can only be observed if there is suffi-cient sequence divergence in the alignment To numericallyquantify sequence divergence we also calculated a statisticcalled alndiv which corresponds to an estimate of the meannumber of phylogenetically independent nucleotide substi-tutions per alignment column (see Supplementary Tables S4and S5)

Search of potential stimulatory elements flanking the pattern

Two types of potential frameshift stimulators weresearched The first type was an SD-like sequence (eitherGAGG GGAG AGGA GGNGG or AGGKG with K= [TG]) located 6ndash17 nt before the second base of themotif these sequences and the spacing interval were chosenbecause all were experimentally proved to be stimulatory(32) A segment of 30 nt ending with the first base of themotif was scanned using a script for Perl (version 5101from ActiveState) in all the sequences of each cluster forthe above potential SD sequences A cluster qualified ashaving a lsquoconserved SDrsquo if at least 50 of its sequences hadan SD

The second type of stimulator was an RNA structuredownstream of the motif A preliminary study of 271 IS3family members (see Supplementary Figure S2) led us tochoose the following empirical rules the structure is (i) asimple or branched hairpin of a length ranging from 17to 140 nt (ii) that starts 4ndash10 nt after the last base of themotif (iii) with a G-C(or C-G) base-pair followed by at leastthree consecutive WatsonndashCrick or GndashU or UndashG base-pairsand (iv) has a Gunfold37C ge 76 kcalmolminus1 the G37Cvalue was determined using the default parameters of theRNAfold program from version 185 of the Vienna RNApackage (47) We limited our search to hairpin structuresand and did not explore whether some of the structurescould also form PKs at this preliminary stage For each se-quence of all clusters a 197 nt segment starting at the fourthbase after the motif was extracted and analysed with a cus-tom Perl script Each segment was first deleted from the 3primeend one base at a time and down to 17 nt Each set of nesteddeletions was passed to the RNAfold program and the po-tential structures were sorted out to retain those conformingto the rules For a given cluster structures were grouped intypes according to hairpin size and distance from the mo-tif The frequency of each type of structure was calculatedand only those present in at least 50 of the sequences ofthe cluster were retained 53 of the 69 clusters had sucha conserved structure The Ghpntminus1 parameter (ie theGunfold value divided by the number of nucleotides in thehairpin structure) was calculated For clusters with severaltypes of structure the lsquobestrsquo type was the one with the high-est value for the [(Ghpntminus1) x (frequency)] product A

summary of these analyses is reported in SupplementaryTables S4 S5 and S8

For further comparison 20 ISs from the IS3 family wereselected because they contain a frameshift motif followedby a known (or likely) stimulatory structure of a size rang-ing from 17 to 131 nt The Ghp and Ghpntminus1 param-eters were calculated for each structure Selective pressureto maintain a hairpin should likely result in a region morestructured than neighbouring regions of the same size iehaving a value higher than average for the Gntminus1 param-eter To assess that a 197 nt segment starting 4 nt down-stream of the frameshift motif was extracted for each ofthe 20 ISs as well as for one typical sequence of each ofthe 53 clusters with a conserved structure For each 197 ntsegment a Perl script generated a subset of sequences bymoving (1 nt at a time) a sliding window of the size of thecorresponding conserved hairpin and passed it to RNAfoldThe average Gunfoldnt-1 (Gavntminus1) were calculated foreach subset as well as the Gntminus1 which is the difference[(Ghpntminus1) minus (Gavntminus1)] As expected the 20 ISs pos-sessing a stimulatory structure all display a higher than av-erage Gntminus1 (Supplementary Table S8)

RESULTS

In vivo determination of -1 frameshifting frequency

As illustrated in Figure 2 the 16 Z ZZN motifs and the64 X XXZ ZZN motifs were cloned either without flank-ing stimulators or with a strong downstream stimulator de-rived from the IS3 PK (1819) or for the heptamers onlywith the moderately efficient combination of upstream anddownstream stimulators from IS911 (3248) For both typesof motifs a non-shifty derivative was similarly cloned forthat the first base of each tetramer or the first and fourthbases of each heptamer were mutated The shifty and non-shifty cassettes were inserted in front of the lacZ gene car-ried by plasmid pOFX310 so that translation of full-length-galactosidase occurs only when ribosomes move to the -1frame before encountering the 0 frame stop codon (Figure2)

Frameshift propensity of the Z ZZN motifs

The graphs showing the variation of -1 frameshifting fre-quency as a function of the sequence of the motif arepresented in Figure 3 for the Z ZZN tetramers (see alsoSupplementary Table S2) The motif-containing constructswithout PK (save G GGG see below) were on the aver-age marginally above their no-motif counterpart (0147 plusmn0004 versus 0112 plusmn 0015) which suggests that the mo-tifs are by themselves barely or not at all shifty Addition ofthe IS3 PK led to substantial increase in frameshifting fre-quency for 10 motifs Only six were at least four times abovebackground (ie above 0064 plusmn 0014) with frequenciesranging from 026 to 56 For them the hierarchy was[A AAG gtgt U UUC gt U UUU gt C CCU = C CCCgtA AAA] Four motifs (U UUA U UUG C CCA andC CCG) were 18- to 36-fold above background

A few oddities were revealed The G GGG C GGAand C GGG constructs (with and without PK) were found

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7215

motif amp PKIS3

no-motif amp PKIS3

motif amp no-stimulator

no-motif amp no-stimulator

U_UUU C A G

C_CC

U C A G

G_GGU C A G

A_AAU C A G

001

01

1

10

100

-1

fra

me

sh

iftin

g

Figure 3 Frameshift efficiency of the Z ZZN tetramers The IS3frameshift region cloned in plasmid pOFX310 was the one used in a previ-ous study [see Figure 1 in (19)] It differs slightly from the one used for theheptamer analysis (Figure 2 panel D) The nucleotides upstream (6 nt) anddownstream (5 nt) of the motif are those found in IS3 The sequence fromthe HindIII site to the start of the PK is agcuuCCUCCAZZZNGCCGCndashndash The no-stimulator construct was derived by deleting the 3prime half of thePK right after the UGA stop codon in the 0 frame to give the follow-ing sequence agcuuCCUCCAZZZNGCCGCGACAUACUUCGCGAAGGCCUGAACUUGAAgggcc The four frameshifting values for eachmotif correspond to a construct with a motif and the IS3 PK (open circles)a construct without motif and with the IS3 PK (open lozenges) a con-struct with motif and without stimulator (black inverted triangles) and aconstruct without motif and stimulator (open triangles) Each frameshift-ing value is the mean of five independent determinations (the plusmn standarddeviation intervals were omitted because they are not bigger than the sizeof the symbols in most cases) The no-motif constructs were derived bychanging each motif to either G YYN or C RRN

above background probably not as a result of frameshift-ing but because these sequences together with the followingG could act as SD sequences and direct low level initiationon the -1 frame AUA codon present 7 nt downstream (seelegend of Figure 3) The other oddity C AAG was nearly10 times above background (093) but only when the PKstimulator was present this was likely due to -1 frameshift-ing caused by the high shiftiness of the lysyl-tRNAUUU(4149) combined to the high efficiency of the PK stimula-tor

Frameshift propensity of the X XXZ ZZN motifs

The results obtained for the X XXZ ZZN heptamersand their non-shifty derivatives are presented in Figure4 The average background values given by the no-motifconstructs was 0037 plusmn 0010 for those with the IS911stimulators (open lozenges) and 0055 plusmn 0031 for thosewithout stimulator (open triangles) The estimated back-ground value for the IS3 constructs was of 0069 plusmn 0007(data not shown) The frameshifting frequency amongthe motif-containing constructs varied over a large rangeie from 0018 for G GGG GGU without stimulatorto 54 for C CCA AAG associated with the IS3 PKAs previously demonstrated in E coli the most efficient

motifs in the presence of the IS911 or IS3 stimulatorswere C CCA AAG G GGA AAG and A AAA AAG(194041) The ratio between the motif and no-motifframeshifting frequency values was used as a classifier ofmotif efficiency motifs displaying a ratio above 2 were cat-egorized as frameshift-prone Among the constructs with-out stimulator 39 motifs (61) met this criterion (ratiofrom 21 to 10) When the IS911 stimulators were added55 motifs (86) showed a ratio ranging from 21 to 112Swapping the moderate IS911 stimulators for the more effi-cient IS3 PK increased further the motif to no-motif ratio(from 21 to 1188) and raised the number of positive mo-tifs to 61 (953) the 3 motifs below the threshold wereA AAC CCA G GGC CCA and C CCG GGC

In spite of divergences as to its timing in the elongationcycle the general view concerning -1 frameshifting on slip-pery heptamers is that it occurs after proper decoding ofthe ZZN 0-frame codon when the P and A ribosomal sitesare occupied by the XXZ and ZZN codons and their cog-nate tRNAs (22230405051) Simple rules emerge whenthe effect of the ZZN codon is considered (Figure 4) TheUUN and AAN codons are on the average more frameshift-prone than CCN and GGN Whatever Z is the two ho-mogeneous ZZN codons (meaning all purine or all pyrim-idine bases) are better shifters than the two correspondingheterogeneous ones To explain further all the variations inframeshifting frequency it is also necessary to take into ac-count the nature of the X nucleotide Motifs are by andlarge more frameshift-prone when the X and Z nucleotidesare homogeneous ie all purines or all pyrimidines No-table exceptions to the latter rule are Y YYA AAR andR RRU UUY (with Y = [U C] R = [A G]) Here thehigh shiftiness of the AAR and UUY codons probablycounteracts the negative effect of the YYA and RRU het-erogenous codons

Distribution of frameshift motifs in IS elements

From the above experimental study we concluded that amajority of heptamers (and nearly half of the tetramers)were capable of eliciting -1 frameshifting at substantiallevels (at least twice the background level) To determinethe range of motifs used in genes utilizing frameshift-ing for their expression we carried out an analysis ofIS mobile genetics elements known or suspected to usethis mode of translational control We focused on themembers of the IS1 and IS3 families available in theISFinder database in October 2012 (9) As shown inFigure 5 both tetramers and heptamers are found butwith a marked preference for heptamers (87 against 403)Among the five tetramers the three most shift-prone mo-tifs A AAG and U UU[UC] predominate and the lessefficient A AAA motif is also well represented Only 16different heptamers are found with 72 of them being ei-ther A AAA AAG or A AAA AAA The next most fre-quent are A AAA AAC and G GGA AAC both of lowefficiency followed by the more efficient G GGA AAGU UUU UUC and G GGA AAA To conclude it ap-pears that genes known or suspected to use PRF-1 to ex-press a biologically important protein do not necessarily uti-lize high efficiency motifs However this conclusion is based

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7216 Nucleic Acids Research 2014 Vol 42 No 11

U_UUU_UUU C A G

C_CCU_UUU C A G

G_GGU_UUU C A G

A_AAU_UUU C A G

001

01

1

10

100

A X_XXU_UUN motifs

-

1 fra

me

sh

iftin

g

U_UUA_AAU C A G

G_GGA_AAU C A G

A_AAA_AAU C A G

C_CCA_AAU C A G

C X_XXA_AAN motifs

no stimulator no stimulator no motifIS3 stimulator IS911 stimulators no motif IS911 stimulators

D X_XXG_GGN motifs

U_UUG_GGU C A G

G_GGG_GGU C A G

A_AAG_GGU C A G

C_CCG_GGU C A G

U_UUC_CCU C A G

G_GGC_CCU C A G

A_AAC_CCU C A G

C_CCC_CCU C A G

001

01

1

10

100

B X_XXC_CCN motifs

-

1 fra

meshifting

Figure 4 Frameshift efficiency of the X XXZ ZZN heptamers There are five frameshifting values for each motif corresponding to constructs with a motifand the IS3 PK (open circles) with a motif and the IS911 stimulators (black lozenges) without a motif and with the IS911 stimulators (open lozenges)with a motif and without stimulator (inverted grey triangles) and without both motif and stimulator (open triangles) (Figure 2) Each frameshifting valueis the mean of five independent determinations (the plusmnstandard deviation intervals were not added because they are not bigger than the size of the symbols)The no-motif constructs were obtained by changing the first and fourth nucleotides of each motif as detailed in Materials and Methods

on one category of genes where two overlapping genes codefor the proteins required for transposition of two types of ISelements There the purpose of frameshifting is to providethe lsquorightrsquo amount of a fusion protein which has the trans-posase function (1852) this amount is what keeps transpo-sition of the IS at a level without negative effect on the bac-terial host If the lsquorightrsquo amount is a low amount then theuse of low-efficiency motifs with or without flanking stim-ulators is a way to achieve this goal as illustrated by the IS1element (53)

Distribution of frameshift motifs in E coli genes

Our objective was to statistically assess the prevalence ofeach X XXZ ZZN motif in the genome of various E colistrains The rationale was that if a given motif induces byitself frameshifting at a significant level (ie at a biologi-cally detrimental level) then it should be counterselectedand therefore be underrepresented in E coli genes

(i) Generation of a non-redundant nrMEG In a recentstudy 61 bacterial genomes including strains of E coliShigella and Salmonella were compared to identifygene families that are conserved across all the genomes

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 5: 1 ribosomal frameshifting in Escherichia

7214 Nucleic Acids Research 2014 Vol 42 No 11

Synonymous-site conservation analyses

The degree of conservation at synonymous sites was cal-culated as previously described (46) for a 15-codon win-dow The detailed results of this analysis are available on-line at http laptiuccieheptameric patterns clusters anda summary is included in the RSSV column (for reducedvariability at synonymous sites) of Supplementary TablesS4 and S5 However a statistically significant conservationat synonymous sites can only be observed if there is suffi-cient sequence divergence in the alignment To numericallyquantify sequence divergence we also calculated a statisticcalled alndiv which corresponds to an estimate of the meannumber of phylogenetically independent nucleotide substi-tutions per alignment column (see Supplementary Tables S4and S5)

Search of potential stimulatory elements flanking the pattern

Two types of potential frameshift stimulators weresearched The first type was an SD-like sequence (eitherGAGG GGAG AGGA GGNGG or AGGKG with K= [TG]) located 6ndash17 nt before the second base of themotif these sequences and the spacing interval were chosenbecause all were experimentally proved to be stimulatory(32) A segment of 30 nt ending with the first base of themotif was scanned using a script for Perl (version 5101from ActiveState) in all the sequences of each cluster forthe above potential SD sequences A cluster qualified ashaving a lsquoconserved SDrsquo if at least 50 of its sequences hadan SD

The second type of stimulator was an RNA structuredownstream of the motif A preliminary study of 271 IS3family members (see Supplementary Figure S2) led us tochoose the following empirical rules the structure is (i) asimple or branched hairpin of a length ranging from 17to 140 nt (ii) that starts 4ndash10 nt after the last base of themotif (iii) with a G-C(or C-G) base-pair followed by at leastthree consecutive WatsonndashCrick or GndashU or UndashG base-pairsand (iv) has a Gunfold37C ge 76 kcalmolminus1 the G37Cvalue was determined using the default parameters of theRNAfold program from version 185 of the Vienna RNApackage (47) We limited our search to hairpin structuresand and did not explore whether some of the structurescould also form PKs at this preliminary stage For each se-quence of all clusters a 197 nt segment starting at the fourthbase after the motif was extracted and analysed with a cus-tom Perl script Each segment was first deleted from the 3primeend one base at a time and down to 17 nt Each set of nesteddeletions was passed to the RNAfold program and the po-tential structures were sorted out to retain those conformingto the rules For a given cluster structures were grouped intypes according to hairpin size and distance from the mo-tif The frequency of each type of structure was calculatedand only those present in at least 50 of the sequences ofthe cluster were retained 53 of the 69 clusters had sucha conserved structure The Ghpntminus1 parameter (ie theGunfold value divided by the number of nucleotides in thehairpin structure) was calculated For clusters with severaltypes of structure the lsquobestrsquo type was the one with the high-est value for the [(Ghpntminus1) x (frequency)] product A

summary of these analyses is reported in SupplementaryTables S4 S5 and S8

For further comparison 20 ISs from the IS3 family wereselected because they contain a frameshift motif followedby a known (or likely) stimulatory structure of a size rang-ing from 17 to 131 nt The Ghp and Ghpntminus1 param-eters were calculated for each structure Selective pressureto maintain a hairpin should likely result in a region morestructured than neighbouring regions of the same size iehaving a value higher than average for the Gntminus1 param-eter To assess that a 197 nt segment starting 4 nt down-stream of the frameshift motif was extracted for each ofthe 20 ISs as well as for one typical sequence of each ofthe 53 clusters with a conserved structure For each 197 ntsegment a Perl script generated a subset of sequences bymoving (1 nt at a time) a sliding window of the size of thecorresponding conserved hairpin and passed it to RNAfoldThe average Gunfoldnt-1 (Gavntminus1) were calculated foreach subset as well as the Gntminus1 which is the difference[(Ghpntminus1) minus (Gavntminus1)] As expected the 20 ISs pos-sessing a stimulatory structure all display a higher than av-erage Gntminus1 (Supplementary Table S8)

RESULTS

In vivo determination of -1 frameshifting frequency

As illustrated in Figure 2 the 16 Z ZZN motifs and the64 X XXZ ZZN motifs were cloned either without flank-ing stimulators or with a strong downstream stimulator de-rived from the IS3 PK (1819) or for the heptamers onlywith the moderately efficient combination of upstream anddownstream stimulators from IS911 (3248) For both typesof motifs a non-shifty derivative was similarly cloned forthat the first base of each tetramer or the first and fourthbases of each heptamer were mutated The shifty and non-shifty cassettes were inserted in front of the lacZ gene car-ried by plasmid pOFX310 so that translation of full-length-galactosidase occurs only when ribosomes move to the -1frame before encountering the 0 frame stop codon (Figure2)

Frameshift propensity of the Z ZZN motifs

The graphs showing the variation of -1 frameshifting fre-quency as a function of the sequence of the motif arepresented in Figure 3 for the Z ZZN tetramers (see alsoSupplementary Table S2) The motif-containing constructswithout PK (save G GGG see below) were on the aver-age marginally above their no-motif counterpart (0147 plusmn0004 versus 0112 plusmn 0015) which suggests that the mo-tifs are by themselves barely or not at all shifty Addition ofthe IS3 PK led to substantial increase in frameshifting fre-quency for 10 motifs Only six were at least four times abovebackground (ie above 0064 plusmn 0014) with frequenciesranging from 026 to 56 For them the hierarchy was[A AAG gtgt U UUC gt U UUU gt C CCU = C CCCgtA AAA] Four motifs (U UUA U UUG C CCA andC CCG) were 18- to 36-fold above background

A few oddities were revealed The G GGG C GGAand C GGG constructs (with and without PK) were found

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7215

motif amp PKIS3

no-motif amp PKIS3

motif amp no-stimulator

no-motif amp no-stimulator

U_UUU C A G

C_CC

U C A G

G_GGU C A G

A_AAU C A G

001

01

1

10

100

-1

fra

me

sh

iftin

g

Figure 3 Frameshift efficiency of the Z ZZN tetramers The IS3frameshift region cloned in plasmid pOFX310 was the one used in a previ-ous study [see Figure 1 in (19)] It differs slightly from the one used for theheptamer analysis (Figure 2 panel D) The nucleotides upstream (6 nt) anddownstream (5 nt) of the motif are those found in IS3 The sequence fromthe HindIII site to the start of the PK is agcuuCCUCCAZZZNGCCGCndashndash The no-stimulator construct was derived by deleting the 3prime half of thePK right after the UGA stop codon in the 0 frame to give the follow-ing sequence agcuuCCUCCAZZZNGCCGCGACAUACUUCGCGAAGGCCUGAACUUGAAgggcc The four frameshifting values for eachmotif correspond to a construct with a motif and the IS3 PK (open circles)a construct without motif and with the IS3 PK (open lozenges) a con-struct with motif and without stimulator (black inverted triangles) and aconstruct without motif and stimulator (open triangles) Each frameshift-ing value is the mean of five independent determinations (the plusmn standarddeviation intervals were omitted because they are not bigger than the sizeof the symbols in most cases) The no-motif constructs were derived bychanging each motif to either G YYN or C RRN

above background probably not as a result of frameshift-ing but because these sequences together with the followingG could act as SD sequences and direct low level initiationon the -1 frame AUA codon present 7 nt downstream (seelegend of Figure 3) The other oddity C AAG was nearly10 times above background (093) but only when the PKstimulator was present this was likely due to -1 frameshift-ing caused by the high shiftiness of the lysyl-tRNAUUU(4149) combined to the high efficiency of the PK stimula-tor

Frameshift propensity of the X XXZ ZZN motifs

The results obtained for the X XXZ ZZN heptamersand their non-shifty derivatives are presented in Figure4 The average background values given by the no-motifconstructs was 0037 plusmn 0010 for those with the IS911stimulators (open lozenges) and 0055 plusmn 0031 for thosewithout stimulator (open triangles) The estimated back-ground value for the IS3 constructs was of 0069 plusmn 0007(data not shown) The frameshifting frequency amongthe motif-containing constructs varied over a large rangeie from 0018 for G GGG GGU without stimulatorto 54 for C CCA AAG associated with the IS3 PKAs previously demonstrated in E coli the most efficient

motifs in the presence of the IS911 or IS3 stimulatorswere C CCA AAG G GGA AAG and A AAA AAG(194041) The ratio between the motif and no-motifframeshifting frequency values was used as a classifier ofmotif efficiency motifs displaying a ratio above 2 were cat-egorized as frameshift-prone Among the constructs with-out stimulator 39 motifs (61) met this criterion (ratiofrom 21 to 10) When the IS911 stimulators were added55 motifs (86) showed a ratio ranging from 21 to 112Swapping the moderate IS911 stimulators for the more effi-cient IS3 PK increased further the motif to no-motif ratio(from 21 to 1188) and raised the number of positive mo-tifs to 61 (953) the 3 motifs below the threshold wereA AAC CCA G GGC CCA and C CCG GGC

In spite of divergences as to its timing in the elongationcycle the general view concerning -1 frameshifting on slip-pery heptamers is that it occurs after proper decoding ofthe ZZN 0-frame codon when the P and A ribosomal sitesare occupied by the XXZ and ZZN codons and their cog-nate tRNAs (22230405051) Simple rules emerge whenthe effect of the ZZN codon is considered (Figure 4) TheUUN and AAN codons are on the average more frameshift-prone than CCN and GGN Whatever Z is the two ho-mogeneous ZZN codons (meaning all purine or all pyrim-idine bases) are better shifters than the two correspondingheterogeneous ones To explain further all the variations inframeshifting frequency it is also necessary to take into ac-count the nature of the X nucleotide Motifs are by andlarge more frameshift-prone when the X and Z nucleotidesare homogeneous ie all purines or all pyrimidines No-table exceptions to the latter rule are Y YYA AAR andR RRU UUY (with Y = [U C] R = [A G]) Here thehigh shiftiness of the AAR and UUY codons probablycounteracts the negative effect of the YYA and RRU het-erogenous codons

Distribution of frameshift motifs in IS elements

From the above experimental study we concluded that amajority of heptamers (and nearly half of the tetramers)were capable of eliciting -1 frameshifting at substantiallevels (at least twice the background level) To determinethe range of motifs used in genes utilizing frameshift-ing for their expression we carried out an analysis ofIS mobile genetics elements known or suspected to usethis mode of translational control We focused on themembers of the IS1 and IS3 families available in theISFinder database in October 2012 (9) As shown inFigure 5 both tetramers and heptamers are found butwith a marked preference for heptamers (87 against 403)Among the five tetramers the three most shift-prone mo-tifs A AAG and U UU[UC] predominate and the lessefficient A AAA motif is also well represented Only 16different heptamers are found with 72 of them being ei-ther A AAA AAG or A AAA AAA The next most fre-quent are A AAA AAC and G GGA AAC both of lowefficiency followed by the more efficient G GGA AAGU UUU UUC and G GGA AAA To conclude it ap-pears that genes known or suspected to use PRF-1 to ex-press a biologically important protein do not necessarily uti-lize high efficiency motifs However this conclusion is based

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7216 Nucleic Acids Research 2014 Vol 42 No 11

U_UUU_UUU C A G

C_CCU_UUU C A G

G_GGU_UUU C A G

A_AAU_UUU C A G

001

01

1

10

100

A X_XXU_UUN motifs

-

1 fra

me

sh

iftin

g

U_UUA_AAU C A G

G_GGA_AAU C A G

A_AAA_AAU C A G

C_CCA_AAU C A G

C X_XXA_AAN motifs

no stimulator no stimulator no motifIS3 stimulator IS911 stimulators no motif IS911 stimulators

D X_XXG_GGN motifs

U_UUG_GGU C A G

G_GGG_GGU C A G

A_AAG_GGU C A G

C_CCG_GGU C A G

U_UUC_CCU C A G

G_GGC_CCU C A G

A_AAC_CCU C A G

C_CCC_CCU C A G

001

01

1

10

100

B X_XXC_CCN motifs

-

1 fra

meshifting

Figure 4 Frameshift efficiency of the X XXZ ZZN heptamers There are five frameshifting values for each motif corresponding to constructs with a motifand the IS3 PK (open circles) with a motif and the IS911 stimulators (black lozenges) without a motif and with the IS911 stimulators (open lozenges)with a motif and without stimulator (inverted grey triangles) and without both motif and stimulator (open triangles) (Figure 2) Each frameshifting valueis the mean of five independent determinations (the plusmnstandard deviation intervals were not added because they are not bigger than the size of the symbols)The no-motif constructs were obtained by changing the first and fourth nucleotides of each motif as detailed in Materials and Methods

on one category of genes where two overlapping genes codefor the proteins required for transposition of two types of ISelements There the purpose of frameshifting is to providethe lsquorightrsquo amount of a fusion protein which has the trans-posase function (1852) this amount is what keeps transpo-sition of the IS at a level without negative effect on the bac-terial host If the lsquorightrsquo amount is a low amount then theuse of low-efficiency motifs with or without flanking stim-ulators is a way to achieve this goal as illustrated by the IS1element (53)

Distribution of frameshift motifs in E coli genes

Our objective was to statistically assess the prevalence ofeach X XXZ ZZN motif in the genome of various E colistrains The rationale was that if a given motif induces byitself frameshifting at a significant level (ie at a biologi-cally detrimental level) then it should be counterselectedand therefore be underrepresented in E coli genes

(i) Generation of a non-redundant nrMEG In a recentstudy 61 bacterial genomes including strains of E coliShigella and Salmonella were compared to identifygene families that are conserved across all the genomes

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 6: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7215

motif amp PKIS3

no-motif amp PKIS3

motif amp no-stimulator

no-motif amp no-stimulator

U_UUU C A G

C_CC

U C A G

G_GGU C A G

A_AAU C A G

001

01

1

10

100

-1

fra

me

sh

iftin

g

Figure 3 Frameshift efficiency of the Z ZZN tetramers The IS3frameshift region cloned in plasmid pOFX310 was the one used in a previ-ous study [see Figure 1 in (19)] It differs slightly from the one used for theheptamer analysis (Figure 2 panel D) The nucleotides upstream (6 nt) anddownstream (5 nt) of the motif are those found in IS3 The sequence fromthe HindIII site to the start of the PK is agcuuCCUCCAZZZNGCCGCndashndash The no-stimulator construct was derived by deleting the 3prime half of thePK right after the UGA stop codon in the 0 frame to give the follow-ing sequence agcuuCCUCCAZZZNGCCGCGACAUACUUCGCGAAGGCCUGAACUUGAAgggcc The four frameshifting values for eachmotif correspond to a construct with a motif and the IS3 PK (open circles)a construct without motif and with the IS3 PK (open lozenges) a con-struct with motif and without stimulator (black inverted triangles) and aconstruct without motif and stimulator (open triangles) Each frameshift-ing value is the mean of five independent determinations (the plusmn standarddeviation intervals were omitted because they are not bigger than the sizeof the symbols in most cases) The no-motif constructs were derived bychanging each motif to either G YYN or C RRN

above background probably not as a result of frameshift-ing but because these sequences together with the followingG could act as SD sequences and direct low level initiationon the -1 frame AUA codon present 7 nt downstream (seelegend of Figure 3) The other oddity C AAG was nearly10 times above background (093) but only when the PKstimulator was present this was likely due to -1 frameshift-ing caused by the high shiftiness of the lysyl-tRNAUUU(4149) combined to the high efficiency of the PK stimula-tor

Frameshift propensity of the X XXZ ZZN motifs

The results obtained for the X XXZ ZZN heptamersand their non-shifty derivatives are presented in Figure4 The average background values given by the no-motifconstructs was 0037 plusmn 0010 for those with the IS911stimulators (open lozenges) and 0055 plusmn 0031 for thosewithout stimulator (open triangles) The estimated back-ground value for the IS3 constructs was of 0069 plusmn 0007(data not shown) The frameshifting frequency amongthe motif-containing constructs varied over a large rangeie from 0018 for G GGG GGU without stimulatorto 54 for C CCA AAG associated with the IS3 PKAs previously demonstrated in E coli the most efficient

motifs in the presence of the IS911 or IS3 stimulatorswere C CCA AAG G GGA AAG and A AAA AAG(194041) The ratio between the motif and no-motifframeshifting frequency values was used as a classifier ofmotif efficiency motifs displaying a ratio above 2 were cat-egorized as frameshift-prone Among the constructs with-out stimulator 39 motifs (61) met this criterion (ratiofrom 21 to 10) When the IS911 stimulators were added55 motifs (86) showed a ratio ranging from 21 to 112Swapping the moderate IS911 stimulators for the more effi-cient IS3 PK increased further the motif to no-motif ratio(from 21 to 1188) and raised the number of positive mo-tifs to 61 (953) the 3 motifs below the threshold wereA AAC CCA G GGC CCA and C CCG GGC

In spite of divergences as to its timing in the elongationcycle the general view concerning -1 frameshifting on slip-pery heptamers is that it occurs after proper decoding ofthe ZZN 0-frame codon when the P and A ribosomal sitesare occupied by the XXZ and ZZN codons and their cog-nate tRNAs (22230405051) Simple rules emerge whenthe effect of the ZZN codon is considered (Figure 4) TheUUN and AAN codons are on the average more frameshift-prone than CCN and GGN Whatever Z is the two ho-mogeneous ZZN codons (meaning all purine or all pyrim-idine bases) are better shifters than the two correspondingheterogeneous ones To explain further all the variations inframeshifting frequency it is also necessary to take into ac-count the nature of the X nucleotide Motifs are by andlarge more frameshift-prone when the X and Z nucleotidesare homogeneous ie all purines or all pyrimidines No-table exceptions to the latter rule are Y YYA AAR andR RRU UUY (with Y = [U C] R = [A G]) Here thehigh shiftiness of the AAR and UUY codons probablycounteracts the negative effect of the YYA and RRU het-erogenous codons

Distribution of frameshift motifs in IS elements

From the above experimental study we concluded that amajority of heptamers (and nearly half of the tetramers)were capable of eliciting -1 frameshifting at substantiallevels (at least twice the background level) To determinethe range of motifs used in genes utilizing frameshift-ing for their expression we carried out an analysis ofIS mobile genetics elements known or suspected to usethis mode of translational control We focused on themembers of the IS1 and IS3 families available in theISFinder database in October 2012 (9) As shown inFigure 5 both tetramers and heptamers are found butwith a marked preference for heptamers (87 against 403)Among the five tetramers the three most shift-prone mo-tifs A AAG and U UU[UC] predominate and the lessefficient A AAA motif is also well represented Only 16different heptamers are found with 72 of them being ei-ther A AAA AAG or A AAA AAA The next most fre-quent are A AAA AAC and G GGA AAC both of lowefficiency followed by the more efficient G GGA AAGU UUU UUC and G GGA AAA To conclude it ap-pears that genes known or suspected to use PRF-1 to ex-press a biologically important protein do not necessarily uti-lize high efficiency motifs However this conclusion is based

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7216 Nucleic Acids Research 2014 Vol 42 No 11

U_UUU_UUU C A G

C_CCU_UUU C A G

G_GGU_UUU C A G

A_AAU_UUU C A G

001

01

1

10

100

A X_XXU_UUN motifs

-

1 fra

me

sh

iftin

g

U_UUA_AAU C A G

G_GGA_AAU C A G

A_AAA_AAU C A G

C_CCA_AAU C A G

C X_XXA_AAN motifs

no stimulator no stimulator no motifIS3 stimulator IS911 stimulators no motif IS911 stimulators

D X_XXG_GGN motifs

U_UUG_GGU C A G

G_GGG_GGU C A G

A_AAG_GGU C A G

C_CCG_GGU C A G

U_UUC_CCU C A G

G_GGC_CCU C A G

A_AAC_CCU C A G

C_CCC_CCU C A G

001

01

1

10

100

B X_XXC_CCN motifs

-

1 fra

meshifting

Figure 4 Frameshift efficiency of the X XXZ ZZN heptamers There are five frameshifting values for each motif corresponding to constructs with a motifand the IS3 PK (open circles) with a motif and the IS911 stimulators (black lozenges) without a motif and with the IS911 stimulators (open lozenges)with a motif and without stimulator (inverted grey triangles) and without both motif and stimulator (open triangles) (Figure 2) Each frameshifting valueis the mean of five independent determinations (the plusmnstandard deviation intervals were not added because they are not bigger than the size of the symbols)The no-motif constructs were obtained by changing the first and fourth nucleotides of each motif as detailed in Materials and Methods

on one category of genes where two overlapping genes codefor the proteins required for transposition of two types of ISelements There the purpose of frameshifting is to providethe lsquorightrsquo amount of a fusion protein which has the trans-posase function (1852) this amount is what keeps transpo-sition of the IS at a level without negative effect on the bac-terial host If the lsquorightrsquo amount is a low amount then theuse of low-efficiency motifs with or without flanking stim-ulators is a way to achieve this goal as illustrated by the IS1element (53)

Distribution of frameshift motifs in E coli genes

Our objective was to statistically assess the prevalence ofeach X XXZ ZZN motif in the genome of various E colistrains The rationale was that if a given motif induces byitself frameshifting at a significant level (ie at a biologi-cally detrimental level) then it should be counterselectedand therefore be underrepresented in E coli genes

(i) Generation of a non-redundant nrMEG In a recentstudy 61 bacterial genomes including strains of E coliShigella and Salmonella were compared to identifygene families that are conserved across all the genomes

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 7: 1 ribosomal frameshifting in Escherichia

7216 Nucleic Acids Research 2014 Vol 42 No 11

U_UUU_UUU C A G

C_CCU_UUU C A G

G_GGU_UUU C A G

A_AAU_UUU C A G

001

01

1

10

100

A X_XXU_UUN motifs

-

1 fra

me

sh

iftin

g

U_UUA_AAU C A G

G_GGA_AAU C A G

A_AAA_AAU C A G

C_CCA_AAU C A G

C X_XXA_AAN motifs

no stimulator no stimulator no motifIS3 stimulator IS911 stimulators no motif IS911 stimulators

D X_XXG_GGN motifs

U_UUG_GGU C A G

G_GGG_GGU C A G

A_AAG_GGU C A G

C_CCG_GGU C A G

U_UUC_CCU C A G

G_GGC_CCU C A G

A_AAC_CCU C A G

C_CCC_CCU C A G

001

01

1

10

100

B X_XXC_CCN motifs

-

1 fra

meshifting

Figure 4 Frameshift efficiency of the X XXZ ZZN heptamers There are five frameshifting values for each motif corresponding to constructs with a motifand the IS3 PK (open circles) with a motif and the IS911 stimulators (black lozenges) without a motif and with the IS911 stimulators (open lozenges)with a motif and without stimulator (inverted grey triangles) and without both motif and stimulator (open triangles) (Figure 2) Each frameshifting valueis the mean of five independent determinations (the plusmnstandard deviation intervals were not added because they are not bigger than the size of the symbols)The no-motif constructs were obtained by changing the first and fourth nucleotides of each motif as detailed in Materials and Methods

on one category of genes where two overlapping genes codefor the proteins required for transposition of two types of ISelements There the purpose of frameshifting is to providethe lsquorightrsquo amount of a fusion protein which has the trans-posase function (1852) this amount is what keeps transpo-sition of the IS at a level without negative effect on the bac-terial host If the lsquorightrsquo amount is a low amount then theuse of low-efficiency motifs with or without flanking stim-ulators is a way to achieve this goal as illustrated by the IS1element (53)

Distribution of frameshift motifs in E coli genes

Our objective was to statistically assess the prevalence ofeach X XXZ ZZN motif in the genome of various E colistrains The rationale was that if a given motif induces byitself frameshifting at a significant level (ie at a biologi-cally detrimental level) then it should be counterselectedand therefore be underrepresented in E coli genes

(i) Generation of a non-redundant nrMEG In a recentstudy 61 bacterial genomes including strains of E coliShigella and Salmonella were compared to identifygene families that are conserved across all the genomes

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 8: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7217

35 A_AAG

19 U_UUC

19 A_AAA

13 U_UUU

1 G_GGA

0 050 1

Motif(Z_ZZN)

Occurence in the

IS3 family

Relative PRF-1 frequency(in the IS3 context)

A

175 A_AAA_AAG

116 A_AAA_AAA

30 A_AAA_AAC

15 G_GGA_AAC

11 G_GGA_AAG

11 U_UUU_UUC

10 G_GGA_AAA

7 U_UUA_AAG

7 U_UUA_AAA

7 U_UUU_UUG

4 C_CCA_AAA

4 U_UUU_UUA

2 U_UUU_UUU

2 C_CCU_UUG

1 U_UUA_AAU

1 A_AAA_AAU

0 C_CCA_AAG

Motif

(X_XXZ_ZZN)

Occurence in the IS1 ampIS3 families

Relative PRF-1 frequency(in the IS911 context)

0 050 1

B

Figure 5 Distribution of Z ZZN and X XXZ ZZN motifs in mobile el-ements from the IS1 and IS3 families These two families were selectedbecause biologically relevant -1 frameshifting was demonstrated in both(410) The sequences of the IS from these 2 families (63 entries for the IS1family and 494 for the IS3 family) obtained from the ISFinder database(October 2012) were examined for the presence of potential frameshift sig-nals (ie existence of 2 overlapping ORFs with the second being in the -1 frame relative to the first and presence of a Z ZZN or X XXZ ZZNmotif in the overlap region the Z ZZN motifs scored in panel B arethose which are not part of an X XXZ ZZN heptamer) The relativeframeshifting frequencies of the motifs found are indicated in the right-hand panels All values were normalized relative to that of the best motif(A AAG or C CCA AAG) using data from Figures 3 and 4

(core-genome 993 families) and gene families whichare specific to a particular genome (pan-genome 15741 families) (54) Our initial data set is somewhat in-spired by this study Sequences of all protein codinggenes from 37 genomes (28 E coli 1 E fergusonii7 Shigella and 1 Salmonella Refseq accessions wereavailable for only 37 genomes from the 61 mentioned

in the study see Supplementary Table S1) were firstcombined in a single file set comprising 169 302 se-quences to form what we call a MEG Sequences whichshared more than 95 identity were clustered togetherto remove redundancy and only one representative se-quence (randomly chosen) from each of these clusterswas retained This resulted in a significantly smallernrMEG of 22 703 sequences

(ii) Frequency of occurrences of heptameric patterns in thenrMEG The frequency of occurrences of each of the 64X XXZ ZZN patterns in the nrMEG was determinedAdditionally the frequency of occurrences of the samepattern in the two other frames (XX XXZ ZN andXXX ZZZ N) was also determined As a result a to-tal of 192 frequency counts (64 times 3) were obtained forthe nrMEG

(iii) Randomization of the nrMEG To determine whetheradverse selection acting on a heptamer is due to theirshifty properties and not due to other selective pres-sures such as mutational bias codon usage or compo-sitional bias of protein sequences it is necessary to esti-mate how other selective pressures affect heptamer fre-quencies For this purpose we used the Dicodonshufflerandomization procedure (44) Each gene sequence inthe nrMEG was randomized 1000 times This gave riseto a set of 1000 randomized enterobacterial genomeswhere each constituent gene sequence encodes for thesame protein sequence and has the same codon us-age and dinucleotide biases as in the native nrMEGHence the frequency of a heptamerrsquos occurrence inthese genomes could be used as an estimate of its fre-quency in the absence of selective pressure due to shift-prone properties of this pattern

(iv) Comparison of the observed and expected values of thefrequency counts for each pattern The frequency of oc-currences of each of the 64 XXXZZZN patterns ineach of the three possible frames (ie X XXZ ZZNXX XXZ ZN and XXX ZZZ N) was determinedacross the 1000 randomized nrMEGs Each pattern isrepresented by two numerical values the mean and thestandard deviation of its frequency distribution countacross the 1000 randomized nrMEGs To quantify thedegree of under- or overrepresentation of each patternwe used a z-score (see Material and Methods section)A negative z-score implies that a pattern is underrepre-sented while a positive z-score is indicative of its over-representation (Supplementary Table S3) Under ourassumption we would expect only the lsquoin-framersquo shiftymotif (X XXY YYZ) and not the lsquoout-of-framersquo shiftymotifs (XX XYY YZ and XXX YYY Z) to be under-represented

The comparison of each X XXZ ZZN pattern frequencyin the nrMEG with its distribution in the randomizednrMEGs is shown in Figure 6 Black dots correspond tothe number of occurrences of a particular motif in thenrMEG while the associated violin shows the distributionof the same motif across the randomized nrMEGs Com-parison with the in vivo data of Figure 4 shows that someof the sequence patterns which are characterized by amarked underrepresentation are also associated with high

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 9: 1 ribosomal frameshifting in Escherichia

7218 Nucleic Acids Research 2014 Vol 42 No 11

nu

mb

er

of g

en

es

U_UUU_UUU C A G

C_CCU_UUU C A G

A_AAU_UUU C A G

G_GGU_UUU C A G

A X_XXU_UUN motifs

0

500

1000

1500

U C A G U C A G U C A G U C A G

U_UUA_AA

G_GGA_AA

A_AAA_AA

C_CCA_AA

C X_XXA_AAN motifs

0

500

1000

1500

2000

U C A G U C A G U C A G U C A G

U_UUC_CC

G_GGC_CC

A_AAC_CC

C_CCC_CC

nu

mb

er

of g

en

es

B X_XXC_CCN motifs

0

200

400

600

U C A G U C A G U C A G U C A G

U_UUG_GG

G_GGG_GG

A_AAG_GG

C_CCG_GG

D X_XXG_GGN motifs

0

200

400

600

800

Figure 6 Distribution of the X XXY YYZ heptameric patterns across the nrMEG (black discs) and their spread across the 1000 randomized genomes(violins) The violin plots are a combination of a box plot and a kernel density plot where the width of the box is proportional to the number of data pointsin that box (45) The open circle in each violin correspond to the median the thick vertical lines (often masked by the open circle) around the medianrepresent the Inter Quartile Range while the thinner vertical lines that run through most of the violin plot represent 95 confidence intervals

frameshifting efficiency (eg A AAA AAG) Two motifsAAAAAAA and UUUUUUU are notably underrepre-sented in all three frames (Supplementary Table S3) Pos-sible reasons are that these patterns may interfere with geneexpression in a frame-independent manner producing in-dels in mRNA due to transcriptional slippage (55) or in-del mutations at a high rate (56) Two motifs the poorframeshifters C CCG GGC and C CCG GGU are no-tably underrepresented and one motif U UUC CCG ismarkedly overrepresented Frameshifting is not the onlyfactor that may affect evolution of codon co-occurrenceIt is possible that a particular pair of codons is slow todecode or results in ribosome drop-off Such factors re-

sult in codon pair bias The CCC GGY and UUC CCGcodon pairs were indeed shown to be less frequent than ex-pected for the formers and more frequent for the latter (57)The violin plots showing the comparison of patterns oc-currence in the nrMEG and their distribution in the ran-domized nrMEGs in all three frames are available online athttplaptiuccieheptameric patterns clusters

We anticipated that X XXZ ZZN patterns character-ized as shift-prone in our assays would be underrepre-sented due to selection pressure Therefore we expectedto find negative correlation between z-scores and observedframeshifting efficiencies (in the absence of a stimulator)for these patterns Surprisingly no significant anticorrela-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 10: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7219

r= -0020

-20 -10 0 10

00

05

10

15

-1

fram

eshi

fting

A All genes

zminusscore

-4 -2 0 2 4

zminusscore

B Highly expressed genes

-1

fram

eshi

fting

00

05

10

15 r= -0242

Figure 7 Plot of the z-score for all X XXZ ZZN motifs in the nrMEG(panel A) or in the HEGome (panel B) against the frameshifting efficiencyin the absence of stimulatory element Note that the much larger z-valuesobserved in (A) compared with (B) result from the much larger gene setbeing analysed in the former (leading to lower relative errors in what isessentially a Poisson system)

tion was found between the two measures (r = minus0020P = 0874 Figure 7A) Previously underrepresentationof one shift prone pattern (A AAA AAG) was found tobe more pronounced in highly expressed genes than inlowly expressed genes (6) Therefore the distribution ofX XXZ ZZN motifs was analysed among 253 genes pre-dicted as highly expressed in E coli K12 (HEG databasehttpgenomesurvcatHEG-DB) These sequences weresimilarly randomized (10 000 times instead of 1000 be-cause of the smaller size of this data set in comparison tothe nrMEG set) and z-scores for each of the 64 patternswere computed However the correlation coefficient still re-mained non-significant albeit only marginally (r = minus0242P = 0054 Figure 7B)

Search of genes possibly utilizing -1 programmed ribosomalframeshifting

The objective was to identify genes likely using PRF-1 onthe basis of several criteria (i) presence of an efficient mo-tif (defined below) (ii) conservation of this motif (or a verysimilar one) in a given family of homologous genes from36 selected genomes (see Supplementary Table S1 and Ma-terials and Methods) and even beyond in orthologous se-quences (iii) sequence conservation around the motif (iv)presence of potential stimulatory elements flanking the mo-tif and (v) position of the motif in the gene and consequenceof frameshifting in terms of protein products [ie synthesisof a shorter or of a longer hybrid protein note that the an-swer does not provide evidence for or against frameshiftingsince there are proven PRF -1 cases leading to one or theother outcome (12)]

We selected a subset of 18 X XXZ ZZN patterns witha negative z-score and an in vivo frameshifting efficiency ofmore than 010 in the absence of stimulators listed below

A AAA AAA A AAA AAG A AAA AACA AAG GGC

A AAG GGU C CCA AAA C CCA AAGC CCC CCA

C CCC CCG C CCU UUA C CCU UUGC CCU UUU

G GGA AAG U UUA AAG U UUA AAUU UUC CCA

U UUU UUC U UUU UUU

Three non-underrepresented patterns (C CCU UUCA AAG GGA and A AAG GGG) were also consideredbecause they exhibit high level frameshifting

Gene families possibly using these patterns for -1 PRFwere identified using the pipeline described in Materials andMethods This procedure led to 658 alignments which rep-resented gene families with sequences containing one ofthe 21 chosen X XXZ ZZN patterns Subsequent filter-ing on the basis frameshift site conservation reduced thatnumber to 69 clusters 8 correspond to mobile genetic ele-ments 5 are from prophage genes and 56 belong to othergene families (Supplementary Table S4 and S5) The mainfeatures of these 69 clusters are summarized in Figure 8 Itappears that the size of the gene containing the frameshiftsignal is very variable since it can code for a 44ndash1426 aminoacid protein (Supplementary Table S6 Figure 8A) In 57clusters the frameshift product is shorter than the prod-uct of normal translation (Supplementary Table S6 Fig-ure 8B) The degree of conservation of synonymous sitesaround the frameshift site was also analysed (46) Figure8C (httplaptiuccieheptameric patterns clusters) Syn-onymous sites are supposed to evolve neutrally unless thereare additional constraints acting at the nucleotide sequencelevel for example pressure to conserve an RNA struc-ture Only 2 out of the 56 non-mobile genes display re-duced variability at synonymous sites in the vicinity of theframeshift site whereas 4 IS clusters and 1 prophage clus-ter do show such suppression (RVSSalndiv column in Sup-plementary Tables S4 and S5) However failure to detectstatistically significant synonymous site conservation in theother clusters may be due to insufficient sequence diver-

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 11: 1 ribosomal frameshifting in Escherichia

7220 Nucleic Acids Research 2014 Vol 42 No 11

IS amp phages clusters

non-mobile genes

clusters

rvss +

rvss -

no stimulator with SD with structure

with SD amp structure longer -1 orf true PRF-1

C

Siz

e o

f fr

am

e 0

pro

tein

(a

min

oa

cid

s)

IS amp

phages

dnaX

True PRF-1

A

400

0

1200

800

Cluster order (relative size of frameshift product)

Re

lative

pro

tein

siz

e (

fra

me

0 p

rote

in=

1)

B

IS amp phages

dnaX

True PRF-1

frame 0 protein

frameshift protein

frame 0 protein up

to motif

0 10 20 30 40 50 60 70

0

1

2

3

4

0 10 20 30 40 50 60 70

Figure 8 Overview of the proteins produced by normal translation or by-1 frameshifting for one gene typical of each of the 69 clusters of genes se-lected on the basis of high conservation of an X XXZ ZZN motif in thenrMEG The selection pipeline is indicated in Materials and Methods andthe properties of each cluster are given in Supplementary Table S4ndashS6 Thedata of Supplementary Table S6 relative to the size of the protein productswere plotted as follows Panel A shows the size in amino acids of the full-length frame 0 protein as a function of the cluster order presented in thefirst column of Supplementary Table S6 (clusters were ordered by increas-ing value of the size ratio between the frameshift product and the frame 0protein) Panel B presents the size variation of the -1 frameshift product(circles) and of the normal frame 0 translation product up to the end ofthe X XXZ ZZN motif (triangles) (all sizes are relative to that of the cor-responding full-length frame 0 product) as a function of the cluster ordershown in the first column of Supplementary Table S6 The 10 demonstratedcases of frameshifting are indicated on each panel as true PRF-1 and dnaXor IS and phages (details about these clusters can be found in Supplemen-tary Tables S4 and S5) Panel C summarizes qualitatively the features ofeach of the 69 clusters Each square represents a cluster and the features(absence of stimulator presence of an upstream SD or of a downstreamstructure existence of a -1 protein longer than the 0 frame product anddemonstration of -1 PRF) are symbolized as indicated below the panelClusters in the two upper boxes are those displaying reduced variabilityat synonymous sites (rvss+ see Materials and Methods SupplementaryTables S4 and S5) and clusters in the two lower boxes are those withoutreduced variability (rvssminus)

gence (RVSSalndiv column in Supplementary Tables S4and S5) Among the clusters displaying reduced variability1 non-mobile cluster (A AAA AAG 6) and 3 IS clusters(A AAA AAG 2 A AAA AAG 3 and A AAA AAG 4)possess a proven or potential stimulatory structure down-stream of the motif One IS cluster with reduced variability(A AAA AAC 1 a proven case of frameshifting) has no es-tablished stimulator (453) Two IS clusters do not displayreduced synonymous site variability (A AAA AAA 1 andA AAA AAG 37) in spite of being proven cases where -1frameshifting is stimulated by a stem-loop structure (un-published data) (58)

In addition the region 30 nt upstream of the motifwas checked for the presence of a conserved SD-like se-quence and the region extending 200 nt downstream of theframeshift site was analysed for the presence of a conservedRNA secondary structure our criteria for a conserved stim-ulator was the presence of such a structure in at least 50 ofthe genes of a cluster The SD-like sequences to be searched6ndash17 nt upstream of the motif were those for which a stim-ulatory effect was experimentally demonstrated (Materialsand Methods) (32) A conserved SD was found in 8 out ofthe 56 non-mobile clusters and in 3 out of the 13 IS andprophage clusters (Figure 8C Supplementary Tables S4 andS8) In contrast a potential stimulatory structure was pre-dicted in a larger proportion of clusters a conserved hair-pin is present in the 8 IS clusters in 3 out of 5 of the phageclusters and in 42 out of the 56 non-mobile genes clusters(see Materials and Methods for the parameters used to de-fine the hairpin structure) Nine clusters possess both typesof stimulators To characterize further the predicted struc-tures we compared them with IS3 family members possess-ing a frameshift site and an associated stimulatory struc-ture (910) To assess structures of different sizes we useda single parameter Ghpntminus1 which is the Gunfold37Cvalue of the hairpin divided by the number of nucleotides inthe structure An overall comparison showed that taken to-gether the hairpins of our 53 clusters had a lower Ghpntminus1

than those from a set of 271 IS3 family members (0317plusmn 0114 versus 0452 plusmn 0124 kcalmolminus1ntminus1 Supplemen-tary Table S8 and Figure S2) For a more refined compar-ison 20 IS3 members were selected because they have ahairpin ranging from 17 to 131 nt The average Gntminus1

(Gavntminus1) downstream of the frameshift motif was de-termined as detailed in Materials and Methods for theseISs as well as for our 53 clusters The difference betweenGhpntminus1 and Gavntminus1 Gntminus1 was calculated andplotted against the size of the structure (Figure 9) It ap-peared that all the IS hairpins have a positive Gntminus1

value (ge009 kcalmolminus1ntminus1) indicating that the hairpinsegment is more structured than average as expected if thereis selective pressure for its maintenance (Figure 9A) Thedistribution of Gntminus1 values is clearly not the samefor our 53 clusters especially the non-mobile genes clus-ters (Figure 9B) only 14 of them are at or above the 009kcalmolminus1ntminus1 threshold value defined by the IS set Theremaining 28 clusters as well as 2 phage clusters and 1 IScluster appeared to have a local folding level close or evenbelow average This suggests that their respective potentialhairpins may not have been selected for but are fortuitousnon-biologically relevant structures

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 12: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7221

0 20 40 60 80 100 120

-02

00

02

04

06

IS3 family members

[(Δ

Ghp

nt-

1)

- (Δ

Gav

nt-

1) ]

A [

(ΔG

hpn

t-1)

- (Δ

Gav

nt-

1)]

00

02

04

-02

00

02

04

0 20 40 60 80 100 120

-02

size of structure (nt)

IS amp phages clusters

non-mobile genes clusters

B

Figure 9 Summary of the in silico search of conserved hairpin structuresconstituting potential frameshift stimulators (see Materials and Meth-ods) The x-axis indicates the size in nucleotide of the hairpin struc-ture and the y-axis shows the Gntminus1 parameter which is the differ-ence between the mean Gunfold per nucleotide of the conserved hairpin(Ghpntminus1 kcalmolminus1ntminus1) and the average Gunfold of structures pre-dicted in a sliding window of the same size as the corresponding con-served hairpin moved over a 197 nt segment starting 4 nt after the motif(Gav kcalmolminus1) (see also Supplementary Table S8) the error bar cor-respond to the standard deviation for that difference The symbols witha central black dot indicate genes for which PRF-1 is demonstrated orvery likely Part A shows the results for a set of IS3 family membersnamely (and ranked according to the size of the structure) IS3411 ISAca1ISSusp2 IS3 IS1221A ISBcen23 ISPae1 ISPsy11 ISBmu11 ISPosp5ISBcen22 ISBam1 ISHor1 ISXca1 ISL1 ISDde4 ISBlma5 ISSpwi1ISNisp3 and ISRle5 The dotted line is set at the lowest Gntminus1 value(009 kcalmolminus1ntminus1) found for ISPsy1 Part B shows the Gntminus1 re-sults for 53 gene clusters out of the 69 that have a conserved hairpin TheIS (circles) and phage (squares) clusters are in the upper panel and thenon-mobiles genes clusters in the bottom panel

DISCUSSION

Comparison and meaning of prokaryotic and eukaryoticframeshifting rules

We determined that in E coli the rules of frameshifting onZ ZZN tetramers are in terms of motif hierarchy A AAGgt U UUY gt C CCY gtA AAA the 10 remaining mo-tifs were found barely or not at all frameshift-prone in ourconditions Thus maintenance of a cognate tRNA-codoninteraction after re-pairing of the A-site tRNA in the -1frame is important to ensure efficient frameshifting No-tably the maximal level of frameshifting remained about10-fold lower than observed with the best heptamer asso-ciated with the same stimulatory element In contrast alsowith the heptamers frameshifting on the Z ZZN motifsdefinitely requires presence of a strong stimulator (Figure3)

Relative frameshifting frequencies of 44 X XXZ ZZNheptamers tested in a eukaryotic system (rabbit reticulo-cytes lysate) (17) or in E coli (Figure 4) are displayedin Supplementary Figure S1A the motifs were placed up-stream of stimulatory elements of similar efficiency theavian infectious bronchitis virus (IBV) PK for the eukary-otic assay and the IS3 PK for the E coli series A third of themotifs were not experimentally tested in the eukaryotic con-text because they were expected to be inefficient Neverthe-less major rules of -1 frameshifting efficiency as a functionof the identity of the X Z and N nucleotides can be for-mulated (Supplementary Figure S1B) It appears that themost efficient motifs are D DDA AAH D DDU UUHU UUU UUG and C CCU UUU in the eukaryotic sys-tem and V VVA AAR H HHU UUY U UUA AAGC CCC CCC and A AAG GGG in E coli (with D = [UA G] H = [U C A] R = [AR] Y = [UC] and V = [CA G]) The outcome is a slightly larger number of high effi-ciency motifs in the eukaryotic situation than in the bacte-rial one (20 versus 15) at least in the nucleotide context inwhich the motifs were tested in both studies While the nu-cleotides immediately flanking a given motif can modulateframeshift level (192036ndash38) they probably cannot turn aninefficient motif into a highly efficient one or vice versa assuggested by one analysis in E coli [see Table 3 in (36)] Thatthe above rules likely apply to other eukaryotic organisms issupported by studies on yeast and plant viruses (5159ndash61)Several observations suggest that the E coli rules are prob-ably valid for many other bacterial species A survey of theGtRNAdb database [(62) httpgtrnadbucscedu] in June2013 indicates that out of 431 different bacterial species 168possess only one type of lys-tRNA with a 3primeUUU5

prime anti-codon like E coli (41) In these species covering all the ma-jor bacterial phyla the A AAR and V VVA AAR motifsshould be as shift-prone as in E coli Interestingly the sametwo types of motifs are highly prevalent (745) amongnon-redundant IS elements from the IS1 and IS3 familiespresent in the ISFinder database (Figure 5) Furthermoreamong the 134 species present in the GtRNAdb databaseand in which IS3 family transposable elements are found(ISFinder database October 2012) 83 contain both types oflys-tRNA (3primeUUU5prime and 3primeUUC5prime anticodons) ISs with anA AAG or V VVA AAG motif are present in 46 of these

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 13: 1 ribosomal frameshifting in Escherichia

7222 Nucleic Acids Research 2014 Vol 42 No 11

83 species Thus presence of a lys-tRNA with a 3primeUUC5prime an-ticodon which should pair perfectly with the AAG codonand thus reduce frameshifting (41) does not preclude theuse of A AAG or V VVA AAG frameshift motifs in ISelements from many bacterial species

A common feature of tetramer and heptamers motifs isthe preferred identity for the Z nucleotide A or U consti-tuting the first two bases of the ZZN codon This suggeststhat a weak tRNA-codon pairing interaction in the A site isa universal pre-requisite for high level -1 frameshifting (17)The major differences concern the identity of the X and Nnucleotides While Neuk can be A C or U Nprok identity islinked to that of Z so that ZZNprok must be all purines orall pyrimidines to achieve high frameshifting level In termsof tRNA-codon relations this suggests that the prokary-otic ribosome tolerates less readily a non-cognate interac-tion after frameshifting (eg following a shift from AACto AAA) than its eukaryotic counterpart One possibilityis that the bacterial ribosome still monitors the correct-ness of the codonndashanticodon pairing in the A site even af-ter frameshifting Concerning the ribosomal P site tRNAwhich has to shift from XXZ to XXX the prokaryotic ribo-some still displays the same preference for cognate pairing inthe new frame when Z is U However it is more eukaryotic-like when Z is A a feature reflecting the high shiftiness ofbacterial lys-tRNAUUU especially when ZZN is AAG [Fig-ures 3 and 4 (41)]

The previous paragraph highlighted the most efficientmotifs and their properties as revealed in three particu-lar contexts (IS911 IS3 and no-stimulator see Figure 2)Overall about 61 of the heptamers are significantly shift-prone to very different extent in the absence of stimula-tors a feat confirming that the motif ie tRNA re-pairingis the primary determinant of -1 frameshifting Stimula-tory elements cannot induce frameshifting by themselvesThey likely facilitate tRNA re-pairing by causing ribo-some pausing (27ndash29) and by promoting mRNA realign-ment (30ndash32) It is interesting to note that heptamers oflow efficiency in E coli (Figure 4) like A AAA AAC andG GGA AAC are nevertheless very likely used for pro-grammed frameshifting by bacterial IS elements [Figure 5(10)]

Distribution of heptameric frameshift motifs in genes from 28E coli strains and 7 other enterobacterial strains

Study of the distribution of the 64 X XXZ ZZN heptamersin 22 703 sequences selected to constitute our nrMEGrevealed that about 66 of the motifs were underrepre-sented to different extents (Figure 6 Supplementary Ta-ble S3) However there was no significant anticorrelationbetween the observed frameshifting efficiency and the un-derrepresentation of shifty patterns when all the genes aretaken into account or when only a subset of genes catego-rized as highly expressed was considered (Figure 7) The lat-ter finding was unexpected because it is believed that thedeleterious effect of frameshift-prone patterns at least inhighly expressed genes (6) should increase with increasedframeshifting efficiency and thus augment the pressure forselection against these sequences in protein coding regionsAt this point we may only speculate about possible reasons

One reason could be the dependency of frameshifting onthe context Such context effects involving nucleotides lo-cated immediately upstream or downstream of some mo-tifs (tetramers and heptamers) were revealed through di-rected mutagenesis of frameshifting signals of prokaryoticand eukaryotic origin (192036ndash38) Our experimental as-says were carried out in a limited set of nucleotide contextsurrounding the patterns and therefore our results may notreflect the frameshifting efficiencies of these patterns in alltheir native contexts Another possibility is that even for themost efficient motifs placed in the best immediate contextframeshifting frequency remains sufficiently low in the ab-sence of stimulatory elements so as to have no detrimentaleffect on bacterial fitness For the three best V VVA AAGheptamers(V = [CAG]) this frequency is at around 03(Figure 4) From another study we know that frameshift-ing on these motifs could be increased by about 124-fold atmost ie going up to 37 by modifying the 3prime context (36)But this is still much lower if compared to the cumulative ef-fect of background translational errors missense errors anddrop-off have a total estimated frequency of about 5times10minus4

per amino-acid and thus would result in sim18 of incorrectchains for a 500 amino-acids protein (63)

Search for genes potentially using -1 frameshifting

Previous attempts to find novel recoded genes in bacte-ria used two different approaches one based on search offrameshift-prone motifs (664) and the other based on theidentification and characterization of disrupted coding se-quences (111314) The former led to identification of afew candidate genes only but the search was restricted toa limited number of motifs and to one organism only Ecoli The studies using the second approach were more ex-haustive since they used all the available sequenced bacterialgenomes Consequently they brought more candidates Asearch of genes with disrupted open reading frames (ORFs)among 973 genomes initially revealed about 1000 candidategenes 75 of which could be grouped into 64 clusters (11)Assuming an average number of 3130 protein-coding genesper genome this gives a frequency of candidates of about003 Sequence comparison showed that 47 clusters con-tained genes from IS mobile genetic elements Interestinglya substantial proportion of them (22 clusters) may use pro-grammed transcriptional realignment rather than transla-tional -1 frameshifting (12 clusters) and in 9 clusters bothtypes of recoding may operate The analysis with the Gene-Tack program of 1106 microbial genomes carried out byAntonov et al (1314) eventually revealed 4730 genes po-tentially using frameshifting (in the +1 or -1 direction) ortranscriptional realignment which were grouped into 146clusters IS transposable elements genes are found in a mi-nority of clusters 17 Thus other categories of genes of var-ious functions predominate However if the absolute num-ber of genes is considered then IS elements prevail witha total number of 3317 genes This probably reflects thatISs are prone to horizontal transfer and are often presentin multiple copies in a genome Assuming again an aver-age number of 3130 protein-coding genes per genome thenthe overall frequency of candidates found by Antonov et al(14) among 1106 genomes is about 014 One drawback

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 14: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7223

of the disrupted-ORF approach is that it fails to detectcases where frameshifting would lead to a protein shorterthan the product of normal translation An example is pro-vided by the dnaX gene family while in Ecoli and manybacteria frameshifting leads to a shorter protein it resultsin a longer product in a more limited number of bacteriaThe GeneTack analysis detected only the later cases in 17genomes [see COF 239165634 in the GeneTack prokaryoticframeshift database (14)]

In contrast approaches primarily based on the search offrameshift motifs allow detection of both types of recodingoutcomes but the motifs are so short (eg 4 or 7 nucleotidesfor -1 frameshift motifs) that in the absence of proper fil-tering too many candidates are found even if presence ofa potentially stimulatory structure downstream of the mo-tif is an added condition An illustration is provided bya study of the yeast genome 20 of the ORFs (ie 1275out of 6353) were found to contain one or more lsquostrongrsquo -1frameshift signal (7) by strong the authors mean that thereis an efficient motif (as defined in the previous section) and adownstream structure (the total number of strong candidateframeshift regions was 1679) Since in 99 of the cases -1frameshifting leads to rapid premature termination it wasproposed and then experimentally substantiated at least fora few genes (65) that -1 PRF is largely used in yeast for reg-ulatory purpose rather than to generate as is the case in ISelements or viruses a fusion protein with a new carboxyl-terminal functional domain (7) A subsequent study using adifferent method for structure prediction and scoring leadto a much less optimistic evaluation only 74 candidates ofthe former study were retained (66)

One aim of the present study was to identify genes con-taining 21 selected frameshift motifs within 36 individualgenomes 27 of which are from E coli strains The cumu-lated number of protein coding genes is 165 099 and amongthem 31 180 contain at least one of the 21 motifs thus theoverall frequency of motif-containing genes is 189 Afterfiltering and enrichment beyond the E coli species the out-come was a set of 69 clusters (ie a total 10 918 genes) eachbeing a group of closely related genes where a given mo-tif is conserved Since about 10 392 019 genes were testedthe final yield of frameshift candidates was of 010 Thisvalue is close to that obtained by Antonov et al (14) there-fore suggesting achievement of a similar stringency by bothsearches An internal validation of our method was pro-vided by the fact that expected cases (dnaX gene 7 IS el-ements and 2 bacteriophage genes marked as [true] in Sup-plementary Tables S4 and S5 and in Figure 8) were found inthe final set Among the 59 remaining clusters 1 is from ISelements 3 are from prophages genes and 55 are from non-mobile cellular genes of known and unknown functions In57 of our 69 clusters like in E coli dnaX gene but in contrastwith ISs from the IS1 and IS3 families and all the 146 pro-grammed frameshift clusters from the GeneTack databaseframeshifting would lead to a product shorter than the pro-tein resulting from normal translation (Figure 8B) Whetheror not this type of frameshifting affects mRNA stabilityas proposed for yeast candidates (765) remains to be de-termined A majority of our clusters 55 contain a con-served potential stimulatory element (as defined in Mate-rials and Methods) there is an upstream SD in 2 clusters a

downstream hairpin in 44 and both types of stimulators in9 (Figure 8C) However assessment of the hairpins with theGntminus1 parameter suggested that they may be relevantin only 1 phage cluster and in 14 non-mobile genes clusters(Figure 9 Supplementary Table S8) Furthermore if motifefficiency is taken as an additional constraint only 8 clus-ters remain as best candidates for high level -1 frameshift-ing (marked with in Supplementary Table S8) As shownin Supplementary Table S7 there is a limited overlap be-tween our 69 clusters and the 146 clusters of Antonov et al(1314) thus demonstrating that the two approaches arecomplementary The present study in agreement with pre-vious ones (11ndash14) suggests that recoding in bacteria ismostly found (at least in terms of absolute number of can-didates genes) in IS transposable elements and in a fewbacteriophage genes However if numbers of gene clustersare considered then it appears that non-mobile genes clus-ters predominate Information about the function of thesegenes as found in the Ecogene database (67) is shown inSupplementary Table S5 Seventeen are of unknown func-tion four are predicted to be transcriptional regulators andthe rest have different predicted functions

Once candidates have been found a critical issue asstressed in a recent review (68) is their functional valida-tion This entails two steps (i) the demonstration that thereis frameshifting (or transcriptional slippage) on the pre-dicted signal and (ii) the determination of the cellular func-tion of the recoding product Full functional analysis hasnot yet been carried out on the 146 GeneTack clusters oron the candidates genes reported in this study Represen-tatives of both studies 20 of the GeneTack clusters (14)and 2 of our 8 best candidates (Supplementary Table S8)were tested for the first step only Promisingly 14 of the 20GeneTack candidates and both of ours were found capableof eliciting frameshifting at different but substantial levelsie from 03 to 63 [(14) Supplementary Figure S3] Thechallenge is now to carry on with the complete experimen-tal characterization of all candidates to establish which ofthem indeed use recoding to synthesize alternate proteinsbiologically pertinent for E coli and other bacterial species

AVAILABILITY

Additional information about clusters is available at httplaptiuccieheptameric patterns clusters

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENT

The help of Claire Bertrand and Patricia Licznar at an earlystage of this project as well as the hospitality of Mick Chan-dler and Bao Ton Hoang and the support from Agamem-non Carpousis are gratefully acknowledged

FUNDING

Centre national de la Recherche Scientifique [to OF] Uni-versite Paul Sabatier-Toulouse III [to OF] Agence Na-tional de la Recherche [grant number NT05ndash1 44848 to

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 15: 1 ribosomal frameshifting in Escherichia

7224 Nucleic Acids Research 2014 Vol 42 No 11

OF] Wellcome Trust [grant numbers 094423 to PVB and088789 to AEF] and Science Foundation Ireland [grantnumber R14989 to JFA] Source of open access fundingWelcome Trust [094423 to PVB]Conflict of interest statement None declared

REFERENCES1 JacksT and VarmusH (1985) Expression of the Rous sarcoma virus

pol gene by ribosomal frameshifting Science 230 1237ndash12422 JacksT MadhaniHD MasiarzFR and VarmusHE (1988a)

Signals for ribosomal frameshifting in the Rous sarcoma virusgag-pol region Cell 55 447ndash458

3 JacksT PowerM MasiarzF LuciwP BarrP and VarmusH(1988b) Characterization of ribosomal frameshifting in HIV-1gag-pol expression Nature 331 280ndash283

4 SekineY and OhtsuboE (1989) Frameshifting is required forproduction of the transposase encoded by insertion sequence 1 ProcNatl Acad Sci 86 4609ndash4613

5 TsuchihashiZ and KornbergA (1990) Translational frameshiftinggenerates the gamma subunit of DNA polymerase III holoenzymeProc Natl Acad Sci 87 2516ndash2520

6 GurvichOL BaranovPV ZhouJ HammerAW GestelandRFand AtkinsJF (2003) Sequences that direct significant levels offrameshifting are frequent in coding regions of Escherichia coliEMBO J 22 5941ndash5950

7 JacobsJL BelewAT RakauskaiteR and DinmanJD (2007)Identification of functional endogenous programmed -1 ribosomalframeshift signals in the genome of Saccharomyces cerevisiae Nucleicacids Res 35 165ndash174

8 BekaertM FirthAE ZhangY GladyshevVN AtkinsJF andBaranovPV (2010) Recode-2 new design new search tools andmany more genes Nucleic Acids Res 38(Database issue) D69ndashD74

9 SiguierP VaraniA PerochonJ and ChandlerM (2012) Exploringbacterial insertion sequences with ISfinder objectives uses andfuture developments Methods Mol Biol 859 91ndash103

10 FayetO and PrereMF (2010) Programmed ribosomal-1frameshifting as a tradition the bacterial transposable elements ofthe IS3 Family In AtkinsJF and GestelandRF (eds) RecodingExpansion of Decoding Rules Enriches Gene Expression Vol 24Springer New York Heidelberg pp 259ndash280

11 SharmaV FirthAE AntonovI FayetO AtkinsJFBorodovskyM and BaranovPV (2011) A pilot study of bacterialgenes with disrupted ORFs reveals a surprising profusion of proteinsequence recoding mediated by ribosomal frameshifting andtranscriptional realignment Mol Biol Evol 28 3195ndash3211

12 BaranovPV FayetO HendrixRW and AtkinsJF (2006)Recoding in bacteriophages and bacterial IS elements Trends Genet22 174ndash181

13 AntonovI BaranovP and BorodovskyM (2013a) GeneTackdatabase genes with frameshifts in prokaryotic genomes andeukaryotic mRNA sequences Nucleic Acids Res 41 D152ndashD156

14 AntonovI CoakleyA AtkinsJF BaranovPV andBorodovskyM (2013b) Identification of the nature of reading frametransitions observed in prokaryotic genomes Nucleic Acids Res 416514ndash6530

15 WeissRB DunnDM AtkinsJF and GestelandRF (1987)Slippery runs shifty stops backward steps and forward hops -2 -1+1 +2 +5 and +6 ribosomal frameshifting Cold Spring HarbSymp Quant Biol 52 687ndash693

16 WeissRB DunnDM AtkinsJF and GestelandRF (1990)Ribosomal frameshifting from -2 to +50 nucleotides Prog NucleicAcids Res Mol Biol 39 159ndash183

17 BrierleyI JennerAJ and InglisSC (1992) Mutational analysis ofthe lsquoslippery-sequencersquo component of a coronavirus ribosomalframeshifting signal J Mol Biol 227 463ndash479

18 SekineY EisakiN and OhtsuboE (1994) Translational control inproduction of transposase and in transposition of insertion sequenceIS3 J Mol Biol 235 1406ndash1420

19 LicznarP MejlhedeN PrereMF WillsN GestelandRFAtkinsJF and FayetO (2003) Programmed translational -1frameshifting on hexanucleotide motifs and the wobble properties oftRNAs EMBO J 22 4770ndash4778

20 NapthineS VidakovicM GirnaryR NamyO and BrierleyI(2003) Prokaryotic-style frameshifting in a plant translation systemconservation of an unusual single-tRNA slippage event EMBO J22 3941ndash3950

21 MazauricMH LicznarP PrereMF CanalI and FayetO (2008)Apical loop-internal loop RNA pseudoknots a new type ofstimulator of -1 translational frameshifting in bacteria J Biol Chem283 20421ndash20432

22 BaranovPV GestelandRF and AtkinsJF (2004) P-site tRNA is acrucial initiator of ribosomal frameshifting RNA 10 221ndash230

23 LarsenB GestelandRF and AtkinsJF (1997) Structural probingand mutagenic analysis of the stem-loop required for Escherichia colidnaX ribosomal frameshifting programmed efficiency of 50 JMol Biol 271 47ndash60

24 GiedrocDP and CornishPV (2009) Frameshifting RNApseudoknots structure and mechanism Virus Res 139 193ndash208

25 BrierleyI and Dos RamosFJ (2006) Programmed ribosomalframeshifting in HIV-1 and the SARS-CoV Virus Res 119 29ndash42

26 KollmusH HonigmanA PanetA and HauserH (1994) Thesequences of and distance between two cis-acting signals determinethe efficiency of ribosomal frameshifting in humanimmuno-deficiency virus type 1 and human T-cell leukemia virus typeII in vivo J Virol 68 6087ndash6091

27 TuC TzengTH and BruennJA (1992) Ribosomal movementimpeded at a pseudoknot required for frameshifting Proc NatlAcad Sci 89 8636ndash8640

28 SomogyiP JennerAJ BrierleyI and InglisSC (1993) Ribosomalpausing during translation of an RNA pseudoknot Mol Cell Biol13 6931ndash6940

29 TholstrupJ OddershedeLB and SoslashrensenMA (2012) mRNApseudoknot structures can act as ribosomal roadblocks Nucleic AcidsRes 40 303ndash313

30 PlantEP JacobsKL HargerJW MeskauskasA JacobsJLBaxterJL PetrovAN and DinmanJD (2003) The 9-A solutionhow mRNA pseudoknots promote efficient programmed minus1ribosomal frameshifting RNA 9 168ndash174

31 NamyO MoranSJ StuartDI GilbertRJ and BrierleyI (2006)A mechanical explanation of RNA pseudoknot function inprogrammed ribosomal frameshifting Nature 441 244ndash247

32 PrereMF CanalI WillsNM AtkinsJF and FayetO (2011) Theinterplay of mRNA stimulatory signals required for AUU-mediatedinitiation and programmed -1 ribosomal frameshifting in decoding oftransposable element IS911 J Bacteriol 193 2735ndash2744

33 ShineJ and DalgarnoL (1974) The 3rsquo-terminal sequence ofEscherichia coli 16S ribosomal RNA complementarity to nonsensetriplets and ribosome binding sites Proc Natl Acad Sci 711342ndash1346

34 LarsenB WillsNM GestelandRF and AtkinsJF (1994)rRNAmRNA base pairing stimulates a programmed -1 ribosomalframeshift J Bacteriol 176 6842ndash6851

35 LiGW OhE and WeissmanJS (2012) The anti-Shine-Dalgarnosequence drives translational pausing and codon choice in bacteriaNature 484 538ndash541

36 BertrandC PrereMF GestelandRF AtkinsJF and FayetO(2002) Influence of the stacking potential of the base 3rsquo of tandemshift codons on ndash1 ribosomal frameshifting used for gene expressionRNA 8 16ndash28

37 KimYG MaasS and RichA (2001) Comparative mutationalanalysis of cis-acting RNA signals for translational frameshifting inHIV-1 and HTLV-2 Nucleic Acids Res 29 1125ndash1131

38 LegerM DuludeD SteinbergSV and Brakier-GingrasL (2007)The three transfer RNAs occupying the A P and E sites on theribosome are involved in viral programmed -1 ribosomal frameshiftNucleic Acids Res 35 5581ndash5592

39 DemeshkinaN JennerL YusupovaG and YusupovM (2010)Interactions of the ribosome with mRNA and tRNA Curr OpinStruct Biol 20 325ndash332

40 WeissRB DunnDM ShuhM AtkinsJF and GestelandRF(1989) E coli ribosomes re-phase on retroviral frameshift signals atrates ranging from 2 to 50 percent New Biol 1 159ndash169

41 TsuchihashiZ and BrownPO (1992) Sequence requirements forefficient translational frameshifting in the Escherichia coli dnaX geneand the role of an unstable interaction between tRNALys and anAAG lysine codon Genes Dev 6 511ndash519

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018

Page 16: 1 ribosomal frameshifting in Escherichia

Nucleic Acids Research 2014 Vol 42 No 11 7225

42 BrierleyI MeredithMR BloysAJ and HagervallTG (1997)Expression of a coronavirus ribosomal frameshift signal inEscherichia coli influence of tRNA anticodon modification onframeshifting J Mol Biol 270 360ndash373

43 MillerJH (1992) A Short Course in Bacterial Genetics A LaboratoryManual and Handbook for Escherichia coli and Related Bacteria ColdSpring Harbor Laboratory Press Cold Spring Harbor NY USA

44 KatzL and BurgeCB (2003) Widespread selection for local RNAsecondary structure in coding regions of bacterial genes GenomeRes 13 2042ndash2051

45 HintzeJL and NelsonRD (1998) Violin plots a box plot-densitytrace synergism Am Stat 52 181ndash184

46 FirthAE WillsNM GestelandRF and AtkinsJF (2011)Stimulation of stop codon readthrough frequent presence of anextended 3rsquo RNA structural element Nucleic Acids Res 396679ndash6691

47 GruberAR LorenzR BernhartSH NeubockR andHofackerIL (2008) The Vienna RNA websuite Nucleic Acids Res36(Web Server issue) W70ndashW74

48 RettbergCC PrereMF GestelandRF GestelandRF andFayetO (1999) A three-way junction and constituent stem-loops asthe stimulator for programmed ndash1 frameshifting in bacterial insertionsequence IS911 J Mol Biol 286 1365ndash1378

49 YokoyamaS and NishimuraS (1995) Modified nucleosides andcodon recognition In SollD and RajBhandaryUL (eds) tRNAStructure Biosynthesis and Function ASM Press Washington DCUSA pp 207ndash223

50 BrierleyI GilbertRJC and PennellS (2010)Pseudoknot-dependent programmed -1 frameshifting structuresmechanisms and models In AtkinsJF and GestelandRF (eds)Recoding Expansion of Decoding Rules Enriches Gene ExpressionVol 24 Springer New York Heidelberg pp 221ndash247

51 FarabaughPJ (2010) Programmed frameshifting in budding yeastIn AtkinsJF and GestelandRF (eds) Recoding Expansion ofDecoding Rules Enriches Gene Expression Vol 24Springer NewYork Heidelberg pp 221ndash247

52 PolardP PrereMF ChandlerM and FayetO (1991) Programmedtranslational frameshifting and initiation at an AUU codon in geneexpression of bacterial insertion sequence IS911 J Mol Biol 222465ndash477

53 EscoubasJM PrereMF FayetO SalvignolI GalasDZerbibD and ChandlerM (1991) Translational control oftransposition activity of the bacterial insertion sequence IS1 EMBOJ 10 705ndash712

54 LukjancenkoO WassenaarTM and UsseryDW (2010)Comparison of 61 sequenced Escherichia coli genomes Microb Ecol60 708ndash720

55 BaranovPV HammerAW ZhouJ GestelandRF and AtkinsJF(2005) Transcriptional slippage in bacteria distribution in sequencedgenomes and utilization in IS element gene expression Genome Biol6 R25

56 GuT TanS GouX ArakiH and TianD (2010) Avoidance oflong mononucleotide repeats in codon pair usage Genetics 1861077ndash1084

57 BuchanJR AucottLS and StansfieldI (2006) tRNA propertieshelp shape codon pair preferences in open reading frames NucleicAcids Res 341015ndash1027

58 VogeleK SchwartzE WelzC SchiltzE and RakB (1991)High-level ribosomal frameshifting directs the synthesis of IS150 geneproducts Nucleic Acids Res 19 4377ndash4385

59 BekaertM and RoussetJP (2005) An extended signal involved ineukaryotic -1 frameshifting operates through modification of the Esite tRNA Mol Cell 17 61ndash68

60 PlantEP and DinmanJD (2006) Comparative study of the effectsof heptameric slippery site composition on -1 frameshifting amongdifferent eukaryotic systems RNA 12 666ndash673

61 MillerWA and GiedrocDP (2010) Ribosomal Frameshifting inDecoding Plant Viral RNAs In AtkinsJF and GestelandRF(eds) Recoding Expansion of Decoding Rules Enriches GeneExpression Vol 24 Springer New York Heidelberg pp 193ndash220

62 ChanPP and LoweTM (2009) GtRNAdb a database of transferRNA genes detected in genomic sequence Nucleic Acids Res 37D93ndashD97

63 KurlandCG (1992) Translational accuracy and the fitness ofbacteria Annu Rev Genet 26 29ndash50

64 LiaoPY ChoiYS and LeeKH (2009) FSscan amechanism-based program to identify +1 ribosomal frameshifthotspots Nucleic Acids Res 37 7302ndash7311

65 BelewAT AdvaniVM and DinmanJD (2011) Endogenousribosomal frameshift signals operate as mRNA destabilizing elementsthrough at least two molecular pathways in yeast Nucleic Acids Res39 2799ndash2808

66 TheisC ReederJ and GiegerichR (2008) KnotInFrame predictionof -1 ribosomal frameshift events Nucleic Acids Res 36 6013ndash6020

67 ZhouJ and RuddK E (2013) EcoGene 30 Nucleic Acids Res 41(D1) D613ndashD624

68 KettelerR (2012) On programmed ribosomal frameshifting thealternative proteomes Front Genet 3 242

Downloaded from httpsacademicoupcomnararticle-abstract421172101451769by gueston 02 April 2018