evolution of selenophosphate synthetases: emergence and relocation...
TRANSCRIPT
1
Evolution of selenophosphate synthetases: emergence and relocation 1
of function through independent duplications and recurrent 2
subfunctionalization 3
Marco Mariotti1, 2, 3, 4*
, Didac Santesmasses1, 2, 3
, Salvador Capella-Gutierrez1, 2
, Andrea 4
Mateo5, Carme Arnan
1, 2, 3, Rory Johnson
1, 2, 3, Salvatore D’Aniello
6, Sun Hee Yim
4, 5
Vadim N Gladyshev4, Florenci Serras
5, Montserrat Corominas
5, Toni Gabaldón
1, 2, 7, 6
Roderic Guigó1, 2, 3*
7
*. Corresponding author; email: [email protected] or [email protected] 8
1. Bioinformatics and Genomics Programme, Centre for Genomic Regulation 9
(CRG), Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain. 10
2. Universitat Pompeu Fabra (UPF), 08003 Barcelona, Catalonia, Spain. 11
3. Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), 08003 Barcelona, 12
Catalonia, Spain 13
4. Division of Genetics, Department of Medicine, Brigham & Women’s Hospital, 14
Harvard Medical School, Boston, MA 02115, USA 15
5. Departament de Genètica, Facultat de Biologia and Institut de Biomedicina 16
(IBUB) de la Universitat de Barcelona (UB), 08028 Barcelona, Catalonia, Spain 17
6. Department of Biology and Evolution of Marine Organisms, Stazione Zoologica 18
Anton Dorhn, Villa Comunale, 80121, Napoli, Italy. 19
7. Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís 20
Companys 23, 08010 Barcelona, Catalonia, Spain. 21
22
Running Title: Phylogeny of selenophosphate synthetases 23
Keywords: selenocysteine, gene duplication, subfunctionalization, 24
evolution of function, readthrough. 25
Abstract 26
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
2
27
Selenoproteins are proteins that incorporate selenocysteine (Sec), a non-standard 28
amino acid that is encoded by UGA, normally a stop codon. The synthesis of Sec 29
requires the enzyme Selenophosphate synthetase (SPS or SelD), conserved in all 30
prokaryotic and eukaryotic genomes encoding selenoproteins. Here we study the 31
evolutionary history of SPS genes, providing a map of selenoprotein function spanning 32
the whole tree of life. SPS is itself a selenoprotein in many species, though functionally 33
equivalent homologues that replace the Sec site with cysteine (Cys) are common. Many 34
metazoans, however, possess SPS genes with substitutions other than Sec or Cys 35
(collectively referred to as SPS1). Using complementation assays in fly mutants, we 36
show that these genes share a common function, which appears to be distinct from the 37
synthesis of selenophosphate carried out by the Sec- and Cys- SPS genes (termed 38
SPS2), and unrelated to Sec synthesis. Even though sharing the same function, we 39
show here that SPS1 genes originated through a number of independent gene 40
duplications from an ancestral metazoan selenoprotein SPS2 gene, that most likely 41
already carried the SPS1 function. Thus, in SPS genes, parallel duplications and 42
subsequent convergent subfunctionalization have resulted in the segregation to different 43
loci of functions initially carried by a single gene. A key actor in the evolution of SPS 44
genes is a stem-loop RNA structure enhancing the readthrough of the Sec-UGA codon, 45
whose origin may be traced back to prokaryotes. The evolutionary history of SPS 46
constitutes a remarkable example of emergence and evolution of gene function, which 47
we have been able to trace with unusual detail thanks to the singular features of SPS 48
genes, wherein the amino acid at a single site determines unequivocally protein function 49
and is intertwined to the evolutionary fate of the entire selenoproteome. 50
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
3
Introduction 51
Selenoproteins are proteins that incorporate the non-standard amino acid 52
selenocysteine (Sec or U) in response to the UGA codon. The recoding of UGA, 53
normally a stop codon, to code for Sec is arguably the most outstanding programmed 54
exception to the genetic code. Selenoproteins are found, albeit in small numbers, in 55
organisms across the entire tree of life. Recoding of UGA is mediated by RNA structures 56
within selenoprotein transcripts, the SECIS (SElenoCysteine Insertion Sequence) 57
elements. Sec biosynthesis and insertion also require a dedicated system of trans-acting 58
factors that include elements that are common and others that are specific to the three 59
domains of life: bacteria (Kryukov and Gladyshev 2004)(Yoshizawa and Bock 2009), 60
archaea (Rother et al. 2001), and eukaryotes (Squires and Berry 2008)(Allmang et al. 61
2009). 62
The very existence of selenoproteins is puzzling. Sec can apparently be substituted by 63
cysteine (Cys)—as often happens during evolution (Zhang et al. 2006)(Chapple and 64
Guigó 2008)(Mariotti et al. 2012)— with seemingly a small or null impact on protein 65
function. In fact, selenoproteins may be absent in an entire taxonomic group, but present 66
in sister lineages. This can be seen most dramatically within fruit flies: while Drosophila 67
melanogaster and most other flies possess three selenoprotein genes, their relative 68
Drosophila willistoni has replaced Sec with Cys in them, and lost the capacity to 69
synthesize Sec (Chapple and Guigó 2008)(Lobanov et al. 2008). Fungi and plants have 70
also lost this capacity (Lobanov et al. 2009). In other cases, however, such as in 71
Caenorhabditis elegans, the entire pathway is maintained only to synthesize a single 72
selenoprotein (Taskov et al. 2005). It appears that selective pressure exists to maintain 73
Sec, at least in vertebrates, since strong purifying selection across Sec sites that prevent 74
mutations to Cys has been reported (Castellano et al. 2009). Sec encoding has been 75
hypothesized to be an ancestral trait, already present in the early genetic code, since a 76
number of selenoprotein families are shared between prokaryotes and eukaryotes. 77
However, the evolutionary continuity of the Sec recoding systems across domains of life 78
is not certain, since it would require the translocation of the SECIS element (within the 79
coding region in bacteria but within the 3’ UTR in eukaryotes), as well as the radical 80
alteration of its structure. 81
Selenophosphate synthetase (SPS, also called SelD or selenide water dikinase) is 82
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
4
unique among the components of the Sec biosynthesis machinery in that it is often a 83
selenoprotein itself. SPS catalyzes the synthesis of selenophosphate from selenide, ATP 84
and water, producing AMP and inorganic phosphate as products. Selenophosphate is 85
the selenium donor for the synthesis of Sec, which, in contrast to other amino acids, 86
takes place on its own tRNA, tRNAsec (Xu et al. 2007a)(Palioura et al. 2009). 87
SPS proteins are conserved from bacteria to human with about 30% identity and are 88
found in all species known to encode selenoproteins. In prokaryotes, SPS (i.e., SelD) is 89
found also in species where selenophosphate is used to produce selenouridine in tRNAs 90
(SeU). In these species, it acts as the selenium donor to protein ybbB (selenouridine 91
synthase). The presence of the two traits (SeU and Sec) overlaps, but not completely 92
(Romero et al. 2005). In eukaryotes, SPS is generally found as a selenoprotein, while in 93
prokaryotes homologues with Cys aligned to the Sec position are common. As for all 94
selenoproteins, Sec and Cys homologues of SPS are expected to perform the same 95
molecular function, although catalytic efficiency can vary. Indeed, selenophosphate 96
synthesis activity has been demonstrated experimentally for various Sec- and Cys- SPS 97
proteins, as well as for artificial Cys mutants (Kim et al. 1997)(Persson et al. 1997)(Xu et 98
al. 2007a). 99
In vertebrates and insects, two paralogous SPS genes have been reported: SPS2 (i.e., 100
Sephs2), which is a selenoprotein, and SPS1 (i.e., Sephs1), which is not, and carries a 101
threonine (Thr) in vertebrates and an arginine (Arg) in insects in place of Sec (Xu et al. 102
2007b). In contrast to Cys conversion, Thr or Arg conversion in SPS1 seems to result in 103
the abolishment of the selenophosphate synthase function. Indeed, murine SPS1 does 104
not generate selenophosphate in vitro, and does not even consume ATP in a selenium 105
dependent manner (Xu et al. 2007a). Consistently, selenoprotein synthesis is unaffected 106
in a knockout of SPS1 in mouse cell lines (Xu et al. 2007b). Similarly, Drosophila SPS1 107
(i.e., ptuf/SelD) lacks the ability to catalyze selenide-dependent ATP hydrolysis or to 108
complement SelD deficiency in Escherichia coli (Persson et al. 1997). In insects, SPS1 109
is preserved in species that lost selenoproteins (Chapple and Guigó 2008). Although 110
human SPS1 interacts with Sec synthase (SecS) (Small-Howard et al. 2006), these 111
findings, taken as a whole, suggest that SPS1 functions in a pathway unrelated to 112
selenoprotein biosynthesis (Lobanov et al. 2008). What the function of SPS1 may be 113
remains an open question. Human SPS1 has been proposed to function in Sec 114
recycling, since a E. coli SelD mutant can be rescued by SPS1 but only when grown in 115
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
5
the presence of L-selenocysteine (Tamura et al. 2004). In Drosophila, SPS1 has been 116
proposed to be involved in vitamin B6 metabolism (Lee et al. 2011) and in redox 117
homeostasis, since it protects from damage induced by reactive oxygen species (ROS) 118
(Morey et al. 2003). 119
Here we study the evolutionary history of SPS genes across the tree of life. We found 120
that the presence of Sec/Cys SPS genes, together with a few other gene markers, 121
recapitulates well the selenium utilization traits (Sec and SeU) in prokaryotic genomes. 122
Within eukaryotes, specifically within metazoans, we detected a number of SPS 123
homologues with amino acids other than Sec or Cys at the homologous UGA position. 124
We found that Cys- or Sec-containing SPS genes (SPS2) are found in all genomes 125
encoding selenoproteins, while genomes that contain only SPS genes carrying amino 126
acids other than Sec or Cys at the homologous UGA position (SPS1) do not encode 127
selenoproteins. In SPS proteins, thus, it appears that the residue occurring at a single 128
site is a precise marker of function, which can be easily traced in genomes by searching 129
for selenoprotein genes and other markers of selenium utilization. This feature, that may 130
be unique among all protein families, makes SPS genes singularly appropriate to 131
investigate the evolution of gene function. Here, thanks to this feature, we have been 132
able to untangle the complex history of SPS genes with great detail. Our analysis 133
reveals that SPS1 genes in different metazoan lineages (including those of human and 134
fly) originated via parallel duplications from an ancestral Sec-carrying SPS gene. Despite 135
their independent origin, SPS1 genes share similar evolutionary constraints, and have a 136
common function, likely present in the ancestral metazoan SPS2 gene. This indicates 137
selective pressure during metazoan history to segregate different functions to separate 138
loci, and constitutes a remarkable example of recurrent escape from adaptive conflict 139
through gene duplication and subfunctionalization (Hittinger and Carroll, 2007). Within 140
insects, the SPS duplication was followed by the loss of the Sec encoding SPS2 gene in 141
several lineages, which therefore lost the capacity to synthesize selenoproteins 142
(Chapple and Guigó 2008)(Lobanov et al. 2008)—becoming, together with some 143
nematodes (Otero et al. 2014), the only known selenoproteinless metazoans. Strikingly, 144
SPS1 conserved the ancestral UGA codon in selenoproteinless Hymenoptera. Our 145
analyses point out that UGA readthrough in hymenopterans is enhanced by overlapping 146
RNA structures, also present in other selenoproteins. These structures could be related 147
to bacterial SECIS elements, uncovering a possible evolutionary link between the 148
prokaryotic and eukaryotic Sec encoding systems. They would have played a key role 149
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
6
throughout all the evolution of SPS genes, and particularly in the emergence of the 150
SPS1 function in the metazoan ancestral Sec-carrying SPS gene. 151
Results 152
Selenoprotein genes are usually miss-annotated in prokaryotic and eukaryotic genomes, 153
due to the recoding of UGA, normally a stop codon, to Sec. Therefore, we used 154
Selenoprofiles (Mariotti and Guigó 2010), a computational tool dedicated to the 155
prediction of selenoproteins and selenoprotein homologues. We have run Selenoprofiles 156
to search for SelD/SPS genes in all available fully-sequenced eukaryotic and prokaryotic 157
genomes, 505 and 8,263, respectively. We then utilized a combination of approaches to 158
reconstruct their phylogenetic history. Methods and analyses are fully discussed in 159
Supplementary Material S1-S6. 160
SelD as a marker for selenium utilization in prokaryotes 161
Figure 1 shows the distribution of SelD (i.e., prokaryotic SPS) genes in a reference set of 162
223 prokaryotic genomes (Pruitt et al. 2012), along with the presence of other selenium 163
utilization gene markers such as the bacterial selenocysteine synthase gene (SelA). The 164
occurrence of SelD and other Sec machinery components is in good agreement with 165
previous findings (Zhang et al. 2006)(Zhang et al. 2008)(Zhang and Gladyshev 166
2008)(Zhang et al. 2010). Supplementary Material S1 contains details of the genes 167
found in each major lineage investigated, both in the reference set, and in the extended 168
set of 8,263 genomes. 169
SelD genes were found in 26% of the reference prokaryotic genomes. A considerable 170
fraction (19%) of the detected SelD genes encoded a protein with a Sec residue (always 171
in the same position), with all the rest containing Cys instead. The occurrence of SelD is, 172
in general, consistent with the other gene markers for selenium utilization and also with 173
selenoprotein presence (see Figure 1), with only a few exceptions (Supplementary 174
Material S1). The Sec trait (SelD, SelA, tRNAsec, selenoproteins) was found in a slightly 175
larger group of organisms than the SeU trait (SelD, ybbB): 18% vs 16%, respectively. 176
The two traits showed a highly significant overlap: 10% of all species had both (p-value < 177
0.0001, one-tailed Fisher’s exact test). The Sec and SeU markers showed a scattered 178
distribution across the prokaryotic tree, reflecting the dynamic evolution of selenium 179
utilization. The complexity of this phylogenetic pattern is even more evident when 180
considering the extended set of prokaryotic species (see Supplementary Material S1 and 181
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
7
Supplementary Figure SM1.1). 182
In almost every species with SelD (93%), genes for SelA and/or ybbB were identified, 183
supporting the utilization of selenophosphate for Sec and SeU. A notable exception was 184
the Enterococcus genus, where many species including Enterococcus faecalis 185
possessed SelD, but no other markers of selenium usage. This had already been 186
reported as an indicator of a potential third selenium utilization trait (Romero et al. 187
2005)(Zhang et al. 2008). Selenium is in fact used by these species as a cofactor to 188
molybdenum hydroxylases (Haft and Self 2008)(Srivastava et al. 2011). 189
In Pasteurellales, an Order within Gammaproteobacteria, we identified a bona-fide Cys 190
to Sec conversion. Most of Gammaproteobacteria possess a SelD-Cys (or none), and 191
Sec forms are found almost uniquely in Pasteurellales. Phylogenetic sequence signal 192
supports codon conversion rather than horizontal transfer as the cause for SelD-Sec 193
(Supplementary Material S1). Even though cases of conversion of Cys to Sec have been 194
proposed (Zhang et al. 2006), this is the first clearly documented case. Among archaea, 195
SelD was found only in Methanococcales and Methanopyri genomes, whose 196
selenoproteins have been previously characterized (Stock and Rother 2009). The SeU 197
trait was found only in Methanococcales, although with a peculiarity: ybbB is split in two 198
adjacent genes (Su et al. 2012). 199
Previous reports have described a number of SPS genes fused to other genes (Zhang et 200
al. 2008)(da Silva et al. 2013). Thus, we used a computational strategy to identify SelD 201
gene fusions or extensions (see Methods and Supplementary Material S2). Fusions with 202
a NADH dehydrogenase–like domain (Zhang et al. 2008) are by far the most common, 203
and they are found scattered across a wide range of bacteria (Supplementary Figure 204
SM2.1). We also detected two instances of fusions with the NifS-like protein (Cys 205
sulfinate desulfinase, proteins that deliver selenium for the synthesis of selenophosphate 206
by SPS2, (Lacourciere et al. 2000)). In all cases, the extension/fusion is on the N-207
terminal side of the SPS genes, and these are always SelD-Cys with the single 208
exception of NifS-SPS in Geobacter sp. FRC-32, which contains Sec. Since we found 209
selenoproteins and other Sec machinery genes in all these genomes, we predict that 210
generally these extended SPS genes have retained the original selenophosphate 211
biosynthetic activity. 212
SPS2 as a marker for Sec utilization in eukaryotes 213
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
8
Figure 2 shows SPS genes and predicted selenoproteins found in a representative set of 214
eukaryotic genomes. The presence of SPS2 genes (defined as those with Sec, or Cys) 215
correlates perfectly with the presence of selenoproteins. Thus, SPS2 is a marker of Sec 216
utilization in eukaryotes. Our results replicate and substantially expand previous surveys 217
of selenoproteins in eukaryotic genomes (e.g. (Lobanov et al. 2007)(Chapple and Guigó 218
2008)(Lobanov et al. 2009)(Jiang et al. 2012)). 219
Overall, the Sec trait exhibits a rather scattered distribution in protists reflecting a 220
dynamic evolution similar to bacteria. We found SPS2 (and therefore selenoproteins) 221
scattered across Stramenopiles, Alveolata, Amoebozoa and other protist lineages, 222
presumably due to multiple independent events of selenoprotein extinction. In contrast, 223
we found selenoproteins in all investigated Kinetoplastida (Euglenozoa), including the 224
parasites Trypanosoma and Leishmania (Lobanov et al. 2006)(Cassago et al. 2006). We 225
did not find any bona fide SPS2 nor selenoproteins in Fungi or land plants 226
(Embryophyta), despite many genomes were searched (284 and 41, respectively, a 227
subset of which are shown in Figure 2). In contrast, green algae genomes contain large 228
numbers of selenoproteins as previously reported (Novoselov et al. 2002)(Palenik et al. 229
2007). The largest number was in the pelagophyte algae Aureococcus anophagefferens 230
(heterokont, Stramenopiles), known for its rich selenoproteome (Gobler et al. 2013). All 231
metazoans encode SPS2 and selenoproteins, with exceptions detected so far only in 232
some insects (Chapple and Guigó 2008)(Lobanov et al. 2008) and some nematodes 233
(Otero et al. 2014). 234
As in prokaryotes, we found a few protist genomes in which SPS2 is fused to other 235
genes. Fusions with a NADH dehydrogenase–like domain are also the most common, 236
and scattered among many taxa (Figure 2). We detected NifS-SPS fusions in the 237
amoeba Acanthamoeba castellani and the heterolobosean Naegleria gruberi. In these 238
two genomes, we found additional SPS2 candidates (Supplementary Material S2). In N. 239
gruberi, SPS2 is fused to a polypeptide containing a methyltransferase domain (da Silva 240
et al. 2013). Finally, all SPS2 proteins in Plasmodium species were found to possess a 241
large polypeptide extension (>500 amino acids). This domain shows no homology with 242
any known protein, and its function remains unknown. We did not find convincing SPS 243
fusions in non-protist eukaryotes (see Supplementary Material S2). 244
The reconstructed gene tree of the bacterial, archaeal and eukaryotic SPS sequences 245
follows broadly their known phylogenetic relationships (Supplementary Figure SM3.1), 246
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
9
and supports the continuity of the selenoprotein system across the three domains of life. 247
Most likely, thus, the last common ancestor of eukaryotes and prokaryotes possessed 248
selenoproteins and SPS, probably as a selenoprotein itself. The continuity in SPS 249
phylogenetic signal is apparently broken only in few protist lineages, which seem to have 250
acquired a bacterial-like SPS gene by horizontal transfer (see Supplementary Material 251
S3). All the gene fusions shared by protists and bacteria (NADH-SPS, NifS-SPS) are 252
explained by this process (i.e., they are not independently evolved fusions but the result 253
of a fusion event in bacteria which was subsequently transmitted horizontally). 254
Independent duplications of SPS2 generates SPS1 proteins in 255
metazoans 256
In many metazoan lineages we detected additional SPS genes, which are neither 257
selenoproteins nor Cys-homologues. Since, within metazoans, SPS-Cys are found only 258
in nematodes, and, outside metazoans, additional SPS genes are absent (Figure 2), we 259
argue that the last common metazoan ancestor possessed a single SPS gene with Sec 260
(i.e., it was SPS2). Our results show that the additional SPS genes do not have a single 261
origin, but instead they were generated by independent duplications of SPS2 in a 262
number of metazoan lineages (Supplementary Material S3). We have specifically 263
identified four independent duplications (Figures 3 and 4). One SPS duplication occurred 264
at the root of the vertebrates, probably as part of one of the reported rounds of whole 265
genome duplications (Dehal and Boore, 2005). Another duplication occurred within 266
tunicates, likely originated by retro-transposition of an alternative isoform of SPS2. 267
Another duplication occurred within annelids, at the root of the Clitellata lineage. Finally, 268
a duplication occurred at the root of insects. In each of these duplications, a specific 269
substitution of the Sec residue was fixed: threonine in vertebrates, glycine in tunicates, 270
and leucine in annelids. In insects, however, the UGA codon was maintained after 271
duplication, and substituted in some lineages by arginine. There were at least two 272
independent UGA to arginine substitutions in Paraneoptera and Endopterygota. 273
Remarkably, insects without selenoproteins lost the original SPS2 gene, but maintained 274
the duplicated copy. 275
Therefore, as a universal trend, Cys- or Sec- containing SPS genes (to which we refer 276
as SPS2) are found in all metazoan (eukaryotic) genomes encoding selenoproteins, 277
while eukaryotic genomes containing only SPS genes carrying amino acids other than 278
Sec or Cys at the homologous UGA position do not encode selenoproteins. Thus, the 279
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
10
duplicated non-Cys, non-Sec copies of SPS2 in human and fly are unlikely to carry a 280
function related to selenoprotein synthesis. Since in human and fly, they are commonly 281
referred to as SPS1 (Xu et al. 2007b), we will collectively refer to all non-Cys, non-Sec 282
SPS2 duplications in metazoans as SPS1 (e.g. SPS1-Thr for human SPS1). 283
In the next section, we briefly describe each SPS duplication separately. 284
SPS phylogeny in vertebrates 285
All non-vertebrate deuterostomes (except tunicates) and Cyclostomata (jawless 286
vertebrates, such as lampreys) encode only one SPS2 gene, while all Gnasthostomata 287
possess SPS1 in addition. Supported also by a strong phylogenetic signal 288
(Supplementary Figure SM3.2), we conclude that vertebrate SPS1 (SPS1-Thr) 289
originated from a duplication of SPS2 concomitant with conversion of Sec to threonine at 290
the root of Gnasthostomata (Supplementary Material S3). The conservation of intron 291
positions within the protein sequence is consistent with the duplication involving the 292
whole gene structure, and given its phylogenetic position, this may have occurred 293
through one of the reported rounds of whole genome duplications at the base of 294
vertebrates (Dehal and Boore, 2005). As recently reported (Mariotti et al. 2012), in 295
mammals the SPS2 gene duplicated again, this time by retro-transposition, generating a 296
second SPS2-Sec gene almost identical to the parental, except for the lack of introns. In 297
placental mammals, the intronless SPS2 replaced functionally the parental gene (which 298
was lost), while non-placental mammals still retain the two copies (see, for example, 299
Monodelphis domestica in Figure 2). 300
SPS phylogeny in tunicates 301
Tunicates are the closest outgroup to vertebrates (Delsuc et al. 2006), with ascidians 302
(sea squirts) constituting the best-studied and most sequenced lineage. In the ascidian 303
Ciona, we identified a single SPS gene. This appears to be the direct descendant of the 304
ancestral metazoan SPS2, and possesses a SECIS element in the 3’UTR (necessary for 305
the incorporation of Sec). Nonetheless, this gene produces two different protein 306
isoforms, deriving from alternative exon structures at the 5’ end (SPS-ae; Supplementary 307
Material S4). One isoform carries Sec (SPS-Sec, corresponding to the SPS2), while the 308
other one, previously unreported, has a glycine instead (SPS-Gly, corresponding to 309
SPS1). We mapped the origin of the SPS-Gly isoform to the root of ascidians, since the 310
non-ascidian tunicate Oikopleura dioica, appears to have only SPS2-Sec gene with a 311
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
11
single isoform, while both isoforms (SPS-Sec and SPS-Gly) are found in the ascidian 312
Molgula tectiformis (see Figure 5). We also found both forms in the recently sequenced 313
ascidian species Botryllus schlosseri (Voskoboynik et al. 2013) and Halocynthia roretzi, 314
belonging to the sister lineages of Styelidae and Pyuridae, respectively. However, in 315
these species the two forms mapped to distinct genomic loci, and they correspond 316
therefore to two different genes (see Supplementary Material S4). SPS-Sec is intronless, 317
and contains a SECIS within the 3’UTR. It corresponds, thus, to SPS2. SPS-Gly 318
possesses instead the ancestral intron structure (very similar to O.dioica SPS2), and has 319
no SECIS. It corresponds, therefore, to SPS1. Most likely, the ancestral SPS-sec 320
alternative transcript isoform retro-transposed to the genome at the root of Styelidae and 321
Pyuridae. This generated a copy that soon replaced functionally the SPS-Sec isoform of 322
the parental gene, which as a result specialized in the production only of the SPS-Gly 323
isoform, as both the Sec coding exon and the SECIS element degenerated. This 324
exemplifies an evolutionary scenario, not frequently reported in the literature, in which 325
alternative transcripts precede gene duplication, providing a possible intermediary step 326
of how a dual-function protein can escape from adaptive conflict (Hittinger and Carroll 327
2007). 328
SPS phylogeny in insects 329
Insects provide a unique framework to study selenoprotein evolution. They have 330
undergone several waves of complete selenoprotein extinction, in which selenoprotein 331
genes were converted to Cys homologues or lost, and the Sec machinery degenerated 332
and/or disappeared. This process occurred in several lineages independently: 333
Hymenoptera, Lepidoptera, Coleoptera (or at least Tribolium castaneum), in the 334
Drosophila willistoni lineage (Chapple and Guigó 2008)(Lobanov et al. 2008) within 335
Endopterygota, and also in the paraneopteran pea aphid (Acyrthosiphon pisum (The 336
International Aphid Genomics Consortium 2010)). Consistent with its function, the SPS-337
Sec (SPS2) gene is present in every insect genome coding for selenoproteins, and it is 338
absent in every insect genome not coding for selenoproteins (Figure 2). SPS1 genes, in 339
contrast, are present in all insect genomes. Most insect SPS1 genes (in Lepidoptera, 340
Coleoptera, Diptera, etc) use arginine at the Sec/Cys site (SPS1-Arg, Figures 3 and 4). 341
Instead, SPS1 has a UGA codon at this position in Hymenoptera, and in two non-342
monophyletic species within paraneopterans: Rhodnius prolixus and Pediculus 343
humanus. In these, however, and in contrast to hymenopterans, we found an additional 344
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
12
SPS-Sec gene with SECIS element (SPS2), as well as, consistently, a number of other 345
selenoproteins and the complete Sec machinery. From all these data (see also 346
Supplementary Material S3), we hypothesize (Figure 3) that all insect SPS1 genes 347
derive from the same SPS2 duplication event occurred approximately at the root of 348
insects, initially generating a UGA-containing, SECIS lacking gene (SPS1-UGA). In most 349
lineages the new gene switched the UGA codon to arginine generating SPS1-Arg 350
proteins. This occurred at least twice independently, in the pea aphid and in the last 351
common ancestor of Coleoptera, Lepidoptera, and Diptera. In Hymenoptera and most 352
Paraneoptera, the gene is still conserved with UGA and no SECIS. The original SPS2 353
was lost in all lineages where selenoproteins disappeared. 354
SPS1-UGA: non-Sec readthrough 355
The strong conservation of the UGA codon in hymenopteran/paraneopteran SPS1-UGA 356
aligned exactly at the position of the SPS2 Sec-UGA codon is extremely puzzling. SPS1-357
UGA does not contain a SECIS element. Furthermore, Hymenoptera lack most 358
constituents of the Sec machinery, and these organisms cannot synthesize 359
selenoproteins. However, the striking conservation of the insect SPS1-UGA sequence 360
strongly indicates that it is translated and functional. Previously, we had hypothesized 361
that SPS1-UGA could perhaps be translated by a readthrough mechanism not involving 362
Sec insertion (Chapple and Guigó 2008). In this respect, there is growing evidence for 363
abundant stop codon readthrough in insects, with UGA being the most frequently 364
observed readthrough codon in Drosophila (Jungreis et al. 2011). 365
Here, we have found additional strong evidence in support of the translational recoding 366
of SPS1-UGA. First, all ten recently sequenced hymenopteran genomes show a clear 367
pattern of protein coding conservation across the UGA, resulting in a readthrough 368
protein of an approximate size of SPS. 369
Second, we found the hexanucleotide GGG-UG[C/U], which is highly overrepresented 370
next to known viral “leaky” UAG stop codons (Harrell et al. 2002), to be ultraconserved 371
subsequent to the UGA in SPS1-UGA genes. While the hexanucleotide is found 372
scattered in some other metazoan SPS2 sequences—where it could actually contribute 373
to UGA translation—it is absent from all insect SPS2 genes (Figure 6). Moreover, the 374
hexanucleotide is also absent from insect SPS1 genes having Arg instead of UGA. 375
Third, SPS1-UGA contains a conserved secondary structure overlapping the UGA (and 376
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
13
therefore the hexanucleotide). We will name this structure HRE, for hymenopteran 377
readthrough element. For its location and structure (Supplementary Material S6, 378
Supplementary Figure SM6.2), HRE appears to derive from stem-loop structures called 379
SRE (from Sec redefinition elements), previously identified in many selenoprotein genes 380
including SPS2 (Howard et al. 2005), and that promote readthrough activity. Indeed, the 381
SRE element of human Selenoprotein N gene (SepN1) was functionally characterized 382
(Howard et al. 2005)(Howard et al. 2007), showing that it promotes recoding of UGA 383
codons. In the presence of a downstream SECIS element, Sec insertion is enhanced. In 384
its absence, a Sec-independent readthrough activity is still observed. We hypothesize 385
that HRE of SPS1-UGA has a similar activity. In further support of this, we used RNAz 386
(Gruber et al. 2010) to characterize the secondary structures embedded in the coding 387
sequence of all SPS genes (see Methods and Supplementary Material S6). In 388
prokaryotes, this yielded the bacterial SECIS of the Sec containing SelD genes 389
(Supplementary Figure SM6.1). In eukaryotes, we obtained stable stem loops (SRE) in 390
the same region of all UGA-containing SPS genes (Supplementary Figure SM6.2). 391
Strikingly, the largest and most stable structures were in SPS1-UGA, where we 392
predicted HREs as a 3 stem clover-like structure with the UGA in the apex of the middle 393
stem. 394
Overall, these results strongly suggest that the insect SPS1-UGA gene is translated. 395
Furthermore, a recent proteomics study in the hymenopteran Cardiocondyla obscurior 396
(Fuessl et al. 2014) yielded a few peptides mapping to the SPS1-UGA gene—albeit not 397
to the region including the UGA codon. Thus, we cannot unequivocally identify the amino 398
acid that is inserted in response to the UGA codon. Given that we observed two 399
independent UGA to Arg substitutions within insects, Arg could be a potential candidate. 400
Nonetheless, the recognition of a UGA codon by a standard tRNA for Arg would require 401
at best one mismatch in the first position of the codon (third position of 402
the anticodon), which is expected to compromise translation. 403
Functional hypothesis: parallel subfunctionalization generates SPS1 404
proteins 405
Even though originating from independent gene duplications of the same orthologous 406
gene (Figures 3 and 4), the pattern of strong sequence conservation suggests that SPS1 407
genes share a common function. It has been demonstrated for both insects and 408
vertebrate SPS1 that their function is different from SPS2 (Persson et al. 1997)(Xu et al. 409
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
14
2007a). We therefore suggest that the ancestral SPS2 protein at the root of metazoans 410
had not only its catalytic activity (i.e., synthesis of selenophosphate from selenide), but 411
also an additional, unknown function. Eventually, several metazoan lineages split these 412
two, with a new, duplicated protein, SPS1, assuming this other function. If this 413
hypothesis is true, then SPS1 proteins (although paraphyletic) should have similar 414
functions. To test this hypothesis we designed rescue experiments in Drosophila 415
melanogaster. The SPS1 mutation (SelDptuf ) is lethal in homozygous larvae and results 416
in very reduced and aberrant imaginal disc epithelia (Figure 7A; Alsina et al. 1998). 417
Thus, we used the Gal4-UAS system to activate different metazoan SPS1 genes in 418
SelDptuf mutants and tested whether the imaginal disc phenotype could be reverted. We 419
drove expression of either the human SPS1-Thr, the SPS-Gly isoform from Ciona 420
intestinalis SPS-ae, or the SPS1-UGA from Atta cephalotes (ant, hymenopteran), cloned 421
downstream UAS sequences, using the ubiquitous armadillo-Gal4 driver (Supplementary 422
Material S7, Supplementary Figure SM7.1). We focused on the wing imaginal disc and 423
observed significant rescue using the SPS-Gly isoform from C.intestinalis, both in size 424
and shape (Figure 7C) and partial rescue in size when using the ant SPS1-UGA or the 425
human SPS1-Thr (Figures 7D, E). These experiments suggest that the tested 426
heterologous SPS1 proteins have a similar molecular function, which is, as previously 427
noted, distinct from selenophosphate synthesis. 428
Unequal selective pressure on SPS1 and SPS2 genes after 429
duplication 430
We observed substantial differences in the rate of non-synonymous vs synonymous 431
substitution (Ka/Ks) across metazoan SPS genes, between the branches corresponding 432
to SPS1 and SPS2 after duplication (see Supplementary Material S5). Thanks to the 433
large number of SPS sequences available, we could reliably quantify Ka/Ks for SPS1 and 434
SPS2 in vertebrates and insects, and also for their closest outgroups which did not 435
duplicate SPS (Supplementary Figures SM5.1 and SM5.2). For annelids and ascidians, 436
with fewer sequences available, we obtained less accurate estimates (Supplementary 437
Figure SM5.3). After duplication, SPS2 genes display much higher values of Ka/Ks than 438
SPS1. The values for the unduplicated SPS2 genes are very similar to SPS1, and thus 439
lower than SPS2 after duplication. Although to different degrees, the trend is observed 440
for all SPS duplications in metazoans. It is evident both when comparing the extant 441
SPS1 and SPS2 sequences with the predicted ancestral sequence before duplication, 442
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
15
and when comparing extant sequences within the same orthologous group 443
(Supplementary Material SM5). This results in higher sequence divergence of the 444
duplicated SPS2 proteins with respect to SPS1, and also to SPS2 prior to duplication. 445
Indeed, SPS1 genes show overall higher protein sequence similarity across metazoans 446
than SPS2 genes after duplication (e.g. 80% vs 70% for the sequence set in 447
Supplementary Figure SM5.3). Moreover, the extant unduplicated SPS2 genes exhibit 448
stronger sequence similarity to SPS1 genes, than to SPS2 genes after duplication. The 449
diverse rate of protein evolution after duplication posed a major challenge for 450
phylogenetic reconstruction, resulting in an artifact known as long branch attraction (see 451
Supplementary Material S3). The Ka/Ks difference in SPS1 vs SPS2 is particularly 452
pronounced for insects, likely connected to the documented degeneration of the Sec trait 453
in this lineage (Chapple and Guigó 2008). 454
455
Discussion 456
Gene duplications are generally assumed to be the major evolutionary forces generating 457
new biological functions (Lynch and Conery 2000). However, the mechanisms by means 458
of which duplicated gene copies are maintained and evolve new functions are still poorly 459
understood (Innan and Kondrashov 2010). First, the evolutionary history of genes is 460
inherently difficult to reconstruct, since numerous factors deteriorate and confound 461
phylogenetic signals (Philippe et al. 2011). Second, assigning functions to genes is 462
difficult. Even the very same concept of function is controversial, and there is no 463
universal definition of what constitutes function (Kellis et al. 2014). From a pragmatic 464
standpoint, function is often equated to some sort of biochemical activity, which, in turn, 465
is intimately connected to specific experiments to measure it. But this fails to grasp that 466
genes work in a concerted way within a given biological context, and thus, even similar 467
biochemical activities may result in very different biological functions (phenotypes). Also, 468
experimental validation of function is possible only for a few genes, while for the great 469
majority, function is solely assigned by inference based on sequence similarity. 470
However, there are no universal thresholds to discriminate different functions from levels 471
of sequence divergence, since cases of exist of conserved function with low sequence 472
similarity and conversely, different functions can be carried out by highly similar 473
sequences. For all these reasons, functional evolution of genes is particularly difficult to 474
trace. 475
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
16
Here, we have investigated the evolutionary history of selenophosphate synthetase 476
genes (SelD/SPS) along the entire tree of life. SPS constitutes a particularly appropriate 477
gene family to investigate evolution of function. Since it is required for selenoprotein 478
synthesis, we can monitor its function as a genomic phenotype, by inspecting genomes 479
for selenoproteins and for other gene markers of selenium utilization. Moreover, SPS is 480
unique among all genes in that it is part of the Sec machinery and it is often a 481
selenoprotein itself. As a selenoenzyme, we presume that most likely its UGA-encoded 482
Sec residue can be functionally replaced only with Cys, and we thus expect that 483
mutations to codons other than Cys would impair its selenophosphate synthetase 484
function. 485
Consistent with this, we have found that Cys- or Sec- containing SPS genes (which we 486
refer as SPS2) are found in all genomes encoding selenoproteins, while those metazoan 487
genomes containing only SPS genes carrying amino acids other than Sec or Cys at the 488
homologous UGA position, do not encode selenoproteins. Our results suggest that the 489
non-Cys, non-Sec SPS genes share a common function, likely unrelated to 490
selenophosphate synthesis, and we refer collectively to them as SPS1. In SPS genes, 491
thus, the amino acid inserted at this single codon position seems to determine 492
unequivocally protein function, and is intertwined to the evolutionary fate of an entire 493
class of proteins (selenoproteins). Thanks to this feature, maybe unique among all 494
protein families, we have been able to untangle the complex functional evolution of SPS 495
genes, in particular within metazoans, where it involves a series of independent gene 496
duplications and the associated subfunctionalization events, from an ancestral Sec 497
carrying SPS gene that had both the SPS1 and SPS2 functions. In some ascidians, the 498
duplication originated by selective retro transposition of the SPS2 isoform of a single 499
gene encoding both SPS1 and SPS2 isoforms. Alternative transcript isoforms and gene 500
duplication are considered the main mechanisms contributing to increase protein 501
diversity. They anticorrelate at the genomic scale, so that protein families with many 502
paralogues tend to have fewer transcript variants, and vice versa (Talavera et al. 2007). 503
It is therefore expected that some genes shifted from one strategy to the other during 504
evolution: the function of an alternative isoform in one species would be carried by a 505
duplicated copy of the gene in another species. However, other than the ascidian SPS 506
genes reported here, only a handful of cases have been so far reported: the eukaryotic 507
splicing factor U2AF1 in vertebrates (Pacheco et al. 2004), the ey gene in Drosophila 508
(also known as Pax6)(Dominguez et al. 2004) and the mitf gene in fishes (Altschmied et 509
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
17
al. 2002). Very recently, Lambert et al. (2015) have suggested that this phenomenon 510
could be widespread during vertebrate evolution. 511
Gene duplication and alternative transcription differ in the efficiency with which they 512
separate functions. Duplications fully segregate the functions to different loci—allowing 513
both independent sequence evolution and independent regulation. In contrast, 514
alternative transcription/splicing isoforms typically share regions of identical sequence, 515
as well as, generally, proximal and distal regulatory regions. Thus, when a single gene 516
carries a dual function (even if exerted through alternative isoforms), certain sequence 517
segments are under two simultaneous sources of selection, which may be in 518
competition. Gene duplication and subfunctionalization provides a way to entirely escape 519
from the adaptive conflict (Hittinger and Carroll 2007). The fact that convergent 520
subfunctionalization duplications occurred several times independently within metazoans 521
is suggestive of selective advantage in having the two SPS functions carried out by 522
different genes. Here, we have detected duplications occurring at the root of vertebrates, 523
within tunicates, within annelids and at the root of insects. However the metazoan 524
genome space remains largely unexplored and other duplications are likely to be 525
uncovered in the future, as additional genome sequences become available. 526
Overall, the history of metazoan SPS genes contributes a striking example how function 527
evolves across orthologues and paralogues through a complex pattern of duplication 528
and loss events (Gabaldón and Koonin 2013). Specifically, it constitutes a prototypical 529
case of a particular “functional evolution path”: an ancestral state of dual/redundant 530
function, followed by subfunctionalization through gene duplication which occurs 531
independently in parallel lineages of descent. Analogous cases have been previously 532
reported (e.g. the β-catenin/armadillo gene during insect evolution (Bao et al. 2012)). 533
Parallel gene duplications result in a complex pattern of orthology/paralogy relationship 534
(Gabaldón 2008). Within monophyletic lineages with both SPS1 and SPS2 originated by 535
a duplication event (e.g. vertebrates), these genes are always paralogous, but both 536
genes are phylogenetically co-orthologous to SPS2 in lineages with a single SPS (e.g. 537
flatworms). Also, these genes are phylogenetically co-orthologous to both SPS1 and 538
SPS2 in other lineages with a distinct, parallel duplication event (e.g. insects). 539
Nonetheless, if we assume that subfunctionalization occurred in the same way in the 540
different lineages, then the SPS1 genes generated independently can be considered 541
functional orthologous, despite the lack of direct phylogenetic descent. In this view, it 542
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
18
may appear counter-intuitive that SPS1 genes have replaced the Sec/Cys site with 543
lineage-specific amino acids which are so dissimilar. This suggests that the residue 544
occurring at this particular position is irrelevant to SPS1 function. 545
After each duplication event documented here, the SPS2 gene increased the rate of 546
non-synonymous vs synonymous substitution (Ka/Ks). This can be interpreted as a 547
relaxation of the selective constraints acting on the coding sequence. In contrast, SPS1 548
genes preserve lower values of Ka/Ks, comparable to the parental gene before 549
duplication (Supplementary Material S5). It is remarkable that this trend is observed 550
consistently throughout all metazoan SPS duplications, despite the vast diversity in 551
duplication mechanisms. We must conclude that this evolutionary pattern is intimately 552
connected to the functions of the two SPS copies after duplication. Intuitively, the 553
relaxation of constraints in the SPS2 gene after duplication is well consistent with the 554
scenario of subfunctionalization that we propose. However, we may have expected the 555
duplicated SPS1 genes to also show relaxation compared to the parental gene before 556
duplication, since the dual-function state has presumably the highest selective 557
constraint. In contrast, the constraints acting on SPS1 after duplication are very similar 558
to those before duplication. These observations may suggest that the selective 559
constraints acting on the ancestral, dual-function SPS2 gene are imputable mainly to 560
selection for the SPS1 function, rather than for the canonical selenophosphate 561
synthetase activity. 562
Our results suggest that readthrough of the SPS UGA codon to incorporate amino acids 563
other than Sec played an important role in metazoan SPS duplication and 564
subfunctionalization. Indeed, we hypothesize that simultaneous to the production of 565
canonical Sec coding transcripts, selenoprotein genes (including SPS2) could produce 566
truncated SECIS-lacking mRNAs, due to inefficient transcription, or through a regulated 567
process, such as alternative usage of 3’ exons or poly-adenylation sites. In fact, 568
translational regulation is known to play an important role for many selenoproteins 569
(Howard et al. 2013). In at least one case (Selenoprotein S, VIMP gene), this is achieved 570
by the regulated inclusion/exclusion of the SECIS element in the mature transcripts 571
through alternative splicing (Bubenik et al. 2013). In such SECIS-lacking truncated 572
transcripts, no Sec insertion will take place, and termination of translation at the UGA 573
codon is the most likely outcome. However, if alternative non-SECIS mediated 574
readthrough mechanisms are present, other amino acids could still be incorporated in 575
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
19
response to the UGA codon. If this amino acid is other than Sec or Cys, the resulting 576
protein may not be able to perform its original function (as in SPS2/SPS1), but it may 577
develop a novel one. We speculate that this was the case for the ancestral metazoan 578
SPS gene, in which, possibly, the novel SPS1 function emerged in a secondary, non-579
Sec isoform maybe produced through regulated SECIS inclusion. The readthrough 580
enhancing stem loop structures around the UGA codon (SRE and HRE) found in SPS 581
genes support this hypothesis. These structures are particularly strong in the SECIS-582
lacking hymenopteran SPS1-UGA genes, which, in addition, contain the readthrough 583
enhancing hexanucleotide. Then, when the SPS gene duplicates, the SECIS is lost in 584
the SPS1 copy, leading to complete subfunctionalization. The whole process is best 585
seen within insects. Hymenoptera, after the SPS duplication, lost SPS2 when 586
selenoproteins disappeared from the genome, but still conserved SPS1-UGA. 587
Paraneopterans with selenoproteins (R.prolixus and P.humanus) still maintain the two 588
genes (SPS1 and SPS2) with UGA. In pea aphid and in non-hymenopteran 589
Endopterygota, the in-frame UGA in SPS1-UGA mutated to an Arg codon, becoming the 590
“standard” gene known as Drosophila SPS1. 591
It is tempting to speculate that the SRE/HRE stem loop structures, which are found not 592
only in SPS, but also in some other selenoprotein genes (Howard et al. 2005), are the 593
eukaryotic homologues of the bacterial SECIS element (bSECIS, see Supplementary 594
Figure SM6.1 and SM6.2). Indeed, the bacterial Sec insertion system is different from its 595
eukaryotic counterpart, both regarding the structure and the localization of the SECIS 596
element. In eukaryotes, the SECIS is characterized by a kink-turn core, and it is located 597
in the 3’ UTR. In bacteria, the bSECIS it is a simple stem-loop structure (lacking the kink-598
turn motif), located immediately downstream of the Sec-UGA within the coding 599
sequence. bSECIS elements are read by SelB, a Sec-specific elongation factor with a 600
specialized N-terminal domain. In eukaryotes, SECIS elements are bound by SECIS 601
binding protein 2 (SBP2), which recognizes specific structural features mainly around its 602
kink-turn core (Krol 2002). 603
The evolutionary history of the SECIS elements across the domains of life remains 604
largely unexplored. The assumption that SECIS and bSECIS are phylogenetically 605
homologous structures requires the re-location of the bSECIS to the 3’UTR, concomitant 606
with the radical alteration of its structure—for both of which it is difficult to postulate 607
plausible evolutionary mechanisms. In contrast, SRE/HRE are stem loop structures that 608
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
20
localize next to the UGA codon, and resemble much more the bSECIS structure than the 609
SECIS does. In addition to structural similarity, bSECIS and HRE/SRE also share 610
functional similarity, since both disfavor termination at Sec-UGA sites during translation. 611
Thus, we hypothesize that the SRE/HRE structures (at least those in SPS genes) are 612
derived from bSECIS, and that the eukaryotic SECIS is an evolutionary innovation 613
unconnected to the bSECIS. After the emergence of the eukaryotic SECIS system, the 614
ancestral bSECIS function was ”downgraded” to helper for Sec insertion (SRE). In 615
ancestral metazoans it was kept under selection to allow both Sec-insertion and non-Sec 616
readthrough, as the non-Sec isoform acquired the SPS1 function. Within insects, after 617
the SPS duplication, the ancestral bSECIS structure remained in one of the duplicated 618
copies (SPS1-UGA), specializing only in non-Sec readthrough. This structure has been 619
conserved in hymenopterans and some paraneopterans, becoming what we named here 620
HRE. 621
In summary, thanks to the singular feature that the amino acid occurring at a single 622
position serves as binary indicator of gene function, we traced the evolutionary history of 623
SelD/SPS genes with unprecedented detail, providing, at the same time, a survey of 624
selenium utilization traits across the entire tree of life. In metazoans, the SPS phylogeny 625
constitutes a prototypical case of functional evolution, in which dual function is 626
segregated to different loci through independent gene duplications and subsequent 627
convergent subfunctionalization. 628
629
Methods 630
Gene prediction 631
We performed gene prediction using Selenoprofiles ver. 3.0 (Mariotti and Guigó 2010) 632
(http://big.crg.cat/services/selenoprofiles). This program scans nucleotide sequences to 633
predict genes belonging to given protein families, provided as amino acid sequence 634
alignments (profiles). When searching for a selenoprotein family (at least one sequence 635
with a Sec residue), the program can find both selenoprotein genes and homologues 636
with other amino acids at this position, as long as their full protein sequence is similar 637
enough to the input profile alignment. To ensure maximum sensitivity, two manually 638
curated profiles were used for SPS, one containing sequences from all lineages and 639
another only from prokaryotes. Specificity was controlled through a set of filters applied 640
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
21
to predicted candidate genes, which include checking similarity to the profile sequences, 641
and best matches among all known proteins in the NCBI non-redundant (NR) database 642
(AWSI score and tag score – see Selenoprofiles manual). We built our SPS gene 643
datasets (available in Supplementary Material S7) searching a large collection of 644
eukaryotic (505) and prokaryotic genomes (8.263 with a non redundant reference subset 645
of 223), downloaded mainly from NCBI. For eukaryotes and for prokaryotes in the 646
reference set, results were manually inspected and filtered to exclude duplicates, 647
pseudogenes (abundant in vertebrates) and contaminations of the genome assemblies. 648
Eukaryotic SECIS elements were searched using the program SECISearch3 (Mariotti et 649
al. 2013). 650
We also used Selenoprofiles with profiles derived from protein families that are markers 651
for other selenium utilization traits, ybbB and SelA. We used the same program with a 652
comprehensive collection of selenoprotein families in a semi-automatic procedure to 653
probe the number of selenoproteins per lineage (as those displayed in Figures 1 and 2). 654
tRNAscan-SE ver. 1.23 (Lowe and Eddy 1997) and Aragorn ver. 1.2.28 (Laslett and 655
Canback 2004) were used to search for tRNAsec. We noticed the presence of abundant 656
false positives in prokaryotes, lacking the long extra-arm characteristic of tRNAsec. 657
Thus, we focused most of our analysis on the reference set, inspecting manually 658
candidates and filtering out all such cases. 659
For ciliates all predictions were manually adjusted, given their non-standard genetic 660
code. In addition to genomes, the NCBI EST database was also used to investigate 661
certain eukaryotic lineages of interest, such as tunicates, annelids and birds 662
(Supplementary Materials S3 and S4). 663
Phylogenetic analysis 664
Alignments were computed using T-Coffee ver. 8.95 (Notredame et al. 2000) and 665
sometimes complemented by MAFFT ver. 7.017b (Katoh et al. 2005). To deduce the 666
phylogenetic history of SPS, we combined three types of information: topology of gene 667
trees reconstructed from protein sequences, phylogenetic tree of investigated species, 668
and positions of introns in respect to protein sequence. Gene trees were computed by 669
maximum likelihood as explained in (Mariotti et al. 2012) after (Huerta-Cepas et al. 670
2011), and visualized using the program ETE ver. 2 (Huerta-Cepas et al. 2010). The 671
approximate phylogenetic tree of investigated species was derived from the NCBI 672
taxonomy database (Sayers et al. 2009), and was refined for insects with data from The 673
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
22
International Aphid Genomics Consortium (2010). Figures 1, 2 and Supplementary 674
Figure SM1.1 were generated with the R package ggsunburst, available at 675
http://genome.crg.es/~didac/ggsunburst/. Relative positions of introns were visualized 676
using selenoprofiles_tree_drawer (available within Selenoprofiles) and ETE. 677
The SPS phylogenetic history presented here has been deduced using parsimony as the 678
main driving principle. Supplementary Material S3 contains a detailed description of the 679
process to solve the phylogeny of eukaryotic SPS. Supplementary Material S4 is 680
dedicated to SPS genes in tunicates. 681
Evolutionary analysis 682
We analyzed the ratio of non-synonymous vs synonymous substitution rates (Ka/Ks) in 683
metazoan SPS genes using the program Pycodeml (Mariotti, unpublished) available in 684
Supplementary Material S8, or online at https://github.com/marco-mariotti/pycodeml. 685
This program runs internally CodeML, a program which is part of the PAML package 686
(Yang 2007) to predict the sequence at ancestral nodes. Then, it computes various 687
indexes of sequence evolution related to Ka/Ks, fully explained in Supplementary Material 688
S5. Pycodeml can also produce a graphical output including the sequence alignment, 689
allowing to inspect in detail any substitution (available at http://big.crg.cat/SPS). We 690
have run Pycodeml on three manually curated alignments of coding sequences: one for 691
the insect SPS duplication (Supplementary Figure SM5.1), one for the vertebrate SPS 692
duplication (Supplementary Figure SM5.2), and one “summary set” containing 693
representatives for all four SPS duplications here described (Supplementary Figure 694
SM5.3). Each alignment was trimmed manually to leave out the N-terminal and C-695
terminal tails, and to remove any insertion that occurs only in any single sequence. 696
Supplementary Material S5 contains the results of the evolutionary analysis, and an 697
explanation of all sequence statistics applied. 698
Detection of extensions and fusions 699
We used two different strategies to detect additional domains in SPS genes. First, we 700
searched for annotated SPS fusions. We ran our SPS gene set with BLASTp (Altschul et 701
al. 1997) against the NCBI NR database. Then we parsed the results, searching for 702
large stretches of sequence of a matched NR protein that were not included in the output 703
BLAST alignment. Second, we looked in genomes for any SPS extension. We expanded 704
each predicted SPS gene at both sides until a stop codon was reached, and we ran the 705
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
23
extensions with BLASTp against NCBI NR database. All candidates from the two 706
methods were merged, clustered by similarity, and manually inspected. Conservation in 707
multiple species and confirmation with RNA data were used as criteria to exclude 708
artifacts possibly caused by our detection method or by imperfect genome assemblies, 709
thus prioritizing specificity over sensitivity. For the most interesting cases, a new 710
alignment profile was built including the protein sequence of SPS and of the additional 711
domain, and used to search again the genome sequences. Supplementary Material S2 712
contains a description of results. 713
Prediction of conserved secondary structures 714
The program RNAz ver. 2.1 (Gruber et al. 2010) was used to predict conserved 715
secondary structures embedded in SPS coding sequences. Initially, we produced a 716
“master alignment” that included the coding sequences of all SPS genes in our dataset. 717
The nucleotide sequence alignments employed were based on the alignment of the 718
corresponding amino acid sequences. Then, the master alignment was used to extract a 719
multitude of “subset alignments”, which included only genes in specific lineages, and/or 720
only specific types of SPS (based on the residue found at the Sec position). RNAz was 721
run either directly on these subset alignments, or instead the program trimAl ver. 1.4 722
(Capella-Gutiérrez et al. 2009) was used in advance to reduce the number of 723
sequences. All secondary structures predicted in this way were then manually inspected. 724
For the best candidates, images of consensus structures were generated using the 725
ViennaRNA package (Lorenz et al. 2011)(see Supplementary Figures SM6.1 and 726
SM6.2). Supplementary Material S6 contains a detailed description of the procedure and 727
of the results, including the list of subset alignments considered. 728
Rescue experiments in Drosophila 729
For rescue experiments we used the Gal4/UAS system (Brand and Perrimon 1993). We 730
obtained cDNA for human SPS1 from the Harvard resource core 731
(http://plasmid.med.harvard.edu/PLASMID/). For SPS-ae of C.intestinalis, we obtained 732
the cDNA corresponding to the Gly isoform by performing targeted PCR on larvae 733
extracts. We obtained cDNA for SPS1-UGA from A.cephalotes by performing targeted 734
PCR on extracts provided by James F.A. Traniello. These cDNAs were cloned through 735
the Gibson method (Gibson et al. 2009) into the pUAST-attB vector linearized by double 736
digestion with BglII and XhoI and clones verified by Sanger sequencing. Primers used 737
for cloning are reported in Supplementary Material S7. Transgenic flies were obtained 738
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
24
following the method described by (Bischof et al. 2007). Line yw M{e.vas-int.DM}ZH-2A; 739
3: M{RFP.attP}ZH-86Fb was used to direct the insertion into the 3R chromosome (86F). 740
Crosses were designed to obtain expression of each transgenic SPS1 into a SelDptuf 741
homozygous mutant background (Supplementary Material S7, Supplementary Figure 742
SM7.1). We used armadillo-Gal4 (arm-Gal4) as a driver to activate expression of the 743
UAS-cDNA inserts. The final cross was: SelDptuf/CyOdfYFP; UAS-cDNA insert/MKRS x 744
SelDptuf/CyOdfYFP; arm-Gal4/TM6B. Expression of transgenes under arm-Gal4 745
promoter was confirmed by RT-PCR (Supplementary Figure SM7.2). Imaginal wing 746
discs from third instar larvae were dissected in PBS and stained with Rhodamine 747
Phalloidin from Molecular Probes® (cat #R415). 748
Acknowledgements 749
We thank James F.A. Traniello and Ysabel Milton Giraldo (Boston University, Boston) for 750
providing extracts of A.cephalotes. We thank the Drosophila Injection Platform of the 751
Consolider Project (CBM-SO, Madrid) for production of transgenic flies. RG group 752
research was funded by grants BIO2011-26205 from the Spanish Ministry of Science 753
and grant SGR-1430 from the Catalan Government. MM received a FPU doctoral 754
fellowship AP2008-04334 from the Spanish Ministry of Education. TG group research 755
was funded in part by a grant from the Spanish Ministry of Economy and 756
Competitiveness (BIO2012-37161), a grant from the Qatar National Research Fund 757
grant (NPRP 5-298-3-086), and a grant from the European Research Council under the 758
European Union's Seventh Framework Programme (FP/2007-2013) / ERC (Grant 759
Agreement n. ERC-2012-StG-310325). VNG group research was supported by NIH 760
GM061603. 761
Disclosure declaration 762
Authors declare no conflict of interest. 763
Figure legends 764
Figure 1: Phylogenetic profile of SPS and selenium utilization traits in 765
prokaryotes. The sunburst tree shows the phylogenetic structure of the reference set of 766
223 prokaryotic genomes (taken from NCBI taxonomy), and the presence of SelD genes 767
and other markers of selenium utilization. The section for archaea is zoomed in as a 768
guide to interpret the plot (1). Every ring-shaped section represents a taxonomic rank in 769
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
25
NCBI taxonomy (superkingdom, phylum, class, order, family, genus). The last two 770
outermost rings display features for each of the species analyzed. The length of the 771
black bars protruding from the outermost circle is proportional to the number of 772
selenoproteins detected in each species. The outermost ring is color coded for the 773
presence of ybbB and SelA in the species, with a black dot inside for denoting tRNAsec 774
presence. The second outermost ring is labeled for SelD presence and type: “SelD-Sec”, 775
“SelD-Cys”, “no gene found”. The color is propagated to the lower ranks by hierarchy. 776
Transparency is used to display how many species under a lineage have the same label. 777
Assuming no Cys to Sec conversion and no horizontal transfer, colors reflect the 778
predicted SelD presence at ancestral nodes. This allows detecting the Sec to Cys 779
conversions manually, for example in Clostridia (2). The hierarchical color assignment is 780
violated only in the case of Gammaproteobacteria, altered to be red. In fact, although its 781
sublineage Pasteurellales contains SelD-Sec (see 3), our analysis points to a Cys to Sec 782
conversion instead, which implies that the ancestral state for Gammaproteobacteria was 783
SelD-Cys. The plot can be used to map the Sec and selenouridine (SeU) traits (see top-784
right panel). For example Escherichia coli (4) has both traits, Pasteurellales only Sec (3), 785
and Bacillus coagulans only SeU (5). Enterococcus faecalis has a SelD-Cys gene, but 786
no other selenium utilization marker (6) (Romero et al. 2005)(Zhang et al. 2008). 787
Expanded versions of the plot (up to 8,263 species) are available in Supplementary 788
Material S1. Note that gene fusions and extensions are not represented in this plot. 789
Figure 2: Phylogenetic profile of SPS genes and approximate selenoproteome size 790
of eukaryotes. The plot recapitulates the results on 505 genomes analyzed, 791
summarized to 213 displayed here. The tree (from NCBI taxonomy) is partitioned in 792
lineages, highlighted in grey tones to help visualization. Near the tips of leaves, the 793
presence of SPS proteins is displayed using colored rectangles. Sec (green) and Cys 794
(red) forms correspond to SPS2 (top left legend), while the other homologues to SPS1 795
(top right legend). The gene extensions found for some SPS2-Cys are indicated with 796
symbols inside the rectangle (top left legend). The number of selenoproteins predicted in 797
each genome is indicated with a black vertical bar extending from the tips. As reference 798
points, mammals have ~25 selenoprotein genes, D.melanogaster has three, and 799
C.elegans only one. 800
Figure 3: Parallel gene duplications of SPS proteins in metazoa. The plot 801
summarizes the phylogenetic history of metazoan SPS genes, consisting of parallel and 802
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
26
convergent events of gene duplication followed by subfunctionalization. Each colored 803
ball represents a SPS gene, indicating the residue found at the UGA or homologous 804
codon (U for selenocysteine, C for cysteine, T for threonine, G for glycine, L for leucine, 805
R for arginine, x for unknown residue). The gene structures are schematically displayed 806
in Figure 4. The names of the insect species lacking selenoproteins are in red. The main 807
genomic events shaping SPS genes are indicated on the branches: GD whole gene 808
duplication, GDR gene duplication by retro-transposition, AE origin of an alternative 809
exon, SL Sec loss, SC conversion of Sec to Cys, SO conversion of Sec to something 810
other than Cys, GL gene loss. In our subfunctionalization hypothesis (see text), we map 811
the origin of a dual function at the root of metazoa. 812
Figure 4: Structure and function of the identified SPS genes. SPS proteins are 813
classified according to the residue found at the UGA or homologous position (see also 814
Figure 3). The presence of specific secondary structures is also indicated: bSECIS for 815
bacterial SECIS element, SRE for Sec recoding element (Howard et al. 2005), SECIS for 816
eukaryotic SECIS element, HRE for hymenopteran readthrough element. The rightmost 817
column indicates the functions predicted for the SPS proteins. SPS2 function is the 818
synthesis of selenophosphate. SPS1 function is defined as the uncharacterized 819
molecular function of Drosophila SPS1-Arg (double underlined), which is likely to be 820
similar to that of other SPS1 genes, as suggested by KO-rescue experiments in 821
Drosophila (underlined). *: for eukaryotic SPS2, the parentheses indicate that some such 822
genes are predicted to possess both SPS1 and SPS2 functions, those marked also with 823
a star (*) in Figure 3 (basically all metazoans with no SPS1 protein in the same 824
genome). 825
Figure 5: Alternative SPS1/2 transcript isoforms sorted by retrotransposition 826
within ascidians. This figure expands the section for tunicates in Figure 3. At the root of 827
ascidians, the ancestral SPS2-Sec gene acquired a novel SPS1-Gly transcript isoform, 828
through alternative exon usage at the 5’ end (AE). Then, at the root of the ascidian 829
lineage Styelidae and Pyuridae, the SPS2-Sec transcript of this dual SPS1/SPS2 gene 830
(SPS-ae) retrotransposed to the genome creating a novel SPS2-Sec gene (GDR). This 831
presumably triggered the loss of Sec from the parental gene which, as both the SECIS 832
and the UGA containing exon degenerated (SL), specialized only in the production of 833
SPS1-Gly. 834
Figure 6: Readthrough-enhancing hexanucleotide in SPS genes. The phylogenetic 835
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
27
tree on the left shows the nucleotide sequence alignment at the UGA (or homologous) 836
site in SPS sequences. Only SPS2 and SPS1-UGA genes are shown here. Codons are 837
colored according to their translation, following the same color schema used in Figures 2 838
and 4 (grey for other amino acids). The presence of the hexanucleotide described in 839
(Harrell et al. 2002) is marked with a black dot. Green dots mark the genes for which a 840
bona-fide SECIS element was identified. The last column indicates the presence of 841
SPS1 genes. 842
Figure 7: Rescue of the Drosophila melanogaster SPS1 mutant by heterologous 843
SPS1 proteins. All images show wing imaginal discs dissected from larvae with the 844
indicated constructs and genotypes. (A,B) The SPS1 mutant flies (SelDptuf / SelDptuf ) 845
result in defects in the whole organism, but can be easily monitored in the wing imaginal 846
disc epithelia; the homozygous condition (A) strongly impairs size and morphology, 847
whereas the heterozygous (B) is very similar to the wild type condition. (C-E) The severe 848
homozygous SelDptuf / SelDptuf phenotype is partially rescued by ubiquitous expression 849
of heterologous SPS1 proteins. (C) SelDptuf / SelDptuf mutants with C. intestinalis SPS1-850
Gly. (D) SelDptuf / SelDptuf mutants with A.cephalotes SPS1-UGA. 851
(E) SelDptuf / SelDptuf mutants with human SPS1-Thr. Scale bars 50 µm. 852
853
Figures: 854
(attached) 855
856
Supplementary Material 857
Find next the following supplementary sections: 858
• S1: SelD in prokaryotes 859
• S2: Gene fusions and extensions 860
• S3: Phylogeny of eukaryotic SPS proteins 861
• S4: Alternative isoforms sorted by gene duplication in ascidians 862
• S5: Evolutionary analysis of metazoan SPS genes 863
• S6: Secondary structures within coding sequences of SPS genes 864
• S7: Rescue experiments in Drosophila 865
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
28
• S8: Datasets 866
867
868
869
870
References 871
Allmang C, Wurth L, Krol A. 2009. The selenium to selenoprotein pathway in eukaryotes: 872
more molecular partners than anticipated. Biochim Biophys Acta, 1790(11):1415-23. 873
Alsina B, Serras F, Baguna J, Corominas M. 1998. patufet, the gene encoding the 874
Drosophila melanogaster homologue of selenophosphate synthetase, is involved in 875
imaginal disc morphogenesis. Mol Gen Genet, 257(2):113–123. 876
Altschmied J, Delfgaauw J, Wilde B, Duschl J, Bouneau L, Volff JN, Schartl M. 2002. 877
Subfunctionalization of duplicate mitf genes associated with differential degeneration 878
of alternative exons in fish. Genetics, 161(1):259–67. 879
Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D. 1997. Gapped 880
BLAST and PSI-BLAST: a new generation of protein database search programs. 881
Nucleic acids res, 25(17), 3389. 882
The International Aphid Genomics Consortium. 2010. Genome sequence of the pea 883
aphid Acyrthosiphon pisum. PLoS Biol, 8(2):e1000313. 884
Bao R, Fischer T, Bolognesi R, Brown SJ, Friedrich M. 2012. Parallel duplication and 885
partial subfunctionalization of β-catenin/armadillo during insect evolution. Mol Biol 886
Evol, 29(2):647-62. 887
Bischof J, Maeda RK, Hediger M, Karch F, Basler K. 2007. An optimized transgenesis 888
system for Drosophila using germ-line-specific phiC31 integrases. Proc Natl Acad Sci 889
U S A, 104(9), 3312–7. 890
Brand AH, Perrimon N. 1993. Targeted gene expression as a means of altering cell fates 891
and generating dominant phenotypes. Development, 118(2):401-15. 892
Bubenik JL, Miniard AC, Driscoll DM. 2013. Alternative transcripts and 3'UTR elements 893
govern the incorporation of selenocysteine into selenoprotein S. PLoS 894
One, 8(4):e62102. 895
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
29
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated 896
alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 897
1972–3. 898
Cassago A, Rodrigues EM, Prieto EL, Gaston KW, Alfonzo JD, Iribar MP, Berry MJ, 899
Cruz AK, Thiemann OH. 2006. Identification of Leishmania selenoproteins and 900
SECIS element. Mol Biochem Parasitol, 149(2):128–34. 901
Castellano S, Andrés AM, Bosch E, Bayes M, Guigó R, Clark AG. 2009. Low 902
exchangeability of selenocysteine, the 21st amino acid, in vertebrate proteins. Mol 903
Biol Evol, 26(9):2031-40. 904
Chapple CE, Guigó R. 2008. Relaxation of selective constraints causes independent 905
selenoprotein extinction in insect genomes. PLoS One, 13;3(8):e2968. 906
da Silva MTA, Caldas VEA, Costa FC, Silvestre DAMM, Thiemann OH. 2013. 907
Selenocysteine biosynthesis and insertion machinery in Naegleria gruberi. Mol 908
Biochem Parasitol, 188(2):87–90. 909
Dehal P, Boore JL. 2005. Two rounds of whole genome duplication in the ancestral 910
vertebrate. PLoS Biol, 3(10):e314. 911
Delsuc F, Brinkmann H, Chourrout D, Philippe H. 2006. Tunicates and not 912
cephalochordates are the closest living relatives of vertebrates. Nature, 913
439(7079):965–8. 914
Dominguez M, Ferres-Marco D, Gutierrez-Aviño FJ, Speicher SA, Beneyto M. 2004. 915
Growth and specification of the eye are controlled independently by Eyegone and 916
Eyeless in Drosophila melanogaster. Nat Genet, 36(1):31–9. 917
Fuessl M, Reinders J, Oefner PJ, Heinze J, Schrempf A. 2014. Selenophosphate 918
synthetase in the male accessory glands of an insect without selenoproteins. J Insect 919
Physiol, 2014 Dec;71:46-51. 920
Gabaldón T. 2008. Large-scale assignment of orthology: back to phylogenetics? 921
Genome Biol, 30;9(10):235. 922
Gabaldón T, Koonin EV. 2013. Functional and evolutionary implications of gene 923
orthology. Nat Rev Genet, 14(5):360-6. 924
Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA 3rd, Smith HO. 2009. 925
Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat 926
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
30
Methods, 6(5):343-5. 927
Gobler CJ, Lobanov AV, Tang Y-Z, Turanov AA, Zhang Y, Doblin M, Taylor GT, Sañudo 928
Wilhelmy SA, Grigoriev IV, Gladyshev VN. 2013. The central role of selenium in the 929
biochemistry and ecology of the harmful pelagophyte, Aureococcus anophagefferens. 930
ISME J, 7(7):1333-43. 931
Gruber AR, Findeiß S, Washietl S, Hofacker IL, Stadler PF. 2010. RNAz 2.0: improved 932
noncoding RNA detection. Pac Symp Biocomput, 69–79. 933
Haft DH, Self WT. 2008. Orphan SelD proteins and selenium-dependent molybdenum 934
hydroxylases. Biol Direct, 3:4. 935
Harrell L, Melcher U, Atkins JF. 2002. Predominance of six different hexanucleotide 936
recoding signals 3’ of read-through stop codons. Nucleic Acids Res, 30(9):2011–937
2017. 938
Hittinger CT, Carroll SB. 2007. Gene duplication and the adaptive evolution of a classic 939
genetic switch. Nature, 449 (7163): 677–81. 940
Howard MT, Aggarwal G, Anderson CB, Khatri S, Flanigan KM, Atkins JF. 2005. 941
Recoding elements located adjacent to a subset of eukaryal selenocysteine-942
specifying UGA codons. EMBO J, 24(8):1596–607. 943
Howard MT, Carlson BA, Anderson CB, Hatfield DL. 2013. Translational redefinition of 944
UGA codons is regulated by selenium availability. J Biol Chem, 288(27):19401-13. 945
Howard MT, Moyle MW, Aggarwal G, Carlson BA, Anderson CB. 2007. A recoding 946
element that stimulates decoding of UGA codons by Sec tRNA[Ser]Sec. RNA, 947
13(6):912–20. 948
Huerta-Cepas J, Capella-Gutierrez S, Pryszcz LP, Denisov I, Kormes D, Marcet-Houben 949
M, Gabaldón T. 2011. PhylomeDB v3.0: an expanding repository of genome-wide 950
collections of trees, alignments and phylogeny-based orthology and paralogy 951
predictions. Nucleic Acids Res, 39(Database issue):D556–60. 952
Huerta-Cepas J, Dopazo J, Gabaldón T. 2010. ETE: a python Environment for Tree 953
Exploration. BMC Bioinformatics, 11:24. 954
Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying and 955
distinguishing between models. Nat Rev Genet, 11(2):97-108. 956
Jiang L, Ni J, Liu Q. 2012. Evolution of selenoproteins in the metazoan. BMC Genomics, 957
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
31
13:446. 958
Jungreis I, Lin MF, Spokony R, Chan CS, Negre N, Victorsen A, White KP, Kellis M. 959
2011. Evidence of abundant stop codon readthrough in Drosophila and other 960
metazoa. Genome Res, 21(12):2096–113. 961
Katoh K, Kuma K-i, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy 962
of multiple sequence alignment. Nucleic Acids Res, 33(2):511–8. 963
Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov GK, Ward LD, Birney 964
E, Crawford GE, Dekker J et al. 2014. Defining functional DNA elements in the 965
human genome. Proc Natl Acad Sci U S A, 111(17):6131-8. 966
Kim IY, Guimarães MJ, Zlotnik A, Bazan JF, Stadtman TC. 1997. Fetal mouse 967
selenophosphate synthetase 2 (SPS2): characterization of the cysteine mutant form 968
overproduced in a baculovirus-insect cell system. Proc Natl Acad Sci U S A, 969
94(2):418– 21. 970
Krol A. 2002. Evolutionarily different RNA motifs and RNA-protein complexes to achieve 971
selenoprotein synthesis. Biochimie, 84(8):765–774. 972
Kryukov G, Gladyshev V. 2004. The prokaryotic selenoproteome. EMBO Rep, 5(5):538. 973
Lambert MJ, Cochran WO, Wilde BM, Olsen KG, Cooper CD. 2015. Evidence for 974
widespread subfunctionalization of splice forms in vertebrate genomes. Genome Res, 975
25(5):624-32. 976
Laslett D, Canback B. 2004. ARAGORN, a program to detect tRNA genes and tmRNA 977
genes in nucleotide sequences. Nucleic Acids Res, 32(1):11–6. 978
Lacourciere GM, Mihara H, Kurihara T, Esaki N, Stadtman TC. 2000. Escherichia coli 979
NifS-like proteins provide selenium in the pathway for the biosynthesis of 980
selenophosphate. J Biol Chem, Aug 4;275(31):23769-73. 981
Lee KH, Shim MS, Kim JY, Jung HK, Lee E, Carlson BA, Xu X-M, Park JM, Hatfield DL, 982
Park T et al. 2011. Drosophila selenophosphate synthetase 1 regulates vitamin B6 983
metabolism: prediction and confirmation. BMC Genomics, 12:426. 984
Lobanov AV, Gromer S, Salinas G, Gladyshev VN. 2006. Selenium metabolism in 985
Trypanosoma: characterization of selenoproteomes and identification of a 986
Kinetoplastida-specific selenoprotein. Nucleic Acids Res, 34(14):4012. 987
Lobanov AV, Fomenko DE, Zhang Y, Sengupta A, Hatfield DL, Gladyshev VN. 2007. 988
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
32
Evolutionary dynamics of eukaryotic selenoproteomes: large selenoproteomes may 989
associate with aquatic life and small with terrestrial life. Genome Biol, 8(9):R198. 990
Lobanov AV, Hatfield DL, Gladyshev VN. 2008. Selenoproteinless animals: 991
selenophosphate synthetase SPS1 functions in a pathway unrelated to 992
selenocysteine biosynthesis. Protein Sci, 17(1):176. 993
Lobanov AV, Hatfield DL, Gladyshev VN. 2009. Eukaryotic selenoproteins and 994
selenoproteomes. Biochim Biophys Acta, 1790(11):1424–8. 995
Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, 996
Hofacker IL. 2011. ViennaRNA Package 2.0. Algorithms Mol Biol, 24;6:26. 997
Lowe T, Eddy S. 1997. tRNAscan-SE: A program for improved detection of transfer RNA 998
genes in genomic sequence. Nucleic Acids Res, 25(5):955. 999
Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes. 1000
Science, 290(5494):1151-5. 1001
Mariotti M, Guigó R. 2010. Selenoprofiles: profile-based scanning of eukaryotic genome 1002
sequences for selenoprotein genes. Bioinformatics, 26(21):2656–63. 1003
Mariotti M, Ridge PG, Zhang Y, Lobanov AV, Pringle TH, Guigó R, Hatfield DL, 1004
Gladyshev VN. 2012. Composition and evolution of the vertebrate and mammalian 1005
selenoproteomes. PLoS One, 7(3):e33066. 1006
Mariotti M, Lobanov AV, Guigó R, Gladyshev VN. 2013. SECISearch3 and Seblastian: 1007
new tools for prediction of SECIS elements and selenoproteins. Nucleic Acids Res, 1008
41(15):e149. 1009
Morey M, Serras F, Corominas M. 2003. Halving the selenophosphate synthetase gene 1010
dose confers hypersensitivity to oxidative stress in Drosophila melanogaster. FEBS 1011
Lett, 534(1-3):111-4. 1012
Notredame C, Higgins DG, Heringa J. 2000. T-Coffee: A novel method for fast and 1013
accurate multiple sequence alignment. J Mol Biol, 302(1):205–17. 1014
Novoselov SV, Rao M, Onoshko NV, Zhi H, Kryukov GV, Xiang Y, Weeks DP, Hatfield 1015
DL, Gladyshev VN. 2002. Selenoproteins and selenocysteine insertion system in the 1016
model plant cell system, Chlamydomonas reinhardtii. EMBO J, 21(14):3681. 1017
Otero L, Romanelli-Cedrez L, Turanov AA, Gladyshev VN, Miranda-Vizuete A, Salinas 1018
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
33
G. 2014. Adjustments, extinction, and remains of selenocysteine incorporation 1019
machinery in the nematode lineage. RNA, 20(7):1023-34. 1020
Pacheco TR, Gomes AQ, Barbosa-Morais NL, Benes V, Ansorge W, Wollerton M, Smith 1021
CW, Valcárcel J, Carmo-Fonseca M. 2004. Diversity of vertebrate splicing factor 1022
U2AF35: identification of alternatively spliced U2AF1 mRNAS. J Biol Chem, 1023
279(26):27039– 49. 1024
Palenik B, Grimwood J, Aerts A, Rouzé P, Salamov A, Putnam N, Dupont C, Jorgensen 1025
R, Derelle E, Rombauts S et al. 2007. The tiny eukaryote Ostreococcus provides 1026
genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci U S A, 1027
104(18):7705–10. 1028
Palioura S, Sherrer RL, Steitz TA, Söll D, Simonovic M. 2009. The human SepSecS-1029
tRNASec complex reveals the mechanism of selenocysteine formation. Science, 1030
325(5938):321–325. 1031
Persson BC, Böck A, Jäckle H, Vorbrüggen G. 1997. SelD homolog from Drosophila 1032
lacking selenide-dependent monoselenophosphate synthetase activity. J Mol Biol, 1033
274(2):174–80. 1034
Philippe H, Brinkmann H, Lavrov DV, Littlewood DT, Manuel M, Wörheide G, Baurain D. 1035
2011. Resolving difficult phylogenetic questions: why more sequences are not 1036
enough. PLoS Biol, 9(3):e1000602. 1037
Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences 1038
(RefSeq): current status, new features and genome annotation policy. Nucleic Acids 1039
Res, 40(Database issue):D130-5. 1040
Romero H, Zhang Y, Gladyshev VN, Salinas G. 2005. Evolution of selenium utilization 1041
traits. Genome Biol, 6(8):R66. 1042
Rother M, Resch A, Wilting R, Böck A. 2001. Selenoprotein synthesis in archaea. 1043
Biofactors, 14(1-4):75–83. 1044
Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, 1045
Church DM, Dicuccio M, Federhen S et al. 2009. Database resources of the National 1046
Center for Biotechnology Information. Nucleic Acids Res, 38(Database issue):D5-16. 1047
Small-Howard A, Morozova N, Stoytcheva Z, Forry EP, Mansell JB, Harney JW, Carlson 1048
BA, Xu X-M, Hatfield DL, Berry MJ. 2006. Supramolecular complexes mediate 1049
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
34
selenocysteine incorporation in vivo. Mol Cell Biol, 26(6):2337–46. 1050
Squires J, Berry M. 2008. Eukaryotic Selenoprotein Synthesis: Mechanistic Insight 1051
Incorporating New Factors and New Functions for Old Factors. IUBMB Life, 1052
60(4):232–235. 1053
Srivastava M, Mallard C, Barke T, Hancock LE, Self WT. 2011. A selenium-dependent 1054
xanthine dehydrogenase triggers biofilm proliferation in Enterococcus faecalis 1055
through oxidant production. J Bacteriol, 193(7):1643–52. 1056
Stock T, Rother M. 2009. Selenoproteins in Archaea and Gram-positive bacteria. 1057
Biochimica Biophysica Acta, 1790(11):1520–1532. 1058
Su D, Ojo TT, Söll D, Hohn MJ. 2012. Selenomodification of tRNA in archaea requires a 1059
bipartite rhodanese enzyme. FEBS Lett, 586(6):717–21. 1060
Talavera D, Vogel C, Orozco M, Teichmann SA, de la Cruz X. 2007. The 1061
(in)dependence of alternative splicing and gene duplication. PLoS Comput Biol, 1062
3(3):e33. 1063
Tamura T, Yamamoto S, Takahata M, Sakaguchi H, Tanaka H, Stadtman TC, Inagaki K. 1064
2004. Selenophosphate synthetase genes from lung adenocarcinoma cells: Sps1 for 1065
recycling L-selenocysteine and Sps2 for selenite assimilation. Proc Natl Acad Sci U S 1066
A, 101(46):16162–7. 1067
Taskov K, Chapple C, Kryukov GV, Castellano S, Lobanov AV, Korotkov KV, Guigó R, 1068
Gladyshev VN. 2005. Nematode selenoproteome: the use of the selenocysteine 1069
insertion system to decode one codon in an animal genome? Nucleic Acids Res, 1070
33(7):2227-38. 1071
Voskoboynik A, Neff NF, Sahoo D, Newman AM, Pushkarev D, Koh W, Passarelli B, 1072
Fan HC, Mantalas GL, Palmeri KJ et al. 2013. The genome sequence of the colonial 1073
chordate, Botryllus schlosseri. eLife, 2:e00569. 1074
Xu X, Carlson B, Mix H, Zhang Y, Saira K, Glass R, Berry M, Gladyshev V, Hatfield D. 1075
2007a. Biosynthesis of selenocysteine on its tRNA in eukaryotes. PLoS Biol, 5(1):e4. 1076
Xu X, Carlson B, Irons R, Mix H, Zhong N, Gladyshev V, Hatfield D. 2007b. 1077
Selenophosphate synthetase 2 is essential for selenoprotein biosynthesis. Biochem 1078
J, 404(Pt 1):115. 1079
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol, 1080
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
35
Aug;24(8):1586-91. 1081
Yoshizawa S, Böck A. 2009. The many levels of control on bacterial selenoprotein 1082
synthesis. Biochim Biophys Acta, 1790(11):1404–1414. 1083
Zhang Y, Romero H, Salinas G, Gladyshev V. 2006. Dynamic evolution of 1084
selenocysteine utilization in bacteria: a balance between selenoprotein loss and 1085
evolution of selenocysteine from redox active cysteine residues. Genome Biol, 7(10), 1086
R94. 1087
Zhang Y, Turanov A, Hatfield D, Gladyshev V. 2008. In silico identification of genes 1088
involved in selenium metabolism: evidence for a third selenium utilization trait. BMC 1089
Genomics, 9(1):251. 1090
Zhang Y, Gladyshev VN. 2008. Trends in selenium utilization in marine microbial world 1091
revealed through the analysis of the global ocean sampling (GOS) project. PLoS 1092
Genet, 4(6):e1000095. 1093
Zhang Y, Gladyshev VN. 2010. General trends in trace element utilization revealed by 1094
comparative genomic analyses of Co, Cu, Mo, Ni, and Se. J Biol Chem, 285(5), 1095
3393–405. 1096
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from
10.1101/gr.190538.115Access the most recent version at doi: published online July 20, 2015Genome Res.
Marco Mariotti, Didac Santesmasses, Salvador Capella-Gutierrez, et al. recurrent subfunctionalizationrelocation of function through independent duplications and Evolution of selenophosphate synthetases: emergence and
Material
Supplemental
http://genome.cshlp.org/content/suppl/2015/07/22/gr.190538.115.DC1.html
P<P
Published online July 20, 2015 in advance of the print journal.
Manuscript
Accepted
manuscript is likely to differ from the final, published version. Peer-reviewed and accepted for publication but not copyedited or typeset; accepted
Open Access
Open Access option.Genome ResearchFreely available online through the
License
Commons Creative
.http://creativecommons.org/licenses/by/4.0/as described at
available under a Creative Commons License (Attribution 4.0 International license), , isGenome ResearchThis manuscript is Open Access.This article, published in
ServiceEmail Alerting
click here.top right corner of the article or
Receive free email alerts when new articles cite this article - sign up in the box at the
http://genome.cshlp.org/subscriptionsgo to: Genome Research To subscribe to
Published by Cold Spring Harbor Laboratory Press
Cold Spring Harbor Laboratory Press on July 23, 2015 - Published by genome.cshlp.orgDownloaded from