

Non-coding RNA Annotation of the Schistosoma mansoni Genome

Claudia S. Copeland1,2, Manja Marz1, Dominic Rose1, Jana Hertel1, Paul J. Brindley2, Clara Bermudez Santana1,8, Stephanie Kehr1, Camille Stephan-Otto Attolini3, Peter F. Stadler1,4,5,6,7,∗

1 Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
2 Department of Microbiology, Immunology & Tropical Medicine, George Washington University Medical Center, 2300 I Street, NW, Washington, DC 20037, USA
3 Memorial Sloan-Kettering Cancer Center, Computational Biology Department, 1275 York Avenue, Box # 460, New York, NY 10065, USA
4 Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany
5 Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
6 Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
7 Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
8 Department of Biology, National University of Colombia, Carrera 45 No. 26-85, Bogota, D.C., Colombia

Email: Peter F. Stadler - [email protected];

∗Corresponding author

Abstract

Background: Schistosomes are trematode parasites of the Phylum Platyhelminthes. They are considered the most important of the human helminth parasites in terms of morbidity and mortality. Draft genome sequences are now available for Schistosoma mansoni and Schistosoma japonicum. Non-coding RNA (ncRNA) plays a crucial role in gene expression regulation, cellular function and defense, homeostasis, and pathogenesis. The genome-wide annotation of ncRNAs is a non-trivial task unless well-annotated genomes of closely related species are already available.

Results: A homology search for structured ncRNA in the genome of S. mansoni resulted in 23 types of RNAs with conserved primary and secondary structure. Among these, we identified rRNA, snRNA, SL RNA, SRP, tRNAs and RNase P, and also possibly MRP and 7SK RNAs. In addition, we confirmed five miRNAs that have recently been reported in S. japonicum and found two additional homologs of known miRNAs. The tRNA complement of S. mansoni is comparable to that of the free-living planarian Schmidtea mediterranea, although for some amino acids differences of more than a factor of two are observed: Leu, Ser, and His are overrepresented, while Cys, Met, and Ile are underrepresented in S. mansoni. Several ncRNAs that are expected to exist in the S. mansoni genome were not found, among them the telomerase RNA, vault RNAs, and Y RNAs.

Conclusions: The ncRNA sequences and structures presented here represent the most complete dataset of ncRNA from any lophotrochozoan reported so far. It provides a key addition to our understanding of the genomes of schistosomes and indeed eukaryotic genomes at large.


Background

Non-coding RNA (ncRNA) plays a crucial role in gene expression regulation, cellular function and defense, and disease [1]. In contrast to protein-coding mRNAs, ncRNAs do not form a homogeneous class. The best-characterized subclasses form stable base-pairing patterns (secondary structures) that are crucial for their function. This group includes the well-known tRNAs, catalytically active RNAs such as rRNA, snRNAs, RNase P RNA, and other ribozymes, and regulatory RNAs such as microRNAs and spliceosomal RNAs that direct protein complexes to specific RNA targets.

Most non-vertebrate genome projects have put little emphasis on a comprehensive annotation of ncRNAs. Indeed, most non-coding RNAs, with the notable exception of tRNAs and rRNAs, are difficult or impossible to detect with blast. Hence their annotation is not part of generic genome annotation pipelines. Dedicated computational searches for particular ncRNAs, for example, RNase P and MRP [2,3], 7SK RNAs [4,5], or telomerase RNA [6,7], are veritable research projects in their own right. Despite best efforts, much territory remains uncharted across the animal phylogeny.

The main difficulty with ncRNA annotation is poor sequence conservation and indel patterns that often correspond to large additional "expansion domains". In many cases, the secondary structure is much better conserved than the primary sequence, providing a means of confirming candidate ncRNAs even in cases where sequence conservation is confined to a few characteristic motifs. Secondary structure conservation can also be utilized to detect homologs of some ncRNAs based on characteristic combinations of sequence and structure motifs using special software tools designed for this purpose.

In [8] we described a protocol for a more detailed homology-based ncRNA annotation than what can be achieved with currently available automatic pipelines. Here we apply this scheme to the genome of S. mansoni.

Schistosomes belong in an early-diverging group within the Digenea, but are clearly themselves highly derived [9–11]. The flatworms are a long-branch group, suggesting rapid mutation rates (see [12]).

Schistosome genomes are comparatively large, estimated at about 300 megabase pairs for the haploid genome of Schistosoma mansoni [13]. The other major schistosome species parasitizing humans probably have a genome of similar size, based on the similarity in appearance of their karyotypes [14]. These large sizes may be characteristic of platyhelminth genomes in general: the genome of Schmidtea mediterranea, the only other sequenced platyhelminth genome, is even larger, with the current genome sequencing project reporting a size of ∼480 megabase pairs [15] (see footnote 1).

The protein-coding portion of the Schistosoma genomes has received much attention in recent years. Published work includes transcriptome databases for both S. japonicum [16] and S. mansoni [17], characterization of promoters [18,19], and physical mapping and annotation of protein-coding genes from both the S. mansoni and S. japonicum genome projects [20]. Recently, a systematic annotation of protein-coding genes in S. japonicum was reported [21]. In contrast to other, better-understood, parasites such as Plasmodium [22], however, not much is known about the non-coding RNA complement of schistosomes. Only the spliced leader RNA (SL RNA) of S. mansoni [23], the hammerhead ribozymes encoded by the SINE-like retrotransposons Sm-α and Sj-α [24, 25], and secondary structure elements in the LTR retrotransposon Boudicca [26] have received closer attention. Ribosomal RNA sequences have been available mostly for phylogenetic purposes [27], and tRNAs have been studied to a limited degree [28].

In this contribution we give a comprehensive overview of the evolutionarily conserved non-coding RNAs in the S. mansoni genome. We discuss representatives of 23 types of ncRNAs that were detected based on both sequence and secondary structure homology.

Results & Discussion

Structure- and homology-based searches of the S. mansoni genome revealed ncRNAs from 23 different RNA categories. Table 2 lists these functional ncRNA categories, the number of predicted genes in each category, and references associated with each RNA type. Supplementary fasta files containing the ncRNA genes, bed files with the genome annotation, and stockholm-format alignment files can be accessed at http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/08-014.

1 http://genome.wustl.edu/genome.cgi?GENOME=Schmidtea%20mediterranea


Figure 1: Comparison of the tRNA complement of Schistosoma mansoni and Schmidtea mediterranea. A: Comparison of anti-codon distributions for the same amino acid. Numbers below each pie-chart are the total number of tRNA genes coding the corresponding amino acid. B: Number of tRNAs decoding a particular amino acid. Blue: Schistosoma mansoni, yellow: Schmidtea mediterranea. Sut: possible suppressor tRNAs (CTA, TTA); Sct: selenocysteine tRNAs (TCA); Unt: undetermined tRNAs; Pst: predicted pseudogenes.

Transfer RNAs

Candidate tRNAs were predicted with tRNAscan-SE in the genomes of both S. mansoni and S. mediterranea (a free-living platyhelminth, used for comparison). After removal of transposable element sequences (see below), tRNAscan-SE predicted a total of 663 tRNAs for S. mansoni and 728 for S. mediterranea. These included tRNAs encoding the standard 20 amino acids of the traditional genetic code, selenocysteine-encoding tRNAs (tRNAsec) [29], and possible suppressor tRNAs [30] in both genomes. The tRNAsec from schistosomes has been characterized, and is similar in size and structure to tRNAsec from other eukaryotes [31].

The tRNA complements of the two platyhelminth genomes are compared in detail in Figure 1. While most amino acids were represented in approximately equal numbers in the two species, there are several notable deviations. S. mansoni contains many more leucine (86 vs. 46) and histidine (27 vs. 8) tRNAs, while serine (51 vs. 94), cysteine (21 vs. 44), methionine (21 vs. 44), and isoleucine (17 vs. 42) are underrepresented. In addition, there are several substantial differences in codon usage. In most cases, S. mansoni has a more diverse repertoire of tRNAs: tRNA-Asn-ATT, tRNA-Arg-CGC, tRNA-His-ATG, tRNA-Ile-GAT, tRNA-Pro-GGG, tRNA-Tyr-ATA, and tRNA-Val-GAC are missing in Schmidtea. Only tRNA-Ser-ACT is present in Schmidtea but absent in Schistosoma.
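The size of these deviations can be made explicit with a few lines of arithmetic. The following is a minimal sketch that uses only the six count pairs quoted in the text (S. mansoni first, S. mediterranea second); the dictionary layout and the helper name `fold_difference` are illustrative choices, not part of the original analysis.

```python
# tRNA gene counts quoted in the text: (S. mansoni, S. mediterranea).
counts = {
    "Leu": (86, 46),
    "His": (27, 8),
    "Ser": (51, 94),
    "Cys": (21, 44),
    "Met": (21, 44),
    "Ile": (17, 42),
}


def fold_difference(a: int, b: int) -> float:
    """Fold difference between two counts, independent of direction."""
    return max(a, b) / min(a, b)


folds = {aa: fold_difference(sma, sme) for aa, (sma, sme) in counts.items()}

# Amino acids whose tRNA gene counts differ by more than a factor of two.
over_two = sorted(aa for aa, f in folds.items() if f > 2.0)

for aa, (sma, sme) in counts.items():
    direction = "higher" if sma > sme else "lower"
    print(f"{aa}: {sma} vs. {sme} -> {folds[aa]:.2f}-fold {direction} in S. mansoni")
```

The same helper could be applied to the full per-amino-acid counts of Figure 1 if those were available in tabular form.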

It has recently been shown that changes in codon usage, even while coding the same protein sequences, can severely attenuate the virulence of viral pathogens [32] by "de-optimizing" translational efficiency. This observation leads us to speculate that the greater diversity of the tRNA repertoire could be related to the selection pressures of the parasitic lifestyle of S. mansoni. It would therefore be interesting to investigate in detail the possibility of differences in codon usage of proteins highly expressed in the different stages of the S. mansoni life cycle.

The most striking result of the tRNAscan-SE analysis was the initial finding of 1,135 glutamine tRNAs (Gln-tRNAs) in S. mansoni, in contrast to 65 Gln-tRNAs in S. mediterranea. Nearly all of these (1,098) were tRNA-Gln-TTG. In addition, an extreme number of 1,824 tRNA pseudogenes in S. mansoni (vs. 19 in S. mediterranea) were predicted. Of these, 1,270 were homologous to tRNA-Gln-TTG. These two groups of tRNA-Gln-TTG-derived genes (those predicted to be pseudogenes and those predicted to be functional tRNAs) totaled 2,368. These high numbers suggest a tRNA-derived mobile genetic element. We therefore ran the 2,368 tRNA-Gln-TTG genes through the RepeatMasker program [33]. Almost all of them (2,342) were classified as SINE elements. Further blast analysis revealed that these elements are similar to members of the Sm-α family of S. mansoni SINE elements [34]. Removal of these SINE-like elements yielded a total of 26 predicted glutamine-encoding tRNAs in S. mansoni.

Homology-based analysis yielded similar, though somewhat less sensitive, results to those of tRNAscan-SE. A blast search with Rfam's tRNA consensus yielded 617 predicted tRNAs compared to the 663 predictions made by tRNAscan-SE.
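The removal of SINE-derived tRNA predictions described above amounts to discarding every predicted tRNA locus that overlaps a repeat annotation. The sketch below illustrates that filtering step under stated assumptions: the `Hit` record, the function names, and the toy coordinates are hypothetical, and the code does not parse real tRNAscan-SE or RepeatMasker output.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A genomic interval; coordinates are 0-based and half-open (hypothetical layout)."""
    scaffold: str
    start: int
    end: int
    label: str


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two intervals lie on the same scaffold and share at least one base."""
    return a.scaffold == b.scaffold and a.start < b.end and b.start < a.end


def drop_repeat_derived(trnas: List[Hit], repeats: List[Hit]) -> List[Hit]:
    """Keep only tRNA predictions that do not overlap any annotated repeat."""
    return [t for t in trnas if not any(overlaps(t, r) for r in repeats)]


# Toy data: the first prediction sits inside a SINE annotation and is removed.
trnas = [
    Hit("scaffold_A", 100, 172, "tRNA-Gln-TTG"),
    Hit("scaffold_A", 5000, 5072, "tRNA-Gln-TTG"),
    Hit("scaffold_B", 40, 113, "tRNA-Leu-CAA"),
]
repeats = [Hit("scaffold_A", 90, 400, "SINE")]

kept = drop_repeat_derived(trnas, repeats)
print([(t.scaffold, t.start, t.label) for t in kept])
```

The quadratic comparison is adequate for a few thousand predictions; for genome-scale repeat annotations a sorted sweep or an interval tree would be the usual replacement.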

Ribosomal RNAs

As usual in eukaryotes, the 18S, 5.8S, and 28S rRNAs are produced by RNA polymerase I from a polycistronic transcript of the tandemly repeated ribosomal RNA operon. The S. mansoni genome contains about 90-100 copies [35, 36], which are nearly identical at the sequence level because they are subject to concerted evolution [37]. The repetitive structure of the rRNA operons causes substantial problems for genome assembly software [38]. In order to obtain a conservative estimate of the copy number, we retained only partial operon sequences that contained at least two of the three adjacent rRNA genes. We found 48 loci containing parts of 18S, 5.8S, and 28S genes, 32 loci covering 18S and 5.8S rRNA, and 57 loci covering 5.8S and 28S rRNAs (Supplemental Figures S1, S2). Adding the copy numbers, we have not fewer than 80 copies (based on linked 18S rRNAs) and no more than 137 copies (based on linked 5.8S rRNA). The latter is probably an overestimate, because a single 5.8S rRNA gene may be split across two scaffolds. The copy number of rRNA operons is thus consistent with the estimate of 90-100 from hybridization analysis [35].
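The bounds quoted above follow directly from the three locus counts. As a minimal sketch (the dictionary layout and the function name `copies_containing` are illustrative, not taken from the original work), the copy number supported by each gene is the sum over all partial operons that contain it:

```python
# Partial rRNA operon loci reported in the text, keyed by the genes they contain.
loci = {
    ("18S", "5.8S", "28S"): 48,
    ("18S", "5.8S"): 32,
    ("5.8S", "28S"): 57,
}


def copies_containing(gene: str) -> int:
    """Number of retained loci that include the given rRNA gene."""
    return sum(n for genes, n in loci.items() if gene in genes)


lower_bound = copies_containing("18S")    # loci with a linked 18S rRNA
upper_bound = copies_containing("5.8S")   # loci with a linked 5.8S rRNA
via_28s = copies_containing("28S")        # loci with a linked 28S rRNA

print(lower_bound, via_28s, upper_bound)
```

The count based on linked 28S rRNA falls between the two bounds and corresponds to the upper end of the range given for the rRNA operon in Table 2.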

The 5S rRNA is a polymerase III transcript that has not been studied in S. mansoni so far. We find 21 copies of the 118nt long 5S rRNA. Four of these copies are located within a 3000nt cluster on Scaffold010519.

Spliceosomal RNAs and Spliced Leader RNA

Spliceosomes, the molecular machines responsible for most splicing reactions in eukaryotic cells, are ribonucleoprotein complexes similar to ribosomes [39]. The major spliceosome, which cleaves GT-AG introns, includes the five snRNAs U1, U2, U4, U5, and U6. In the S. mansoni genome, all of them are multicopy genes. By homology search we found 34 U1, 15 U2, 19 U4, 9 U5, and 55 U6 sequences in the genome assembly. Interpreting all sequences that are identical in short flanking regions as the same, we would retain only 3 U1, 3 U2, 1 U4, 2 U5, and 9 U6 genes [40]. The true copy number in the S. mansoni genome is most likely somewhere between these upper and lower bounds. Secondary structures for these are similar to those of typical snRNAs (Fig. 2).
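The lower bound described above treats hits that are identical in their short flanking regions as assembly copies of a single gene. One way to sketch this collapsing step is to group hits by the tuple of upstream flank, gene sequence, and downstream flank. The function name `collapse_by_flanks`, the flank length, and the toy scaffolds below are assumptions made for illustration; the flank length used in the original analysis is not stated here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A hit is (name, scaffold id, start, end) with 0-based, half-open coordinates.
HitT = Tuple[str, str, int, int]


def collapse_by_flanks(hits: List[HitT], scaffolds: Dict[str, str],
                       flank: int = 20) -> List[List[str]]:
    """Group hits whose sequence and short flanking regions are all identical."""
    groups = defaultdict(list)
    for name, scaffold, start, end in hits:
        seq = scaffolds[scaffold]
        upstream = seq[max(0, start - flank):start]
        downstream = seq[end:end + flank]
        groups[(upstream, seq[start:end], downstream)].append(name)
    return list(groups.values())


# Toy example: s1 and s2 carry the same locus, s3 has different flanks.
scaffolds = {
    "s1": "AAAACCCCGGGGTTTT",
    "s2": "AAAACCCCGGGGTTTT",
    "s3": "TTTTCCCCGGGGAAAA",
}
hits = [("U1_a", "s1", 4, 12), ("U1_b", "s2", 4, 12), ("U1_c", "s3", 4, 12)]

groups = collapse_by_flanks(hits, scaffolds, flank=4)
print(f"upper bound: {len(hits)}, lower bound: {len(groups)}")
```

Exact string identity is the strictest possible criterion; tolerating a small number of mismatches in the flanks would merge more hits and lower the bound further.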

A second, much less frequent, minor spliceosome is responsible for the processing of atypical AT–AC introns. It shares only the U5 snRNA with the major spliceosome. The other four RNA components are replaced by variants called U11, U12, U4atac, and U6atac [41]. The minor-spliceosomal snRNAs are typically much less conserved than the RNA components of the major spliceosome [40]. It was not surprising, therefore, that these RNAs were detectable only by means of GotohScan [8] but not with the much less sensitive blast searches. Although U4atac and U6atac are quite diverged compared to known homologs, they can be recognized based on both secondary structure and conserved sequence motifs. Furthermore, the U4atac and U6atac sequences can interact to form the functionally necessary duplex structure shown in Fig. 2.

An analysis of promoter sequences showed that the putative snRNA promoter motifs in S. mansoni are highly derived. Only one of the two U12 genes exhibits a clearly visible snRNA-like promoter organization.

Figure 2: Secondary structures of the 9 snRNAs and the interaction complexes of U4/U6 and U4atac/U6atac, respectively. Panels: U1, U11, U2, U12, U5, U4atac/U6atac, U4/U6.

The Spliced Leader (SL) RNA is one of the very few previously characterized ncRNAs from S. mansoni [23]. The 90nt SL RNA, which was found in a 595nt tandemly repeated fragment (accession number M34074), contains the 36nt leader sequence at its 5' end, which is transferred in the trans-splicing reaction to the 5' termini of mature mRNAs. Using blastn, we identified 54 SL RNA genes. These candidates, along with 100nt flanking sequence, were aligned using ClustalX, revealing 6 sequences with aberrant flanking regions, which we suspect to be pseudogenic. The remaining sequences are 43 identical copies and 5 distinct sequence variants. A secondary structure analysis corroborates the model of [23], according to which the S. mansoni SL RNA has only two loops, with an unpaired Sm binding site (Supplemental Figure S3). This coincides with the SL RNA structure of Rotifera [42], but is in contrast to the SL RNAs in most other groups of eukaryotes, which exhibit single or triple stem-loop structures [43]. A blast search against S. mansoni EST data confirms that the 5' part of the SL is indeed trans-spliced to mRNAs.
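Extracting each SL RNA candidate together with 100nt of flanking sequence, as done before the ClustalX alignment, is a simple slicing operation that has to respect scaffold ends. The following sketch shows one way to do it; the function name `with_flanks` and the toy scaffold are hypothetical, and the closing lines merely re-check the candidate tally given in the text.

```python
def with_flanks(scaffold_seq: str, start: int, end: int, flank: int = 100) -> str:
    """Return the hit [start, end) plus up to `flank` nt on each side.

    Coordinates are 0-based and half-open; flanks are clipped at scaffold ends.
    """
    lo = max(0, start - flank)
    hi = min(len(scaffold_seq), end + flank)
    return scaffold_seq[lo:hi]


# Toy scaffold: 150 nt upstream, a 90 nt "gene", and only 30 nt downstream.
scaffold = "A" * 150 + "C" * 90 + "G" * 30
region = with_flanks(scaffold, 150, 240)
print(len(region))

# Tally from the text: 54 candidates, 6 with aberrant flanks, leaving
# 43 identical copies plus 5 distinct variants.
candidates, aberrant, identical, variants = 54, 6, 43, 5
remaining = candidates - aberrant
```

Clipping at the scaffold boundary means that candidates near scaffold ends yield shorter flanks, which is worth keeping in mind when judging flanks as aberrant.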

SRP RNA and Ribonuclease P RNA

Signal recognition particle (SRP) RNA, also known as 7SL RNA, is part of the signal recognition particle, a ribonucleoprotein that directs packaged proteins to their appropriate locations in the endoplasmic reticulum. Although one of the protein subunits of this ribonucleoprotein was cloned in 1995 [44], little is known about the other subunits or the RNA component in S. mansoni. We found eight probable candidates for the SRP RNA, with one almost canonical sequence (Supplemental Figure S4), and four possible candidates with point mutations which may influence their function.

The RNA component of Ribonuclease P (RNase P) is the catalytically active part of this enzyme that is required for the processing of tRNA precursors [45,46]. We found one classic RNase P RNA in the S. mansoni genome using both GotohScan and RNAbob with the eukaryotic ("nuclear") Rfam consensus sequence for RNase P as search sequence.

MicroRNAs

MicroRNAs are small RNAs that are processed from hairpin-like precursors, see e.g. [47]. They are involved in post-transcriptional regulation of mRNA molecules. So far, no microRNAs have been verified experimentally in S. mansoni. The presence of four protein-coding genes encoding crucial components of the microRNA processing machinery (Dicer, Argonaute, Drosha, and Pasha/DGCR8) [48,49], and the presence of Argonaute-like genes in both S. japonicum [50] and S. mansoni (detected by tblastn in EST data, see Supplemental Material), strongly suggest that schistosomes have a functional microRNA system. Indeed, most recently five miRNAs were found by direct cloning for S. japonicum that are also conserved in S. mansoni [51]: let-7, mir-71, bantam, mir-125, and a single schistosome-specific microRNA. The precursor sequences, however, are quite diverged from the consensus of the homologous genes in Bilateria.


mir-124
               ****************** *
Structure      .((.(((.(((((.(((((..(((((((((...............................)))))))))))))).))))).))).))...
sma-mir-124    UUGUAUGCCAUUUUCCGCGAUUGCCUUGAUGAGUUAUAA--AUAUUAUUCAUAACAAAAAUAUUAAGGCACGCGGUGAAUGUCAUCCACGG
sja-mir-124    AUGUAUGCCAUUUUCCGCGAUUGCCUUGAUUUGUUAAAAGAAAAUGAUUCACAACAAAA-UAUUAAGGCACGCGGUGAAUGUCAUCCACGG
hsa-miR-124    ---------------------------------------------------------------UAAGGCACGCGGUGAAUGCC--------

mir-287
               |-conserved antisense--| * ** * *
dme-Struc      ..(((((.(.......).))..))).((...(((((((((((((..(((((((((((.....)))))))))))...)))..)))).)).)))).)).
dme-mir-287    GGACGCCGGGGAUGUAUGGG--UGUGUA--GGGUCUGAAAUUUUGCACACAUUUACAAUAAUUGUAAAUGUGUUGAAAAUCGUUUGCACGACUGUGA
dme-miR-287    --------------------------------------------------------------------UGUGUUGAAAAUCGUUUGCAC--------
sma-mir-287    ---GUAUACUCGUAUGGGUGAAUGUGUACA---UGUUAAAUUUUGCACACAUUUACAAAAAAAAGGUGCCGAAUAUUCCAUUUUCACCCUACAUGUU
sma-Struc      ...........(((.(((((((.(((...(...((((.......((((.(.(((......))).)))))..)))))..))).)))))))))).....

mir-749
sme-miR-749    ** ****** **********
Structure      ...((((((((((((((((((((..(((.(.((((((.......)))))).))))....))))))...)))))).........))))))))..
sja-mir-749    AAUCGCCAGGAUGAACCUCGGUGGUCCGGGGUGCAGGCUUCAAACCUGCAGCCGACUGGCGUCGGAGUGGUUCGAUUCCGCCUUCCUGGCGUG
sma-mir-749    AAUUGCCGGGAUGAACCUCGGUGGUCCGGGGUGCAGGCUUCAAACCUGUAGCCGACUAGCAUCGGAGCGGUUCGAUUCCGCCUUCCUGGCGUA
sme-mir-749-1  AAUCGCUGGGAUGAGCCUCGGUGGUCCGGGGUGCAGGCUUCAAACCUGUAGUCGGUUGACACCGAAGUGGUUCGAUUCCACCUUUCCAGCGAU
sme-mir-749-2  AAUUGCUGGGAUGAGCCUCGGUGGUCCGGGGUGCAGGCUUCAAACCUGUAGUCGGUUGACACCGAAGUGGUUCGAUUCCACCUUUCCAGCGAU
sme-miR-749    ----GCUGGGAUGAGCCUCGGUGGU--------------------------------------------------------------------

Figure 3: Multiple sequence alignments of the pre-miRNAs that were computationally found in S. mansoni. For mir-124 and mir-749 the sequences share a common consensus structure. The uncertain mir-287 candidate, which clusters together with mir-124 in the insect genomes, also shows a single stem-loop structure which, however, is different from that of insects. Here the sequence is only conserved at the antisense region of the annotated mature miRNA.

Using bioinformatics (see methods) we were able to find only one further miRNA candidate in S. mansoni, mir-124, that is also conserved in S. japonicum. In insects this miRNA is clustered with mir-287. The distance between the two miRNAs is approximately 8kb in Drosophilids. We found an uncertain mir-287 candidate in S. mansoni, however, on a different scaffold than mir-124. Although this sequence folds into a single stem-loop structure, it is conserved only antisense to the annotated mature sequence in insects (see Figure 3). This S. mansoni mir-287 candidate seems not to be conserved in S. japonicum.
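Whether a predicted fold is a single stem-loop, as stated for the mir-287 candidate, can be read off its dot-bracket string: the brackets must balance, and there must be exactly one innermost pair that encloses only unpaired positions. The sketch below implements this check; the function names are illustrative, the inputs are toy structures rather than the strings of Figure 3, and interior loops or bulges within the single stem are deliberately allowed.

```python
import re


def is_balanced(structure: str) -> bool:
    """True if every '(' has a matching ')' in a dot-bracket string."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def hairpin_count(structure: str) -> int:
    """Count hairpin loops: innermost pairs enclosing only unpaired bases."""
    return len(re.findall(r"\(\.*\)", structure))


def is_single_stem_loop(structure: str) -> bool:
    """True for a balanced structure with exactly one hairpin loop."""
    return is_balanced(structure) and hairpin_count(structure) == 1


examples = {
    "..((((...))))..": is_single_stem_loop("..((((...))))..") ,
    "..((.(((...))).)).": is_single_stem_loop("..((.(((...))).))."),
    "((..))..((..))": is_single_stem_loop("((..))..((..))"),
    "((..)": is_single_stem_loop("((..)"),
}
for structure, verdict in examples.items():
    print(structure, verdict)
```

A structure predictor such as those in the ViennaRNA package would be the usual source of the dot-bracket strings; this check only classifies their topology.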

In [52], 71 microRNAs are described for the distantly related planarian Schmidtea mediterranea, and additional ones are announced in a recent study focussing on piRNAs [53]. The overwhelming majority, 54, were reported to be members of 29 widely conserved metazoan microRNA families, although in some cases even the mature miRNA sequence is quite diverged. Therefore, we regard several family assignments as tentative at best. Of those 29 miRNAs, we found mir-124 only. The schistosome sequences, however, are more closely related to the other bilaterian mir-124 homologs than to those of S. mediterranea. Out of the remaining 54 miRNAs that were annotated in S. mediterranea we found that mir-749 is also conserved in the two schistosome species. Here, the sequences show a common consensus sequence and secondary structure in their precursors (see Figure 3).

The small number of recognizable microRNAs in schistosomes is in strong contrast to the extensive microRNA complement in S. mediterranea, indicating massive loss of microRNAs relative to the planarian ancestor. This may be a consequence of the parasitic lifestyle of the schistosomes.

Small Nucleolar RNAs

Small nucleolar RNAs play essential roles in the processing and modification of rRNAs in the nucleolus [54,55]. Both major classes, the box H/ACA and the box C/D snoRNAs, are relatively poorly conserved at the sequence level and hence are difficult to detect in genomic sequences. This has also been observed in a recent ncRNA annotation project of the Trichoplax adhaerens genome [8]. The best-conserved snoRNA is the atypical U3 snoRNA, which is essential for processing of the 18S rRNA transcript into mature 18S rRNA [56]. In the current assembly of the S. mansoni genome we find six U3 loci, but they are also identical in the flanking sequences, suggesting that in fact there is only a single U3 gene. No unambiguous homologue was detected for any of the other known snoRNAs.

Table 1: Conservation and target prediction of snoRNA candidates. Only ribosomal RNAs were searched for putative target sites.

snoReport                    Box C/D (snoscan)       Box H/ACA (RNAsnoop)
targets                      ≥ 2     1       0       ≥ 2     1       0
predicted in S. mansoni      926     110     613     284     495     177
conserved in S. japonicum    200     27      83      149     203     62

A de novo search for snoRNAs (see methods for details) resulted in 2610 promising candidates (1654 box C/D and 956 box H/ACA), listed in the Electronic Supplement. All these predictions exhibit highly conserved sequence boxes as well as the typical secondary features of box C/D and box H/ACA snoRNAs, respectively.

A comparison of the predicted snoRNAs with the entries in the Rfam [57] and Noncode [58] databases returned only 47 hits. These match several other RNAs such as tRNAs, parts of the rRNA operon, snRNAs, and mRNA-like genes, and a few of our candidates map to the hammerhead ribozyme. These sequences are likely false positives and have been removed from the candidate list. The number of predicted candidates is much larger than the number of snoRNAs reported in other organisms; for instance, [55] lists 456 for the human genome. Although we most likely do not yet know the full snoRNA complement of eukaryotic genomes, we have to expect that a large fraction of the predictions will turn out to be false positives.

We therefore analysed the conservation of the candidates in S. japonicum and focussed on the snoRNA candidates with targets in the 18S, 28S and/or 5.8S ribosomal RNA. While targets are predicted for more than half of the candidates, see Table 1, the numbers are drastically reduced when conservation of the candidates in S. japonicum is required. Note, furthermore, that the fraction of conserved candidates is strongly enriched among those with ribosomal RNA targets, indicating that these sets are likely to contain a sizeable fraction of true positives. This filtering step leaves us with 227 box C/D and 352 box H/ACA snoRNA candidates. While still high, these numbers fall into the expected range for a metazoan snoRNA complement.
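The two filtered totals and the enrichment statement can be recomputed from the counts in Table 1. The sketch below is a plain arithmetic check; the nested dictionary and the helper names `retained` and `conserved_fraction` are illustrative choices, and only the twelve numbers themselves come from the table.

```python
# Counts from Table 1, keyed by snoRNA class, row, and number of predicted rRNA targets.
TABLE1 = {
    "box C/D": {
        "predicted": {">=2": 926, "1": 110, "0": 613},
        "conserved": {">=2": 200, "1": 27, "0": 83},
    },
    "box H/ACA": {
        "predicted": {">=2": 284, "1": 495, "0": 177},
        "conserved": {">=2": 149, "1": 203, "0": 62},
    },
}


def retained(box: str) -> int:
    """Candidates conserved in S. japonicum with at least one predicted rRNA target."""
    conserved = TABLE1[box]["conserved"]
    return conserved[">=2"] + conserved["1"]


def conserved_fraction(box: str, with_targets: bool) -> float:
    """Fraction of S. mansoni predictions that are conserved in S. japonicum."""
    keys = (">=2", "1") if with_targets else ("0",)
    predicted = sum(TABLE1[box]["predicted"][k] for k in keys)
    conserved = sum(TABLE1[box]["conserved"][k] for k in keys)
    return conserved / predicted


for box in TABLE1:
    print(
        f"{box}: retained {retained(box)}, "
        f"conserved with targets {conserved_fraction(box, True):.1%}, "
        f"without targets {conserved_fraction(box, False):.1%}"
    )
```

For both classes the conserved fraction is higher among candidates with predicted targets than among those without, which is the enrichment referred to in the text.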

We remark, finally, that five of the snoRNA candidates (3 box C/D and 2 box H/ACA) are also conserved in Schmidtea mediterranea.

Other RNA Motifs

Two examples of relatively well-known schistosome non-coding RNA are the hammerhead ribozyme motifs within the Sm-α and Sj-α SINE-like elements [24, 25]. A blastn search with the hammerhead ribozyme motif from the Rfam database resulted in 24,447 candidates. While high, this number is not surprising considering the generally high copy number of SINE elements; previously, the copy number for Sm-α elements in the S. mansoni genome was estimated to exceed 10,000 [24]. The potassium channel RNA editing signal is another structured RNA element that was described previously [59]. We found three copies of the gene for this signal in the S. mansoni genome assembly.

Uncertain Candidates

Both the MRP RNA [2, 3, 61] and the 7SK RNA [4, 5] have highly variable, rapidly evolving sequences that make them difficult or impossible to detect in invertebrate genomes. It is not surprising, therefore, that we were not able to identify unambiguous homologs in Schistosoma mansoni.

MRP RNA has multiple functions, among them mitochondrial RNA processing and nucleolar pre-rRNA processing. The S. mansoni MRP candidate fits the general secondary structure model of metazoan MRP RNAs [2, 3, 61], and analysis with RNAduplex shows that the candidate contains a pseudoknot which exhibits striking sequence identity with known MRPs. On the other hand, stems 1 and 12 were divergent compared with known MRPs, and stem 19 also failed to display clear similarity to those of known MRPs. We therefore consider this sequence only tentative.

Table 2: Non-coding RNA predictions from the sequenced genome of S. mansoni.

| RNA class | Functional category | Copy no. | Related reference(s) |
|---|---|---|---|
| 7SK | Transcription regulation | (1) | This study |
| Hammerhead ribozymes | Self-cleaving | > 24,000 | [24] |
| miRNA | Translation control | 4 | [60], this study |
| Potassium channel motif | RNA editing | 3 | [59] |
| RNase MRP | Mitochondrial tRNA processing | (1) | This study |
| RNase P | tRNA processing | 1 | This study |
| rRNA-operon | Polypeptide synthesis | 80-105 | [35], this study |
| 5S rRNA | Polypeptide synthesis | 21 | This study |
| SL RNA | Trans-splicing | 6-48 | [23], this study |
| SnoRNA U3 | Nucleolar rRNA processing | 1 | This study |
| SRP | Protein transportation | 12 | This study |
| tRNA | Polypeptide synthesis | 663 | This study |
| U1 | Splicing | 3-34 | [40], this study |
| U2 | Splicing | 3-15 | [40], this study |
| U4 | Splicing | 1-19 | [40], this study |
| U5 | Splicing | 2-9 | [40], this study |
| U6 | Splicing | 9-55 | [40], this study |
| U11 | Splicing | 1 | This study |
| U12 | Splicing | 1-2 | [40], this study |
| U4atac | Splicing | 1 | This study |
| U6atac | Splicing | 1 | This study |

7SK RNA is a general transcriptional regulator that represses transcript elongation through inhibition of the transcription elongation factor P-TEFb and also suppresses the deaminase activity of APOBEC3C [62]. The S. mansoni 7SK candidate has a 5' stem similar to that described in other invertebrates [5], and parts of the middle of the sequence are also recognizable. However, the 3' stem (which was followed by a poly-T terminator) was not conserved. In addition, a large sequence deletion was evident.

Unexpectedly, no candidate sequence was found for a telomerase RNA. S. mansoni almost certainly has a canonical telomerase holoenzyme, since it encodes telomerase proteins (Smp_066300 and Smp_066290) and has the same telomeric repeat sequences as many other metazoan animals [63]. Telomerase RNAs are notoriously difficult to find, as they are highly divergent among different species, varying in both size and sequence composition [64].

Conclusions

We have described here a detailed annotation of “housekeeping” ncRNAs in the genome of the parasitic flatworm Schistosoma mansoni. Limited to the best conserved structured RNAs, our work nevertheless uncovered important genomic features such as the existence of a schistosome-specific SINE family derived from tRNA-Gln-TTG. Our data furthermore establish the presence of a minor spliceosome in schistosomes and confirm spliced-leader trans-splicing.

Platyhelminths are known to be a fast-evolving phylum [65]. It is not surprising, therefore, that the small ncRNAs in particular are hard or impossible to detect by simple homology search tools such as blastn. Even specialized tools have been successful in identifying only the better conserved genes such as tRNAs, microRNAs, RNase P RNA, and SRP RNA. Notoriously poorly conserved families, such as snoRNAs, mostly escaped detection.

The description of several novel and in many cases quite derived ncRNAs contributes significantly to the understanding of the evolution of these RNA families. The schistosome ncRNA sequences, furthermore, are an important input to subsequent homology search projects, since they allow the construction of improved descriptors for sequence/structure-based search algorithms. Last but not least, the ncRNA annotation track is an important contribution to the genome-wide annotation dataset. It not only completes the protein-based annotation but also helps to identify annotation errors, e.g. cases where putative proteins are annotated that overlap rRNA operons or other ncRNAs.

Methods

tRNA Annotation

We used tRNAscan-SE [66] with default parameters to annotate putative tRNA genes. As additional confirmation, the genome sequence was searched using tRNA consensus sequences from the Rfam database [57]. In order to obtain suitable data for comparison, the genome of the free-living platyhelminth Schmidtea mediterranea [15] was searched alongside that of S. mansoni.

microRNA Annotation

We followed the general protocol outlined in [8] to identify miRNA precursors, using all metazoan miRNAs listed in miRBase [67] [Release 11.0, http://microrna.sanger.ac.uk/sequences/]. The initial search was conducted by blastn with E < 0.01 with the mature and mature* miRNAs as query sequences. The resulting candidates were then extended to the length of the precursor sequence of the search query and aligned to the precursors using ClustalW [68]. Secondary structures were predicted using RNAfold [69] for single sequences and RNAalifold [70] for alignments. Candidates that did not fold into miRNA-like hairpin structures were discarded. The remaining sequences were then examined by eye to see if the mature miRNA was well-positioned in the stem portion of each putative precursor sequence. In addition, we used the final candidates to search the S. japonicum and S. mediterranea genomes to examine whether these sequences are conserved in Schistosoma and/or Platyhelminthes.
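The two structural checks in this protocol (hairpin-like fold, mature sequence positioned in the stem) were partly done by eye in the study. As a rough illustration of how such checks could be automated, the sketch below operates on a dot-bracket structure string of the kind produced by folding programs. The function names and the `min_paired` cutoff of 0.7 are assumptions made for this example and are not parameters from the original analysis.

```python
def is_single_hairpin(structure):
    """True if the dot-bracket string has one stem-loop and no branching.

    A single hairpin never opens a new base pair after the first closing
    bracket. The input is assumed to be a balanced dot-bracket string.
    """
    first_close = structure.find(")")
    if first_close == -1:
        return False  # no base pairs at all
    return "(" not in structure[first_close:]


def mature_in_stem(structure, start, end, min_paired=0.7):
    """True if structure[start:end] lies on one arm of a stem.

    start, end: 0-based, half-open coordinates of the mature miRNA.
    min_paired: assumed minimum fraction of paired positions.
    """
    region = structure[start:end]
    if not region:
        return False
    opening = region.count("(")
    closing = region.count(")")
    # A region containing both bracket types spans the loop, not one arm.
    if opening and closing:
        return False
    return (opening + closing) / len(region) >= min_paired


if __name__ == "__main__":
    # Toy precursor structure with one bulge; not a real S. mansoni hairpin.
    toy = "..((((.((((....)))).)))).."
    print(is_single_hairpin(toy), mature_in_stem(toy, 2, 11))
```

A check of this kind can only pre-screen candidates; borderline structures would still need manual inspection as described above.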

snoRNA Annotation

We compared all the known human and yeast snoRNAs that are annotated in the snoRNAbase [71] to the S. mansoni genome using NCBI-blast [72] and GotohScan [8]. The search for novel snoRNA candidates was performed only on sequences that were not annotated as protein-coding or another ncRNA in the current S. mansoni assembly. The SnoReport program [73] was used to identify putative box C/D and box H/ACA snoRNAs on both strands. Only the best predictions, i.e., those that show highly conserved boxes and canonical structural motifs, were kept for further analysis. The remaining candidates were further analysed for possible target interactions with ribosomal RNAs using snoscan [74] for box C/D and RNAsnoop [75] for box H/ACA snoRNA candidates. In addition, the sequences were checked for conservation in S. japonicum and S. mediterranea using NCBI-blast. To estimate the number of false predictions we compared the candidate snoRNAs with common ncRNA databases, in particular Rfam [57] and NONCODE [58]. All sequences that match a non-snoRNA ncRNA were discarded.
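To make the notion of "conserved boxes" concrete, the following sketch scans a sequence for the canonical box C (RUGAUGA) and box D (CUGA) consensus motifs at a plausible distance from each other. This is a deliberately naive regular-expression scan written for illustration; it is not the method implemented in SnoReport, and the span limits `min_span` and `max_span` are assumed values.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all reported.
C_BOX = re.compile(r"(?=[AG]TGATGA)")  # box C consensus RUGAUGA, DNA alphabet
D_BOX = re.compile(r"(?=CTGA)")        # box D consensus CUGA, DNA alphabet


def scan_box_cd(seq, min_span=50, max_span=200):
    """Return (c_start, d_end) pairs of box C and box D occurrences.

    c_start: 0-based start of the box C match.
    d_end:   0-based end (exclusive) of the box D match.
    Only pairs with min_span <= d_end - c_start <= max_span are kept
    (both limits are assumptions made for this example).
    """
    seq = seq.upper().replace("U", "T")
    c_starts = [m.start() for m in C_BOX.finditer(seq)]
    d_ends = [m.start() + 4 for m in D_BOX.finditer(seq)]
    hits = []
    for c in c_starts:
        for d in d_ends:
            if min_span <= d - c <= max_span:
                hits.append((c, d))
    return hits


if __name__ == "__main__":
    # Artificial sequence: box C, 60 nt spacer, box D.
    toy = "AA" + "ATGATGA" + "C" * 60 + "CTGA" + "GG"
    print(scan_box_cd(toy))
```

Because the box D motif is only four nucleotides long, a scan like this produces many spurious pairs on genomic sequence, which is one reason dedicated tools combine motif scores with structural criteria.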

Other RNA families

For other families, we employed the following five steps:

(a) To obtain candidate sequences for ribosomal RNAs, spliceosomal RNAs, the spliced leader, and the SRP RNA, we performed blast searches with E < 10^-3 using the known ncRNA genes from the NCBI and Rfam databases as queries. For the snRNA set, see [40]. For 7SL RNA we used X04249; for 5S and 5.8S rRNAs we used the complete set of Rfam entries; for the SSU and LSU rRNAs, we used Z11976 and NR_003287, respectively. The spliced-leader SL RNAs were searched using SL-RNA entries from Rfam and the sequences reported in [23]. For more diverged genes such as minor snRNAs, RNase MRP, 7SK, and RNase P, we used GotohScan [8], an implementation of a full dynamic programming alignment with affine gap costs. In cases where no good candidates were found we also employed descriptor-based search tools such as rnabob [http://selab.janelia.org/software.html].

(b) In a second step, known and predicted sequences were aligned using ClustalW [68] and visualized with ClustalX [76]. To identify functional secondary structure, RNAfold, RNAalifold, and RNAcofold [77] were used. Combined primary and secondary structures were visualized using Stockholm-format alignment files in the emacs editor utilizing ralee mode [78]. Alignments are provided in the Supplemental Material.

(c) Putatively functional sequences were distinguished from likely pseudogenes by analysis of flanking genomic sequence. To this end, the flanking sequences of snRNA and SL RNA copies were extracted and analyzed for conserved sequence elements using meme [79]. Only snRNAs with plausible promoter regions were reported.

(d) Additional consistency checks were employed for individual RNA families, including phylogenetic analysis by neighbor-joining [80] to check that candidate sequences fall at phylogenetically reasonable positions relative to previously known homologs. For RNase MRP RNA candidates, RNAduplex [http://www.tbi.univie.ac.at/RNA/RNAduplex.html] was used to find the pseudoknot structure. In order to confirm that the SL RNA candidate is indeed trans-spliced to mRNA transcripts, we searched the FAPESP Genoma Schistosoma mansoni website http://bioinfo.iq.usp.br/schisto/ for ESTs including fragments of the predicted SL RNA. We found 52 ESTs with blast E < 0.001 that span the predicted region of the SL RNA (nt 8-38), indicating that this RNA does indeed function as a spliced leader.

(e) Accepted candidate sequences were used as blast queries against the S. mansoni genome to determine their copy number in the genome assembly.
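Step (a) mentions dynamic programming alignment with affine gap costs. The sketch below shows the general three-matrix recursion commonly attributed to Gotoh, computing only a global alignment score. It is an illustration of the technique rather than a reimplementation of GotohScan, which aligns a query against a genome rather than two short sequences end to end; the scoring parameters (`match`, `mismatch`, `gap_open`, `gap_extend`) are assumed values chosen for the example.

```python
NEG_INF = float("-inf")


def gotoh_score(a, b, match=1, mismatch=-1, gap_open=3, gap_extend=1):
    """Global alignment score with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    M: best score with a[i-1] aligned to b[j-1]
    X: best score with a[i-1] aligned to a gap
    Y: best score with b[j-1] aligned to a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Open a new gap from M or Y, or extend an existing gap in X.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            # Open a new gap from M or X, or extend an existing gap in Y.
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(gotoh_score("ACGT", "AGT"))
```

Separating gap opening from gap extension makes one long gap cheaper than several short ones, which suits diverged ncRNAs where whole structural elements may be inserted or deleted.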

Authors’ contributions

CSC, PB, and PFS designed the study. CSC, MM, DR, JH, CBS, SK, CSA, and PFS performed the computational analyses. CSC wrote the first draft of the manuscript. All authors contributed to the final assessment of the data as well as the writing of the final version of the manuscript. CSC, MM, DR, and JH should be considered joint first authors.

Acknowledgements

This work was supported in part by the European Union through grants in the 6th and 7th framework programmes (projects EMBIO, SYNLET, and EDEN), the Deutsche Forschungsgemeinschaft under the auspices of SPP-1174 “Deep Metazoan Phylogeny”, the Freistaat Sachsen, and the DAAD-AleCol program.

References

1. Amaral PP, Dinger ME, Mercer TR, Mattick JS: The eukaryotic genome as an RNA machine. Science 2008, 319:1787–1789.

2. Piccinelli P, Rosenblad MA, Samuelsson T: Identification and analysis of ribonuclease P and MRP RNA in a broad range of eukaryotes. Nucleic Acids Res. 2005, 33:4485–4495.

3. Woodhams MD, Stadler PF, Penny D, Collins LJ: RNAse MRP and the RNA Processing Cascade in the Eukaryotic Ancestor. BMC Evol. Biol. 2007, 7:S13.

4. Gruber AR, Koper-Emde D, Marz M, Tafer H, Bernhart S, Obernosterer G, Mosig A, Hofacker IL, Stadler PF, Benecke BJ: Invertebrate 7SK snRNAs. J. Mol. Evol. 2008, 66:107–115.

5. Gruber A, Kilgus C, Mosig A, Hofacker IL, Hennig W, Stadler PF: Arthropod 7SK RNA. Mol. Biol. Evol. 2008, 25:1923–1930.

6. Chen JL, Blasco MA, Greider CW: Secondary Structure of Vertebrate telomerase RNA. Cell 2000, 100:503–514.

7. Xie M, Mosig A, Qi X, Li Y, Stadler PF, Chen JJL: Size Variation and Structural Conservation of Vertebrate Telomerase RNA. J. Biol. Chem. 2008, 283:2049–2059.

8. Hertel J, de Jong D, Marz M, Rose D, Tafer H, Tanzer A, Schierwater B, Stadler PF: Non-Coding RNA Annotation of the Genome of Trichoplax adhaerens. Nucleic Acids Res. 2009, 37:1602–1615.

9. Blair D, Davis GM, Wu B: Evolutionary relationships between trematodes and snails emphasizing schistosomes and paragonimids. Parasitology 2001, 123 Suppl:S229–S243.

10. Brant SV, Loker ES: Can specialized pathogens colonize distantly related hosts? Schistosome evolution as a case study. PLoS Pathog 2005, 1(3):167–169, [http://dx.doi.org/10.1371/journal.ppat.0010038].

11. Webster BL, Southgate VR, Littlewood DTJ: A revision of the interrelationships of Schistosoma including the recently described Schistosoma guineensis. Int J Parasitol 2006, 36(8):947–955, [http://dx.doi.org/10.1016/j.ijpara.2006.03.005].

12. Jimenez-Guri E, Philippe H, Okamura B, Holland PWH: Buddenbrockia is a cnidarian worm. Science 2007, 317:116–118.

13. Wilson RA, Ashton PD, Braschi S, Dillon GP, Berriman M, Ivens A: ’Oming in on schistosomes: prospects and limitations for post-genomics. Trends Parasitol 2007, 23:14–20.


14. Hirai H, Taguchi T, Saitoh Y, Kawanaka M, Sugiyama H, Habe S, Okamoto M, Hirata M, Shimada M, Tiu WU, Lai K, Upatham ES, Agatsuma T: Chromosomal differentiation of the Schistosoma japonicum complex. Int J Parasitol 2000, 30:441–452.

15. Robb SMC, Ross E, Alvarado AS: SmedGD: the Schmidtea mediterranea genome database. Nucleic Acids Res 2008, 36:D599–D606.

16. Hu W, Yan Q, Shen DK, Liu F, Zhu ZD, Song HD, Xu XR, Wang ZJ, Rong YP, Zeng LC, Wu J, Zhang X, Wang JJ, Xu XN, Wang SY, Fu G, Zhang XL, Wang ZQ, Brindley PJ, McManus DP, Xue CL, Feng Z, Chen Z, Han ZG: Evolutionary and biomedical implications of a Schistosoma japonicum complementary DNA resource. Nat Genet. 2003, 35:139–147.

17. Verjovski-Almeida S, DeMarco R, Martins EA, Guimaraes PE, Ojopi EP, Paquola AC, Piazza JP, Nishiyama Jr MY, Kitajima JP, Adamson RE, Ashton PD, Bonaldo MF, Coulson PS, Dillon GP, Farias LP, Gregorio SP, Ho PL, Leite RA, Malaquias LC, Marques RC, Miyasato PA, Nascimento AL, Ohlweiler FP, Reis EM, Ribeiro MA, Sa RG, Stukart GC, Soares MB, Gargioni C, Kawano T, Rodrigues V, Madeira AM, Wilson RA, Menck CF, Setubal JC, Leite LC, Dias-Neto E: Transcriptome analysis of the acoelomate human parasite Schistosoma mansoni. Nat. Genet. 2003, 35:148–157.

18. Schulmeister A, Heyers O, Morales ME, Brindley PJ, Lucius R, Meusel G, Kalinna BH: Organization and functional analysis of the Schistosoma mansoni cathepsin D-like aspartic protease gene promoter. Biochim Biophys Acta 2005, 1727:27–34.

19. Copeland CS, Mann VH, Brindley PJ: Both sense and antisense strands of the LTR of the Schistosoma mansoni Pao-like retrotransposon Sinbad drive luciferase expression. Mol. Genet. Genomics 2007, 277:161–170.

20. Haas BJ, Berriman M, Hirai H, Cerqueira GG, Loverde PT, El-Sayed NM: Schistosoma mansoni genome: closing in on a final gene set. Exp Parasitol 2007, 117:225–228.

21. Brejova B, Vinar T, Chen Y, Wang S, Zhao G, Brown DG, Li M, Zhou Y: Finding genes in Schistosoma japonicum: annotating novel genomes with help of extrinsic evidence. Nucleic Acids Res. 2009, 37:e52.

22. Mourier T, Carret C, Kyes S, Christodoulou Z, Gardner PP, Jeffares DC, Pinches R, Barrell B, Berriman M, Griffiths-Jones S, Ivens A, Newbold C, Pain A: Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum. Genome Res. 2008, 18:281–292.

23. Rajkovic A, Davis RE, Simonsen JN, Rottman FM: A spliced leader is present on a subset of mRNAs from the human parasite Schistosoma mansoni. Proc Natl Acad Sci USA 1990, 87:8879–8883.

24. Ferbeyre G, Smith JM, Cedergren R: Schistosome satellite DNA encodes active hammerhead ribozymes. Mol Cell Biol 1998, 18:3880–3888.

25. Laha T, McManus DP, Loukas A, Brindley PJ: Sjα elements, short interspersed element-like retroposons bearing a hammerhead ribozyme motif from the genome of the oriental blood fluke Schistosoma japonicum. Biochim Biophys Acta 2000, 1492:477–482.

26. Copeland CS, Heyers O, Kalinna BH, Bachmair A, Stadler PF, Hofacker IL, Brindley PJ: Structural and evolutionary analysis of the transcribed sequence of Boudicca, a Schistosoma mansoni retrotransposon. Gene 2004, 329:103–114.

27. Rollinson D, Kaukas A, Johnston DA, Simpson AJ, Tanaka M: Some molecular insights into schistosome evolution. Int J Parasitol 1997, 27:11–28.

28. Littlewood DT, Lockyer AE, Webster BL, Johnston DA, Le TH: The complete mitochondrial genomes of Schistosoma haematobium and Schistosoma spindale and the evolutionary history of mitochondrial genome changes among parasitic flatworms. Mol Phylogenet Evol 2006, 39:452–467.

29. Sheppard K, Akochy PM, Soll D: Assays for transfer RNA-dependent amino acid biosynthesis. Methods 2008, 44:139–145.

30. Ambrogelly A, Palioura S, Soll D: Natural expansion of the genetic code. Nat Chem Biol 2007, 3:29–35.

31. Hubert N, Walczak R, Sturchler C, Myslinski E, Schuster C, Westhof E, Carbon P, Krol A: RNAs mediating cotranslational insertion of selenocysteine in eukaryotic selenoproteins. Biochimie 1996, 78:590–596.

32. Coleman JR, Papamichail D, Skiena S, Futcher B, Wimmer E, Mueller S: Virus attenuation by genome-scale changes in codon pair bias. Science 2008, 320:1784–1787.

33. Smit AFA, Hubley R, Green P: RepeatMasker. [Version open-3.2.5 [RMLib: 20080611], http://www.repeatmasker.org/].

34. Spotila LD, Hirai H, Rekosh DM, Lo Verde PT: A retroposon-like short repetitive DNA element in the genome of the human blood fluke, Schistosoma mansoni. Chromosoma 1989, 97:421–428.

35. Simpson AJ, Dame JB, Lewis FA, McCutchan TF: The arrangement of ribosomal RNA genes in Schistosoma mansoni. Identification of polymorphic structural variants. Eur J Biochem 1984, 139:41–45.

36. van Keulen H, Loverde PT, Bobek LA, Rekosh DM: Organization of the ribosomal RNA genes in Schistosoma mansoni. Mol Biochem Parasitol 1985, 15:215–230.

37. Nei M, Rooney AP: Concerted and birth-and-death evolution of multigene families. Annu Rev Genet. 2005, 39:121–152.

38. Scheibye-Alsing K, Hoffmann S, Frankel AM, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Hillig ABN, Cirera S, Jørgensen CB, Fredholm M, Gorodkin J: Sequence Assembly. Comp. Biol. Chem. 2009, 33:121–136.

39. Staley JP, Woolford Jr JL: Assembly of ribosomes and spliceosomes: complex ribonucleoprotein machines. Curr Opin Cell Biol. 2009, 21:109–118.

40. Marz M, Kirsten T, Stadler PF: Evolution of Spliceosomal snRNA Genes in Metazoan Animals. J. Mol. Evol. 2008, 67:594–607.


41. Kreivi JP, Lamond AI: RNA splicing: unexpected spliceosome diversity. Curr Biol 1996, 6:802–805.

42. Pouchkina-Stantcheva NN, Tunnacliffe A: Spliced leader RNA-mediated trans-splicing in phylum Rotifera. Mol Biol Evol 2005, 22:1482–1489.

43. Marz M, Vanzo N, Stadler PF: Carnival of SL RNAs: Structural variants and the possibility of a common origin. RNA 2009. [Submitted].

44. McNair A, Zemzoumi K, Lutcke H, Guillerm C, Boitelle A, Capron A, Dissous C: Cloning of a signal-recognition-particle subunit of Schistosoma mansoni. Parasitol Res 1995, 81:175–177.

45. Kirsebom LA: RNase P RNA mediated cleavage: substrate recognition and catalysis. Biochimie 2007, 89:1183–1194.

46. Kikovska E, Svard SG, Kirsebom LA: Eukaryotic RNase P RNA mediates cleavage in the absence of protein. Proc. Natl. Acad. Sci. USA 2007, 104:2062–2067.

47. Williams AE: Functional aspects of animal microRNAs. Cell Mol Life Sci 2008, 65:545–562.

48. Krautz-Peterson G, Skelly PJ: Schistosoma mansoni: the dicer gene and its expression. Exp. Parasitol. 2008, 118:122–128.

49. Gomes MS, Cabral FJ, Jannotti-Passos LK, Carvalho O, Rodrigues V, Baba EH, Sa RG: Preliminary analysis of miRNA pathway in Schistosoma mansoni. Parasitol Int. 2009, 58:61–68.

50. Liu F, Lu J, Hu W, Wang SY, Cui SJ, Chi M, Yan Q, Wang XR, Song HD, Xu XN, Wang JJ, Zhang XL, Zhang X, Wang ZQ, Xue CL, Brindley PJ, McManus DP, Yang PY, Feng Z, Chen Z, Han ZG: New perspectives on host-parasite interplay by comparative transcriptomic and proteomic analyses of Schistosoma japonicum. PLoS Pathog 2006, 2:e29.

51. Xue X, Sun J, Zhang Q, Wang Z, Huang Y, Pan W: Identification and characterization of novel microRNAs from Schistosoma japonicum. PLoS ONE 2008, 3:e4034.

52. Palakodeti D, Smielewska M, Graveley BR: MicroRNAs from the Planarian Schmidtea mediterranea: a model system for stem cell biology. RNA 2006, 12:1640–1649.

53. Palakodeti D, Smielewska M, Lu YC, Yeo GW, Graveley BR: The PIWI proteins SMEDWI-2 and SMEDWI-3 are required for stem cell function and piRNA expression in planarians. RNA 2008, 14:1174–1186.

54. Matera AG, Terns RM, Terns MP: Non-coding RNAs: lessons from the small nuclear and small nucleolar RNAs. Nat. Rev. Mol. Cell Biol. 2007, 8:209–220.

55. Dieci G, Preti M, Montanini B: Eukaryotic snoRNAs: A paradigm for gene expression flexibility. Genomics 2009.

56. Lukowiak AA, Granneman S, Mattox SA, Speckmann WA, Jones K, Pluk H, van Venrooij WJ, Terns RM, Terns MP: Interaction of the U3-55k protein with U3 snoRNA is mediated by the box B/C motif of U3 and the WD repeats of U3-55k. Nucleic Acids Res 2000, 28:3462–3471.

57. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 2005, 33:D121–D124.

58. Liu C, Bai B, Skogerbø G, Cai L, Deng W, Zhang Y, Bu DB, Zhao Y, Chen R: NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res. 2005, 33:D112–D115.

59. Kim E, Day TA, Bennett JL, Pax RA: Cloning and functional expression of a Shaker-related voltage-gated potassium channel gene from Schistosoma mansoni (Trematoda: Digenea). Parasitology 1995, 110:171–180.

60. Hertel J, Lindemeyer M, Missal K, Fried C, Tanzer A, Flamm C, Hofacker IL, Stadler PF, The Students of Bioinformatics Computer Labs 2004 and 2005: The Expansion of the Metazoan MicroRNA Repertoire. BMC Genomics 2006, 7:15.

61. Lopez MD, Rosenblad MA, Samuelsson T: Conserved and variable domains of RNase MRP RNA. RNA Biology 2009, 6. In press.

62. Barrandon C, Spiluttini B, Bensaude O: Non-coding RNAs regulating the transcriptional machinery. Biol Cell 2008, 100:83–95.

63. Hirai H, LoVerde PT: Identification of the telomeres on Schistosoma mansoni chromosomes by FISH. J. Parasitol. 1996, 82:511–512.

64. Theimer CA, Feigon J: Structure and function of telomerase RNA. Curr Opin Struct Biol 2006, 16:307–318.

65. Lartillot N, Brinkmann H, Philippe H: Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evolutionary Biology 2007, 7:S4.

66. Lowe T, Eddy S: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucl. Acids Res. 1997, 25:955–964.

67. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, 36:D154–D158.

68. Thompson JD, Higgins DG, Gibson TJ: CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties, and weight matrix choice. Nucl. Acids Res. 1994, 22:4673–4680.

69. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast Folding and Comparison of RNA Secondary Structures. Monatsh. Chem. 1994, 125:167–188.

70. Hofacker IL, Fekete M, Stadler PF: Secondary Structure Prediction for Aligned RNA Sequences. J. Mol. Biol. 2002, 319:1059–1066.

71. Lestrade L, Weber MJ: snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res 2006, 34:D158–D162.

72. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403–410.


73. Hertel J, Hofacker IL, Stadler PF: snoReport: Computational identification of snoRNAs with unknown targets. Bioinformatics 2008, 24:158–164.

74. Lowe TM, Eddy SR: A Computational Screen for Methylation Guide snoRNAs in Yeast. Science 1999, 283:1168–1171.

75. Tafer H, Kehr S, Hertel J, Stadler P: RNAsnoop: Efficient target prediction for box H/ACA snoRNAs 2009. [Submitted].

76. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25:4876–4882.

77. Bernhart SH, Tafer H, Muckstein U, Flamm C, Stadler PF, Hofacker IL: Partition Function and Base Pairing Probabilities of RNA Heterodimers. Algorithms Mol. Biol. 2006, 1:3.

78. Griffiths-Jones S: RALEE--RNA ALignment editor in Emacs. Bioinformatics 2005, 21:257–259.

79. Bailey TL, Williams N, Misleh C, Li WW: MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006, 34:W369–W373.

80. Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol. Evol. 1987, 4:406–425.

Additional Files

Supplemental figures and text are available as a separate PDF at http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/08-014.
