platyzoan paraphyly based on phylogenomic data supports …...platyhelminthes and gastrotricha...

17
Article Platyzoan Paraphyly Based on Phylogenomic Data Supports a Noncoelomate Ancestry of Spiralia Torsten H. Struck,* ,1,2 Alexandra R. Wey-Fabrizius, 3 Anja Golombek, 1 Lars Hering, 4 Anne Weigert, 5 Christoph Bleidorn, 5 Sabrina Klebow, 3 Nataliia Iakovenko, 6,7 Bernhard Hausdorf, 8 Malte Petersen, 1 Patrick Ku ¨ck, 1 Holger Herlyn, 9 and Thomas Hankeln 3 1 Zoological Research Museum Alexander Koenig, Bonn, Germany 2 University of Osnabru ¨ck, FB05 Biology/Chemistry, AG Zoology, Osnabru ¨ck, Germany 3 Institute of Molecular Genetics, Biosafety Research and Consulting, Johannes Gutenberg University, Mainz, Germany 4 Animal Evolution and Development, Institute of Biology II, University of Leipzig, Leipzig, Germany 5 Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany 6 Department of Biology and Ecology, Ostravian University in Ostrava, Ostrava, Czech Republic 7 Department of Invertebrate Fauna and Systematics, Schmalhausen Institute of Zoology NAS of Ukraine, Kyiv, Ukraine 8 Zoological Museum, University of Hamburg, Hamburg, Germany 9 Institute of Anthropology, Johannes Gutenberg University, Mainz, Germany *Corresponding author: E-mail: [email protected]. Associate editor: Gregory Wray Abstract Based on molecular data three major clades have been recognized within Bilateria: Deuterostomia, Ecdysozoa, and Spiralia. Within Spiralia, small-sized and simply organized animals such as flatworms, gastrotrichs, and gnathostomulids have recently been grouped together as Platyzoa. However, the representation of putative platyzoans was low in the respective molecular phylogenetic studies, in terms of both, taxon number and sequence data. Furthermore, increased substitution rates in platyzoan taxa raised the possibility that monophyletic Platyzoa represents an artifact due to long- branch attraction. In order to overcome such problems, we employed a phylogenomic approach, thereby substantially increasing 1) the number of sampled species within Platyzoa and 2) species-specific sequence coverage in data sets of up to 82,162 amino acid positions. Using established and new measures (long-branch score), we disentangled phylogenetic signal from misleading effects such as long-branch attraction. In doing so, our phylogenomic analyses did not recover a monophyletic origin of platyzoan taxa that, instead, appeared paraphyletic with respect to the other spiralians. Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa and all other spiralians represent a monophyletic group, which we name Platytrochozoa. Platyzoan paraphyly suggests that the last common ancestor of Spiralia was a simple-bodied organism lacking coelomic cavities, segmentation, and complex brain structures, and that more complex animals such as annelids evolved from such a simply organized ancestor. This conclusion contradicts alternative evolutionary scenarios proposing an annelid-like ancestor of Bilateria and Spiralia and several independent events of secondary reduction. Introduction Molecular data have profoundly changed the view of the bilaterian tree of life by recognizing three major clades: Deuterostomia, Ecdysozoa, and Spiralia (Halanych 2004; Edgecombe et al. 2011). The term Spiralia is occasionally used as a synonym for Lophotrochozoa (Halanych 2004). However, the term Lophotrochozoa is actually reserved for all descendants of the last common ancestor of Annelida, Mollusca, and the three lophophorate taxa (Halanych 2004), whereas the more comprehensive taxon Spiralia in- cludes all animals with spiral cleavage and, hence, also Platyhelminthes (Edgecombe et al. 2011). Herein, we use Spiralia in the terms of the more inclusive definition. Previous results of the molecular phylogenetic analyses initiated a still on-going debate about the evolution of com- plexity in Bilateria. It was proposed that the last common ancestor of Deuterostomia, Ecdysozoa and Spiralia had a seg- mented and coelomate body organization resembling that of an annelid, and that morphologically more simply organized taxa such as nematodes or flatworms (Platyhelminthes) evolved by secondary reductions (Brinkman and Philippe 2008; De Robertis 2008; Couso 2009; Tomer et al. 2010; Chesebro et al. 2013). This is in stark contrast to the tradi- tional “acoeloid–planuloid” hypothesis favoring evolution of Bilateria from a simple body organization toward more com- plex forms with a last common ancestor resembling a flat- worm without segmentation and coelomic cavities (Hyman 1951; Halanych 2004; Hejnol et al. 2009). Unraveling the phy- logenetic relationships within Bilateria is crucial to resolve this controversy (Halanych 2004; Edgecombe et al. 2011). While recent phylogenomic studies recovered most of the relations of the major branches within Deuterostomia and ß The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] Mol. Biol. Evol. 31(7):1833–1849 doi:10.1093/molbev/msu143 Advance Access publication April 18, 2014 1833

Upload: others

Post on 14-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

Article

Platyzoan Paraphyly Based on Phylogenomic Data Supportsa Noncoelomate Ancestry of SpiraliaTorsten H Struck12 Alexandra R Wey-Fabrizius3 Anja Golombek1 Lars Hering4 Anne Weigert5

Christoph Bleidorn5 Sabrina Klebow3 Nataliia Iakovenko67 Bernhard Hausdorf8 Malte Petersen1

Patrick Kuck1 Holger Herlyn9 and Thomas Hankeln3

1Zoological Research Museum Alexander Koenig Bonn Germany2University of Osnabruck FB05 BiologyChemistry AG Zoology Osnabruck Germany3Institute of Molecular Genetics Biosafety Research and Consulting Johannes Gutenberg University Mainz Germany4Animal Evolution and Development Institute of Biology II University of Leipzig Leipzig Germany5Molecular Evolution and Systematics of Animals Institute of Biology University of Leipzig Leipzig Germany6Department of Biology and Ecology Ostravian University in Ostrava Ostrava Czech Republic7Department of Invertebrate Fauna and Systematics Schmalhausen Institute of Zoology NAS of Ukraine Kyiv Ukraine8Zoological Museum University of Hamburg Hamburg Germany9Institute of Anthropology Johannes Gutenberg University Mainz Germany

Corresponding author E-mail torstenstruckzfmkuni-bonnde

Associate editor Gregory Wray

Abstract

Based on molecular data three major clades have been recognized within Bilateria Deuterostomia Ecdysozoa andSpiralia Within Spiralia small-sized and simply organized animals such as flatworms gastrotrichs and gnathostomulidshave recently been grouped together as Platyzoa However the representation of putative platyzoans was low in therespective molecular phylogenetic studies in terms of both taxon number and sequence data Furthermore increasedsubstitution rates in platyzoan taxa raised the possibility that monophyletic Platyzoa represents an artifact due to long-branch attraction In order to overcome such problems we employed a phylogenomic approach thereby substantiallyincreasing 1) the number of sampled species within Platyzoa and 2) species-specific sequence coverage in data sets of upto 82162 amino acid positions Using established and new measures (long-branch score) we disentangled phylogeneticsignal from misleading effects such as long-branch attraction In doing so our phylogenomic analyses did not recover amonophyletic origin of platyzoan taxa that instead appeared paraphyletic with respect to the other spiraliansPlatyhelminthes and Gastrotricha formed a monophylum which we name Rouphozoa To the exclusion ofGnathifera Rouphozoa and all other spiralians represent a monophyletic group which we name PlatytrochozoaPlatyzoan paraphyly suggests that the last common ancestor of Spiralia was a simple-bodied organism lacking coelomiccavities segmentation and complex brain structures and that more complex animals such as annelids evolved from sucha simply organized ancestor This conclusion contradicts alternative evolutionary scenarios proposing an annelid-likeancestor of Bilateria and Spiralia and several independent events of secondary reduction

IntroductionMolecular data have profoundly changed the view of thebilaterian tree of life by recognizing three major cladesDeuterostomia Ecdysozoa and Spiralia (Halanych 2004Edgecombe et al 2011) The term Spiralia is occasionallyused as a synonym for Lophotrochozoa (Halanych 2004)However the term Lophotrochozoa is actually reserved forall descendants of the last common ancestor of AnnelidaMollusca and the three lophophorate taxa (Halanych2004) whereas the more comprehensive taxon Spiralia in-cludes all animals with spiral cleavage and hence alsoPlatyhelminthes (Edgecombe et al 2011) Herein we useSpiralia in the terms of the more inclusive definition

Previous results of the molecular phylogenetic analysesinitiated a still on-going debate about the evolution of com-plexity in Bilateria It was proposed that the last common

ancestor of Deuterostomia Ecdysozoa and Spiralia had a seg-mented and coelomate body organization resembling that ofan annelid and that morphologically more simply organizedtaxa such as nematodes or flatworms (Platyhelminthes)evolved by secondary reductions (Brinkman and Philippe2008 De Robertis 2008 Couso 2009 Tomer et al 2010Chesebro et al 2013) This is in stark contrast to the tradi-tional ldquoacoeloidndashplanuloidrdquo hypothesis favoring evolution ofBilateria from a simple body organization toward more com-plex forms with a last common ancestor resembling a flat-worm without segmentation and coelomic cavities (Hyman1951 Halanych 2004 Hejnol et al 2009) Unraveling the phy-logenetic relationships within Bilateria is crucial to resolve thiscontroversy (Halanych 2004 Edgecombe et al 2011)

While recent phylogenomic studies recovered most of therelations of the major branches within Deuterostomia and

The Author 2014 Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution All rights reserved For permissions pleasee-mail journalspermissionsoupcom

Mol Biol Evol 31(7)1833ndash1849 doi101093molbevmsu143 Advance Access publication April 18 2014 1833

Ecdysozoa the internal phylogeny of Spiralia is still unclear(Edgecombe et al 2011) Indeed spiralian animals exhibit awide variety and plasticity in development and morphologyincluding body organization (Nielsen 2012) which gave rise tothe distinction of two major taxa Lophotrochozoa andPlatyzoa (Halanych 2004 Edgecombe et al 2011) As men-tioned above Lophotrochozoa comprises at least annelids(ringed worms) lophophorates and mollusks (Halanych2004) and hence animals with a more complex morphologyIn contrast Platyzoa subsumes more simple appearing taxasuch as flatworms hairy backs (Gastrotricha) wheel animals(classical Rotifera) thorny-headed worms (Acanthocephala)and jaw worms (Gnathostomulida) (Cavalier-Smith 1998)Although some authors regard Platyzoa as sister toLophotrochozoa (Edgecombe et al 2011) others placePlatyzoa within Lophotrochozoa thus rendering Spiraliasynonymous with Lophotrochozoa (Halanych 2004)Importantly unique morphological autapomorphies sup-porting the monophyly of Platyzoa are lacking (Giribet2008) and phylogenetic analyses of nuclear and mitochondrialdata failed to resolve the question as well (Paps et al 2009a2009b Bernt et al 2013) Nevertheless there seems to be atendency for a weakly supported monophylum Platyzoa aslong as larger data sets were analyzed (Halanych 2004Hausdorf et al 2007 Struck and Fisse 2008 Hejnol et al2009 Paps et al 2009a Witek et al 2009) However acrossall these analyses placement of platyzoan taxa appearedunstable probably due to low data and taxa coverage(Edgecombe et al 2011) Moreover parallel evolution of char-acter states on long branches (also known as LBA) might alsohave confounded these analyses (Edgecombe et al 2011) Insummary monophyly of Platyzoa and the phylogenetic posi-tions of the platyzoan taxa within Spiralia are still contentiousalthough their positions have major implications for bilaterianevolution In particular monophyly of Platyzoa and a place-ment within Lophotrochozoa would be in line with thetheory of a more complex ancestry (Brinkman and Philippe2008) whereas paraphyletic Platyzoa with respect toLophotrochozoa would support the ldquoacoeloidndashplanuloidrdquohypothesis

Results and DiscussionTo address the major outstanding issues of bilaterian phylog-eny with respect to spiralian and more specifically platyzoanrelationships we applied a phylogenomic approach generat-ing transcriptome sequence data for 10 putative platyzoanand two nemertean species using second-generation se-quencing technology and a modified RNA amplificationmethod which allowed the generation of sequencing librariesfrom as few as 10 specimens of microscopic species ofGnathostomulida Gastrotricha and classical Rotifera (supple-mentary table S1 Supplementary Material online) These datawere complemented with transcriptomic or genomic data of53 other spiralian and ecdysozoan species including addi-tional representatives of Platyzoa (supplementary table S2Supplementary Material online) Hereby the taxon cover-age of Platyzoa increased 35-fold and for individual platyzoantaxa such as Syndermata (wheel animals and thorny-headed

worms) and Gastrotricha even 5-fold in comparison to pre-vious large-scale analyses of spiralian relationships (Dunnet al 2008 Hejnol et al 2009) After orthology assignment(Ebersberger et al 2009) the data were further screened forsequence redundancy (Kvist and Siddall 2013) potentiallyparalogous sequences (Struck 2013) and contamination(Struck 2013) resulting in a pruning of about 7 of sequencedata (supplementary tables S3ndashS8 Supplementary Materialonline)

Brute-Force Approach More Taxa and Data

Phylogenetic reconstructions based on the largest data setsd01 with 82162 amino acid positions and 383 sequencecoverage (supplementary table S9 Supplementary Materialonline) recovered monophyly of both Platyzoa andLophotrochozoa with strong bootstrap support (BS) of99 for both (fig 1) Within Platyzoa monophylyof Platyhelminthes of Syndermata as well as ofGnathostomulida was maximally supported whereas mono-phyly of Gastrotricha was not recovered The chaetonotidgastrotrich Lepidodermella squamata appeared as sister toPlatyhelminthes (BS 46) whereas the macrodasyidan gastro-trichs formed a monophylum (BS 74) as sister to all otherplatyzoan taxa (BS 68) Finally Gnathostomulida was sisterto Syndermata (BS 61) consistent with the Gnathiferahypothesis (Ahlrichs 1997 Herlyn and Ehlers 1997)

To study the influence of unstable taxa leaf stability anal-yses were performed With a leaf stability index of 0876 thegastrotrich Lep squamata was the most unstable specieswithin the sampled platyzoans followed by the two gnathos-tomulid species (0941) and the macrodasyidan gastrotrichDactylopodola baltica (0969) (fig 2 and supplementarytable S10 Supplementary Material online) Excluding thesefour platyzoan taxa from data set d01 and conducting anew phylogenetic reconstruction did not influence theremaining topology but led to an increased BS value of99 for a clade uniting Platyhelminthes and Syndermata anddecreased values for the monophyly of both Platyzoa andLophotrochozoa (BS 82 and 86 table 1) Thus the four un-stable taxa showed some influence on support for the phy-logenetic placement of other platyzoan taxa Therefore weexcluded these four taxa from the following analyses whichaddressed the potential role of LBA on platyzoan phylogenyin more detail

LBA Accounts for Monophyly of Platyzoa

Monophyletic Platyzoa as sister to Lophotrochozoa gainedstrong support in the analyses described above Howeverthorough inspection of the topology (fig 1) revealed consid-erable branch length heterogeneity with long branches in theanalyzed platyzoan lineages and rather short branches inlophotrochozoan and ecdysozoan lineages Hence the ob-served strong support for monophyletic Platyzoa might orig-inate from artificial rather than phylogenetic signal (Bergsten2005 Edgecombe et al 2011 Kuck et al 2012) For the treederived from data set d01 and shown in figure 1 the LB scoresshowed a bimodal distribution with a minimum between the

1834

Struck et al doi101093molbevmsu143 MBE

two highest optima at an LB score value of 0 (fig 2B) Putativeplatyzoan species had generally higher LB score values thanlophotrochozoan and ecdysozoan species (fig 2 and supple-mentary table S11 Supplementary Material online) Only theLB scores inferred for Stylochoplana and Paraplanocera withinPlatyhelminthes the two Brachionus species and Lecane inSyndermata and Megadasys and Macrodasys in Gastrotrichaapproximated those of most lophotrochozoans and ecdy-sozoans On the other hand Symbion (Cycliophora)Alcyonidium and Tubulipora (Ectoprocta) showed valuesgt0 resembling those of most of the platyzoan species sam-pled (fig 2)

To assess the effect of long branches on tree reconstruc-tion all species with LB scores above 0 were excluded fromdata set d01 (82162 positions) Interestingly monophyly ofPlatyzoa was no longer recovered (fig 3) Gastrotricha nowemerged as sister to Platyhelminthes (BS 96 table 1) and this

clade was sister to monophyletic Lophotrochozoa (BS 95table 1) whereas Syndermata was sister to all other spiraliantaxa Thus exclusion of long-branched species had a tremen-dous effect on the analyses rendering a strongly supportedmonophyly of Platyzoa with BS values gt95 into aparaphyletic assemblage in which the clade consistingof Gastrotricha + Platyhelminthes and Lophotrochozoaobtained strong support with a BS value of 95

Biases Causing Monophyletic Platyzoa

To gain further insights into the issue of mono- versus para-phyletic Platyzoa we analyzed the data with respect to thedifferent properties of individual genes In detail we studiedthe effect of gene-specific proportions of hydrophobic aminoacids and missing data base composition and branch lengthheterogeneity and evolutionary rates on tree reconstructionA common procedure is to choose one of these properties as

02

Aplysia

Taenia

Echinococcus

Cephalothrix

Philodina

Capitella

Chaetopleura

Alvinella

PriapulusEchinoderes

Rotaria

Terebratalia

Macracanthorhynchus

Schmidtea

Opisthorchis

Lingula

Pedicellina

Symbion

Apis

StylochoplanaParaplanocera

Paratenuisentis

Megadasys

Spirometra

Echinorhynchus

Euprymna

Moniezia

Biomphalaria

Nematoplana

Brachionus

Daphnia

Fasciola

Echinococcus

Macrodasys

Lumbricus

Idiosepius

FlustraBugula

Tubifex

Seison

Clonorchis

Lecane

Lepidodermella

Tubulipora

Mytilus

Brachionus

Dactylopodola

Neomenia

Helobdella

Alcyonidium

Gnathostomula

Pomphorhynchus

Turbanella

Dugesia

Gnathostomula

Dugesia

Pedicellina

Macrostomum

Cristatella

Crassostrea

Tubulanus

Adineta

SchistosomaSchistosoma

Lottia

73

68

90

61

90

99

93

99

92

74

99

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Platyzoa

Gastrotricha

62

FIG 1 Maximum-likelihood (ML) tree obtained by analysis of data set d01 with 65 taxa and 82162 amino acid positions Only BS values50 are shownat the branches Maximal support of 100 Higher taxonomic units are indicated

1835

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the most influential one either based a priori on literature or aposteriori on the obtained results (eg Brinkman and Philippe2008 Simmons 2012b Nesnidal et al 2013 Nosenko et al2013 Roure et al 2013 Salichos and Rokas 2013) Herein weused another procedure based on the variability exhibited inthe data itself prior to analyses of alternative data sets reflect-ing different degrees of data reduction According to the prin-cipal component analysis (Alexe et al 2008) the first principalcomponent explained 310 of the variance between thedifferent genes It was mainly derived from the proportionof missing data and base composition heterogeneity witheigenvectors pointing into opposite directions (supplemen-tary fig S3 and table S12 Supplementary Material online)Branch length heterogeneity and evolutionary rate were thelargest factors in the second component which explained268 of the variance Correlation analyses showed that inour case evolutionary rate which is often used as a proxy for

branch length heterogeneity (Brinkman and Philippe 2008)did not correlate with actual measurements of branch lengthheterogeneity (R2 = 00324 and 00635 supplementary fig S4Supplementary Material online)

As we wanted to test for LBA we used the direct measure-ment of branch length heterogeneity instead of evolutionaryrate Thus we generated data sets with either differentdegrees of missing data (d02ndashd06) proportion of low basecomposition heterogeneity (d07) or low branch lengthheterogeneity (d08) as well as genes being part of the 70or 95 confidence intervals of the first two principal compo-nents (d09 and d10) (supplementary tables S9 and S13Supplementary Material online) Based on the results of theprincipal component analysis we present in detail the resultsof three data sets d07 (low base composition heterogeneity)d08 (low branch length heterogeneity) and d02 The lattercombines a low degree of missing data with a high number ofpositions

Analyses of these three data sets excluding the four above-mentioned unstable platyzoan taxa consistently resulted inparaphyletic Platyzoa (fig 4 and table 1) as observed beforewhen excluding long-branched taxa from the large data setd01 Once more Platyhelminthes was sister to Gastrotricha(BS 76 84 and 71 Rouphozoa in table 1) and Lophotrochozoawas recovered as a monophyletic group (BS 98 47 and 95)The clade of GastrotrichaPlatyhelminthes was sister toLophotrochozoa (BS 72 86 and 75 table 1) andSyndermata was sister to the all other spiralian taxa againThus either by increasing the coverage (d02) or decreasingbase composition or branch length heterogeneity (d07and d08) paraphyletic Platyzoa was recovered (table 1) asa clade comprising Gastrotricha Platyhelminthes andLophotrochozoa gained strong branch support exceedingvalues of 70

Additional exclusion of long-branched species (figs 2and 3) reproduced paraphyly of Platyzoa in all analyseseven with maximum BS in some analyses AgainPlatyhelminthes was sister to Gastrotricha (BS 93 97 and54 fig 5 Rouphozoa in table 1) and both were more closelyrelated to the lophotrochozoan taxa than to Syndermata(BS 100 100 and 50 fig 5 Platyzoa Para in table 1)

Table 1 Bootstrap Support (BS) for Monophyly and Paraphyly of Platyzoa as well as Monophyly of Rouphozoa

Data Set Excl Taxa Pos Taxa Platyzoa Rouphozoa

Mono Para Mono

d01 (all data) None 82162 65 99a 0 3Unstable 82162 61 82b 1 1LB 82162 34 3 95a 96a

d02 (high coverage) Unstable 36513 61 3 86b 84b

LB 36513 34 0 100a 93b

d07 (low base frequency heterogeneity) Unstable 37907 61 19 75b 71b

LB 37907 34 0 100a 97a

d08 (low branch length heterogeneity) Unstable 29133 61 18 72b 76b

LB 29133 34 10 50 54

Excl excluded pos number of positions taxa number of taxa LB long-branched taxaaSupport values are part of the 95 confidence setbSupport values are part of the 70 confidence set

Nemert

ea

09

10

-30 0 60

Lea

f st

abili

ty in

dex

LB score

Ecdysozoa

Mollusca

Brachiopoda

PlatyhelminthesStPa

Br Le

SyndermataGastrotricha

Ectopr

octa

EntoproctaCycliophora

Gnathostomulida

Da

Lep

AnnelidaMe Ma

Sy

Tu Al

005

015

025

-25 0 50

freq

uenc

y

LB score B

A

FIG 2 Leaf stability indices and LB scores based on the ML analysis ofdata set d01 with 65 taxa and 82162 amino acid positions (A) Plot ofleaf stability indices against LB scores (B) Distribution of LB scores Thedashed line in A indicates LB score = 0 Shades distinguish ecdysozoanand lophotrochozoan species (light gray) from putative platyzoan spe-cies (dark gray) Some species are also labeled Pa Paraplanocera StStylochoplana Br Brachionus Le Lecane Me Megadasys MaMacrodasys Da Dactylopodola Lep Lepidodermella Sy Symbion TuTubulipora Al Alcyonidium

1836

Struck et al doi101093molbevmsu143 MBE

Moreover comparing the trees without long-branched spe-cies (figs 3 and 5) to the one with all species (fig 1) shows thatnow similar branch lengths lead to the ldquoplatyzoanrdquo and lopho-trochozoan species (figs 3 and 5) Additionally the standarddeviation of the species-specific LB scores for the trees shownin figures 3 and 5 are 103 and 93 respectively andhence lower than the standard deviation of 154 for thetree of figure 1 This means that the latter exhibits muchstronger branch length heterogeneity across all taxa thanthe former two Similarly the standard deviations for theclassical tip-to-root distances are lower for the trees offigures 3 and 5 with 0054 and 0143 than for the tree offigure 1 with 0202

We also used a Bayesian approach with the GTR + CATmodel as this is known to be more robust toward LBA thanclassical ML models such as LG (Lartillot et al 2007) Due tocomputational time restrictions and high memory require-ments we were not able to use the large data set d01 (82162positions) Instead we chose data set d02 (low to medium-low degree of missing data 36513 positions 461 coveragesupplementary table S9 Supplementary Material online)as the principal component analysis indicated coverageas the most influential property in the firstcomponent Importantly the Bayesian approach did notrecover monophyletic Platyzoa but instead a cladeincluding Gastrotricha + Platyhelminthes and monophyleticLophotrochozoa (posterior probability [PP] = 100 fig 6) andagain Gnathostomulida + Syndermata was sister to thisclade (PP = 100 fig6)

Thus combining Bayesian and maximum-likelihood anal-yses with different data and taxa exclusion strategies couldnot recover monophyletic Platyzoa in contrast to analysesusing only large numbers of data (figs 1 3ndash6 and table 1)Considering all 10 data sets (ie d01ndashd10) BS for monophy-letic Platyzoa substantially increased with additional aminoacid positions (dark gray line in fig 7A) whereas support forparaphyly decreased (black line in fig 7A) In contrast supportfor monophyly of Lophotrochozoa was not strongly affectedby the number of positions analyzed (light gray line in fig 7A)It is a well-known phenomenon of LBA that it is positivelymisleading that is with increasing numbers of positions theartificial group is more robustly recovered (Felsenstein 1978Huelsenbeck 1997 Bergsten 2005) On the other hand ex-cluding long-branched species from analyses did not lead tosuch correlations In particular support for the monophyly ofPlatyzoa remained low irrespective of the number of align-ment positions (dark gray line in fig 7B)

Additionally we determined for each data set the numberof single-gene trees supporting monophyly or paraphyly ofPlatyzoa Across all data sets the percentage of single genessupporting platyzoan paraphyly ranged from 86 to 117and thus was higher than the percentage supporting mono-phyletic Platyzoa ranging from 05 to 32 (table 2)Interestingly decreasing the degree of missing data (ied01ndashd06) and hence increasing the number of taxa pergene the ratio of the percentage of trees supportingparaphyly relative to the percentage of trees supportingmonophyly strongly increased (black line in fig 7C)

02

Pedicellina

Neomenia

Terebratalia

Crassostrea

Bugula

Brachionus

Aplysia

Euprymna

Priapulus

Brachionus

MacrodasysMegadasys

Tubulanus

Cristatella

Stylochoplana

Capitella

Pedicellina

Lottia

Tubifex

Paraplanocera

Echinoderes

Flustra

Mytilus

Alvinella

Lingula

Biomphalaria

Apis

Chaetopleura

Helobdella

Cephalothrix

Idiosepius

Lumbricus

Daphnia

Lecane63

60

9973

96

95

76 73

86

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

FIG 3 ML tree obtained by analysis of data set d01 with 34 taxa and 82162 amino acid positions All taxa exceeding LB scores gt0 in tree of figure 1were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1837

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Directly addressing biases in the data such as base or branchlength heterogeneity did not have such an effect on the ratioIn the case of LBA only strategies as used herein which areable to attenuate its misleading effect by excluding eitherbiased data or species or by increasing taxon coverage pergene can reveal whether or not an assembly of long-branchedtaxa is artificially grouped together (Bergsten 2005) In con-clusion our analyses support platyzoan paraphyly whereasrecovery of monophyletic ldquoPlatyzoardquo is most probably dueto LBA

Position of Gnathostomulida

In addition to LBA the inference of a stable topology washampered by the inclusion of Gnathostomulida and the twogastrotrichs Lepidodermella and Dactylopodola In order toelucidate the phylogenetic position of Gnathostomulidawithin Spiralia we reincluded the two formerly excludedgnathostomulid species into different data sets Importantlytheir inclusion did not alter the topology with respect toplatyzoan paraphyly in any tree reconstruction (eg cf

figs 4 and 8) Analysis of data set d07 excluding unstabletaxa except Gnathostomulida (ie Lepidodermella andDactylopodola) and of data set d02 excluding all long-branched taxa recovered Gnathostomulida as part of aclade with Gastrotricha and Platyhelminthes (table 3)However all other analyses placed Gnathostomulida assister to Syndermata with BS values of up to 91 eventhough overall BS remained low (fig 8 and table 3)Moreover the Bayesian analysis also recovered a sistergroup-relationship of Gnathostomulida and Syndermatawith strong support (PP = 098 fig 6) This position ofGnathostomulida as sister to Syndermata is consistent withthe Gnathifera hypothesis (Ahlrichs 1997 Herlyn and Ehlers1997) Monophyly of Gnathifera has also been found in pre-vious studies based on ribosomal protein data (Witek et al2009 Hausdorf et al 2010) and is also strongly supported bythe likely homology of gnathostomulidan jaws and rotiferantrophi (Rieger and Tyler 1995 Haszprunar 1996 Ahlrichs1997 Herlyn and Ehlers 1997 Jenner 2004a) For a thoroughanalysis of the phylogenetic relations within Syndermata and

Spirometra

Biomphalaria

Alvinella

Schmidtea

Schistosoma

Lottia

Terebratalia

Symbion

Clonorchis

Lumbricus

Moniezia

Tubulanus

Philodina

Cephalothrix

Echinococcus

Rotaria

Tubifex

Brachionus

Dugesia

Macracanthorhynchus

Apis

Schistosoma

Stylochoplana

HelobdellaCapitella

Chaet

Flustra

Euprymna

Cristatella

Echinoderes

Echinococcus

Neomenia

Lingula

Paraplanocera

Daphnia

Mytilus

Macrostomum

Priapulus

Bugula

Pedicellina

Dugesia

Paratenuisentis

Adineta

Tubulipora

Pomphorhynchus

Macrodasys

Pedicellina

Aplysia

Opisthorchis

Megadasys

Fasciola

Taenia

Nematoplana

Lecane

Echinorhynchus

Crassostrea

Turbanella

Alcyonidium

Seison

Idiosepius

Brachionus

02

86

92

93

99

62

84

96

86

97

89

93

9398

81

62

72

72

98

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha Syndermata

Platyhelm

inthes

FIG 4 ML tree obtained by analysis of data set d02 with 61 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and the four unstable taxa (Lepidodermella squamata Dactylopodola baltica and the two Gnathostomulidaspecies) were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1838

Struck et al doi101093molbevmsu143 MBE

the implication for their evolution we refer to a recent tran-scriptome-based study (Wey-Fabrizius et al 2014)

A Novel View on Spiralian Phylogeny

In summary our analyses support the monophyly ofLophotrochozoa and of a clade combining Gastrotricha andPlatyhelminthes Gnathifera is sister to a clade comprising theaforementioned taxa (fig 9) No morphological apomorphy isknown to date supporting either a monophyletic origin ofPlatyhelminthes and Gastrotricha or of PlatyhelminthesGastrotricha and Lophotrochozoa (Jenner 2004a Rothe andSchmidt-Rhaesa 2009) and hence could be used for namingthese two clades However whereas most of the other spir-alian taxa exhibit additional structures for food gathering intheir ground pattern (eg palps in annelids proboscis in ne-merteans filter feeding apparatuses in lophophorates ento-procts and cycliophorans as well as jaw-like elementsin rotifers gnathostomulids and mollusks) gastrotrichsand most flatworm species ingest food without such extra-structures just by dilating their rather simple pharynx Therespective pharynx simplex is part of the ground pattern ofPlatyhelminthes and enables the swallowing of prey by eithersucking action or engulfment (Doe 1981) Gastrotricha pos-sess a Y-shaped or inverted Y-shaped sucking pharynx(Kieneke et al 2008) Although gathering food by sucking isnot necessarily an autapomorphy of these two taxa this

common characteristic can nonetheless be utilized fornaming the clade We therefore suggest the nameRouphozoa (derived from the Greek word rouphao for ingest-ing by sucking) to define the last common ancestor ofPlatyhelminthes and Gastrotricha and all its descendantsThe clade of Rouphozoa + Lophotrochozoa can be namedPlatytrochozoa reflecting that it comprises Platyhelminthesand taxa with a trochophore larva and all extant descendantsof the last common ancestor of Platyhelminthes andLophotrochozoa Spiralia then comprises Gnathifera(Syndermata + Gnathostomulida) and Platytrochozoa

Implications for bilaterian evolutionThe paraphyly of Platyzoa with respect to Lophotrochozoa ismore in line with the traditional ldquoacoeloidndashplanuloidrdquo hy-pothesis than with the scenario of a last common ancestorof Deuterostomia Ecdysozoa and Spiralia with a segmentedand coelomate body organization resembling an annelidWithin Spiralia the non-coelomate small-sized taxa succes-sively branch off first (fig 9) Both Gnathostomulida andGastrotricha comprise small interstitial organisms with anacoelomate body organization and lt4 or 2 mm of lengthrespectively (Nielsen 2012) Within Syndermata only thehighly modified parasitic Acanthocephala are larger than afew millimeters and all exhibit a pseudocoelomate organiza-tion (Herlyn and Rohrig 2003 Nielsen 2012) Similarly inPlatyhelminthes the ancestral condition is also a small-sized

Terebratalia

IdiosepiusMytilus

Lecane

Alvinella

Crassostrea

Tubulanus

Cristatella

Biomphalaria

PedicellinaCapitella

Brachionus

Cephalothrix

Echinoderes

Euprymna

Tubifex

Paraplanocera

Lingula

Lottia

Daphnia

Flustra

Lumbricus

Bugula

Priapulus

Aplysia

Macrodasys

Helobdella

Megadasys

Pedicellina

Chaetopleura

Stylochoplana

Neomenia

Brachionus

Apis

02

52

76

63

8450

75

99

54

58

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Brachiopoda

FIG 5 ML tree obtained by analysis of data set d08 with 34 taxa and 29133 amino acid positions Only partitions with low degrees of branch lengthheterogeneity were included and all taxa exceeding LB scores gt0 in tree of figure 1 were excluded Only BS 50 are shown at the branches Maximalsupport of 100 Higher taxonomic units are indicated

1839

Platyzoans and Spiralia doi101093molbevmsu143 MBE

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 2: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

Ecdysozoa the internal phylogeny of Spiralia is still unclear(Edgecombe et al 2011) Indeed spiralian animals exhibit awide variety and plasticity in development and morphologyincluding body organization (Nielsen 2012) which gave rise tothe distinction of two major taxa Lophotrochozoa andPlatyzoa (Halanych 2004 Edgecombe et al 2011) As men-tioned above Lophotrochozoa comprises at least annelids(ringed worms) lophophorates and mollusks (Halanych2004) and hence animals with a more complex morphologyIn contrast Platyzoa subsumes more simple appearing taxasuch as flatworms hairy backs (Gastrotricha) wheel animals(classical Rotifera) thorny-headed worms (Acanthocephala)and jaw worms (Gnathostomulida) (Cavalier-Smith 1998)Although some authors regard Platyzoa as sister toLophotrochozoa (Edgecombe et al 2011) others placePlatyzoa within Lophotrochozoa thus rendering Spiraliasynonymous with Lophotrochozoa (Halanych 2004)Importantly unique morphological autapomorphies sup-porting the monophyly of Platyzoa are lacking (Giribet2008) and phylogenetic analyses of nuclear and mitochondrialdata failed to resolve the question as well (Paps et al 2009a2009b Bernt et al 2013) Nevertheless there seems to be atendency for a weakly supported monophylum Platyzoa aslong as larger data sets were analyzed (Halanych 2004Hausdorf et al 2007 Struck and Fisse 2008 Hejnol et al2009 Paps et al 2009a Witek et al 2009) However acrossall these analyses placement of platyzoan taxa appearedunstable probably due to low data and taxa coverage(Edgecombe et al 2011) Moreover parallel evolution of char-acter states on long branches (also known as LBA) might alsohave confounded these analyses (Edgecombe et al 2011) Insummary monophyly of Platyzoa and the phylogenetic posi-tions of the platyzoan taxa within Spiralia are still contentiousalthough their positions have major implications for bilaterianevolution In particular monophyly of Platyzoa and a place-ment within Lophotrochozoa would be in line with thetheory of a more complex ancestry (Brinkman and Philippe2008) whereas paraphyletic Platyzoa with respect toLophotrochozoa would support the ldquoacoeloidndashplanuloidrdquohypothesis

Results and DiscussionTo address the major outstanding issues of bilaterian phylog-eny with respect to spiralian and more specifically platyzoanrelationships we applied a phylogenomic approach generat-ing transcriptome sequence data for 10 putative platyzoanand two nemertean species using second-generation se-quencing technology and a modified RNA amplificationmethod which allowed the generation of sequencing librariesfrom as few as 10 specimens of microscopic species ofGnathostomulida Gastrotricha and classical Rotifera (supple-mentary table S1 Supplementary Material online) These datawere complemented with transcriptomic or genomic data of53 other spiralian and ecdysozoan species including addi-tional representatives of Platyzoa (supplementary table S2Supplementary Material online) Hereby the taxon cover-age of Platyzoa increased 35-fold and for individual platyzoantaxa such as Syndermata (wheel animals and thorny-headed

worms) and Gastrotricha even 5-fold in comparison to pre-vious large-scale analyses of spiralian relationships (Dunnet al 2008 Hejnol et al 2009) After orthology assignment(Ebersberger et al 2009) the data were further screened forsequence redundancy (Kvist and Siddall 2013) potentiallyparalogous sequences (Struck 2013) and contamination(Struck 2013) resulting in a pruning of about 7 of sequencedata (supplementary tables S3ndashS8 Supplementary Materialonline)

Brute-Force Approach More Taxa and Data

Phylogenetic reconstructions based on the largest data setsd01 with 82162 amino acid positions and 383 sequencecoverage (supplementary table S9 Supplementary Materialonline) recovered monophyly of both Platyzoa andLophotrochozoa with strong bootstrap support (BS) of99 for both (fig 1) Within Platyzoa monophylyof Platyhelminthes of Syndermata as well as ofGnathostomulida was maximally supported whereas mono-phyly of Gastrotricha was not recovered The chaetonotidgastrotrich Lepidodermella squamata appeared as sister toPlatyhelminthes (BS 46) whereas the macrodasyidan gastro-trichs formed a monophylum (BS 74) as sister to all otherplatyzoan taxa (BS 68) Finally Gnathostomulida was sisterto Syndermata (BS 61) consistent with the Gnathiferahypothesis (Ahlrichs 1997 Herlyn and Ehlers 1997)

To study the influence of unstable taxa leaf stability anal-yses were performed With a leaf stability index of 0876 thegastrotrich Lep squamata was the most unstable specieswithin the sampled platyzoans followed by the two gnathos-tomulid species (0941) and the macrodasyidan gastrotrichDactylopodola baltica (0969) (fig 2 and supplementarytable S10 Supplementary Material online) Excluding thesefour platyzoan taxa from data set d01 and conducting anew phylogenetic reconstruction did not influence theremaining topology but led to an increased BS value of99 for a clade uniting Platyhelminthes and Syndermata anddecreased values for the monophyly of both Platyzoa andLophotrochozoa (BS 82 and 86 table 1) Thus the four un-stable taxa showed some influence on support for the phy-logenetic placement of other platyzoan taxa Therefore weexcluded these four taxa from the following analyses whichaddressed the potential role of LBA on platyzoan phylogenyin more detail

LBA Accounts for Monophyly of Platyzoa

Monophyletic Platyzoa as sister to Lophotrochozoa gainedstrong support in the analyses described above Howeverthorough inspection of the topology (fig 1) revealed consid-erable branch length heterogeneity with long branches in theanalyzed platyzoan lineages and rather short branches inlophotrochozoan and ecdysozoan lineages Hence the ob-served strong support for monophyletic Platyzoa might orig-inate from artificial rather than phylogenetic signal (Bergsten2005 Edgecombe et al 2011 Kuck et al 2012) For the treederived from data set d01 and shown in figure 1 the LB scoresshowed a bimodal distribution with a minimum between the

1834

Struck et al doi101093molbevmsu143 MBE

two highest optima at an LB score value of 0 (fig 2B) Putativeplatyzoan species had generally higher LB score values thanlophotrochozoan and ecdysozoan species (fig 2 and supple-mentary table S11 Supplementary Material online) Only theLB scores inferred for Stylochoplana and Paraplanocera withinPlatyhelminthes the two Brachionus species and Lecane inSyndermata and Megadasys and Macrodasys in Gastrotrichaapproximated those of most lophotrochozoans and ecdy-sozoans On the other hand Symbion (Cycliophora)Alcyonidium and Tubulipora (Ectoprocta) showed valuesgt0 resembling those of most of the platyzoan species sam-pled (fig 2)

To assess the effect of long branches on tree reconstruc-tion all species with LB scores above 0 were excluded fromdata set d01 (82162 positions) Interestingly monophyly ofPlatyzoa was no longer recovered (fig 3) Gastrotricha nowemerged as sister to Platyhelminthes (BS 96 table 1) and this

clade was sister to monophyletic Lophotrochozoa (BS 95table 1) whereas Syndermata was sister to all other spiraliantaxa Thus exclusion of long-branched species had a tremen-dous effect on the analyses rendering a strongly supportedmonophyly of Platyzoa with BS values gt95 into aparaphyletic assemblage in which the clade consistingof Gastrotricha + Platyhelminthes and Lophotrochozoaobtained strong support with a BS value of 95

Biases Causing Monophyletic Platyzoa

To gain further insights into the issue of mono- versus para-phyletic Platyzoa we analyzed the data with respect to thedifferent properties of individual genes In detail we studiedthe effect of gene-specific proportions of hydrophobic aminoacids and missing data base composition and branch lengthheterogeneity and evolutionary rates on tree reconstructionA common procedure is to choose one of these properties as

02

Aplysia

Taenia

Echinococcus

Cephalothrix

Philodina

Capitella

Chaetopleura

Alvinella

PriapulusEchinoderes

Rotaria

Terebratalia

Macracanthorhynchus

Schmidtea

Opisthorchis

Lingula

Pedicellina

Symbion

Apis

StylochoplanaParaplanocera

Paratenuisentis

Megadasys

Spirometra

Echinorhynchus

Euprymna

Moniezia

Biomphalaria

Nematoplana

Brachionus

Daphnia

Fasciola

Echinococcus

Macrodasys

Lumbricus

Idiosepius

FlustraBugula

Tubifex

Seison

Clonorchis

Lecane

Lepidodermella

Tubulipora

Mytilus

Brachionus

Dactylopodola

Neomenia

Helobdella

Alcyonidium

Gnathostomula

Pomphorhynchus

Turbanella

Dugesia

Gnathostomula

Dugesia

Pedicellina

Macrostomum

Cristatella

Crassostrea

Tubulanus

Adineta

SchistosomaSchistosoma

Lottia

73

68

90

61

90

99

93

99

92

74

99

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Platyzoa

Gastrotricha

62

FIG 1 Maximum-likelihood (ML) tree obtained by analysis of data set d01 with 65 taxa and 82162 amino acid positions Only BS values50 are shownat the branches Maximal support of 100 Higher taxonomic units are indicated

1835

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the most influential one either based a priori on literature or aposteriori on the obtained results (eg Brinkman and Philippe2008 Simmons 2012b Nesnidal et al 2013 Nosenko et al2013 Roure et al 2013 Salichos and Rokas 2013) Herein weused another procedure based on the variability exhibited inthe data itself prior to analyses of alternative data sets reflect-ing different degrees of data reduction According to the prin-cipal component analysis (Alexe et al 2008) the first principalcomponent explained 310 of the variance between thedifferent genes It was mainly derived from the proportionof missing data and base composition heterogeneity witheigenvectors pointing into opposite directions (supplemen-tary fig S3 and table S12 Supplementary Material online)Branch length heterogeneity and evolutionary rate were thelargest factors in the second component which explained268 of the variance Correlation analyses showed that inour case evolutionary rate which is often used as a proxy for

branch length heterogeneity (Brinkman and Philippe 2008)did not correlate with actual measurements of branch lengthheterogeneity (R2 = 00324 and 00635 supplementary fig S4Supplementary Material online)

As we wanted to test for LBA we used the direct measure-ment of branch length heterogeneity instead of evolutionaryrate Thus we generated data sets with either differentdegrees of missing data (d02ndashd06) proportion of low basecomposition heterogeneity (d07) or low branch lengthheterogeneity (d08) as well as genes being part of the 70or 95 confidence intervals of the first two principal compo-nents (d09 and d10) (supplementary tables S9 and S13Supplementary Material online) Based on the results of theprincipal component analysis we present in detail the resultsof three data sets d07 (low base composition heterogeneity)d08 (low branch length heterogeneity) and d02 The lattercombines a low degree of missing data with a high number ofpositions

Analyses of these three data sets excluding the four above-mentioned unstable platyzoan taxa consistently resulted inparaphyletic Platyzoa (fig 4 and table 1) as observed beforewhen excluding long-branched taxa from the large data setd01 Once more Platyhelminthes was sister to Gastrotricha(BS 76 84 and 71 Rouphozoa in table 1) and Lophotrochozoawas recovered as a monophyletic group (BS 98 47 and 95)The clade of GastrotrichaPlatyhelminthes was sister toLophotrochozoa (BS 72 86 and 75 table 1) andSyndermata was sister to the all other spiralian taxa againThus either by increasing the coverage (d02) or decreasingbase composition or branch length heterogeneity (d07and d08) paraphyletic Platyzoa was recovered (table 1) asa clade comprising Gastrotricha Platyhelminthes andLophotrochozoa gained strong branch support exceedingvalues of 70

Additional exclusion of long-branched species (figs 2and 3) reproduced paraphyly of Platyzoa in all analyseseven with maximum BS in some analyses AgainPlatyhelminthes was sister to Gastrotricha (BS 93 97 and54 fig 5 Rouphozoa in table 1) and both were more closelyrelated to the lophotrochozoan taxa than to Syndermata(BS 100 100 and 50 fig 5 Platyzoa Para in table 1)

Table 1 Bootstrap Support (BS) for Monophyly and Paraphyly of Platyzoa as well as Monophyly of Rouphozoa

Data Set Excl Taxa Pos Taxa Platyzoa Rouphozoa

Mono Para Mono

d01 (all data) None 82162 65 99a 0 3Unstable 82162 61 82b 1 1LB 82162 34 3 95a 96a

d02 (high coverage) Unstable 36513 61 3 86b 84b

LB 36513 34 0 100a 93b

d07 (low base frequency heterogeneity) Unstable 37907 61 19 75b 71b

LB 37907 34 0 100a 97a

d08 (low branch length heterogeneity) Unstable 29133 61 18 72b 76b

LB 29133 34 10 50 54

Excl excluded pos number of positions taxa number of taxa LB long-branched taxaaSupport values are part of the 95 confidence setbSupport values are part of the 70 confidence set

Nemert

ea

09

10

-30 0 60

Lea

f st

abili

ty in

dex

LB score

Ecdysozoa

Mollusca

Brachiopoda

PlatyhelminthesStPa

Br Le

SyndermataGastrotricha

Ectopr

octa

EntoproctaCycliophora

Gnathostomulida

Da

Lep

AnnelidaMe Ma

Sy

Tu Al

005

015

025

-25 0 50

freq

uenc

y

LB score B

A

FIG 2 Leaf stability indices and LB scores based on the ML analysis ofdata set d01 with 65 taxa and 82162 amino acid positions (A) Plot ofleaf stability indices against LB scores (B) Distribution of LB scores Thedashed line in A indicates LB score = 0 Shades distinguish ecdysozoanand lophotrochozoan species (light gray) from putative platyzoan spe-cies (dark gray) Some species are also labeled Pa Paraplanocera StStylochoplana Br Brachionus Le Lecane Me Megadasys MaMacrodasys Da Dactylopodola Lep Lepidodermella Sy Symbion TuTubulipora Al Alcyonidium

1836

Struck et al doi101093molbevmsu143 MBE

Moreover comparing the trees without long-branched spe-cies (figs 3 and 5) to the one with all species (fig 1) shows thatnow similar branch lengths lead to the ldquoplatyzoanrdquo and lopho-trochozoan species (figs 3 and 5) Additionally the standarddeviation of the species-specific LB scores for the trees shownin figures 3 and 5 are 103 and 93 respectively andhence lower than the standard deviation of 154 for thetree of figure 1 This means that the latter exhibits muchstronger branch length heterogeneity across all taxa thanthe former two Similarly the standard deviations for theclassical tip-to-root distances are lower for the trees offigures 3 and 5 with 0054 and 0143 than for the tree offigure 1 with 0202

We also used a Bayesian approach with the GTR + CATmodel as this is known to be more robust toward LBA thanclassical ML models such as LG (Lartillot et al 2007) Due tocomputational time restrictions and high memory require-ments we were not able to use the large data set d01 (82162positions) Instead we chose data set d02 (low to medium-low degree of missing data 36513 positions 461 coveragesupplementary table S9 Supplementary Material online)as the principal component analysis indicated coverageas the most influential property in the firstcomponent Importantly the Bayesian approach did notrecover monophyletic Platyzoa but instead a cladeincluding Gastrotricha + Platyhelminthes and monophyleticLophotrochozoa (posterior probability [PP] = 100 fig 6) andagain Gnathostomulida + Syndermata was sister to thisclade (PP = 100 fig6)

Thus combining Bayesian and maximum-likelihood anal-yses with different data and taxa exclusion strategies couldnot recover monophyletic Platyzoa in contrast to analysesusing only large numbers of data (figs 1 3ndash6 and table 1)Considering all 10 data sets (ie d01ndashd10) BS for monophy-letic Platyzoa substantially increased with additional aminoacid positions (dark gray line in fig 7A) whereas support forparaphyly decreased (black line in fig 7A) In contrast supportfor monophyly of Lophotrochozoa was not strongly affectedby the number of positions analyzed (light gray line in fig 7A)It is a well-known phenomenon of LBA that it is positivelymisleading that is with increasing numbers of positions theartificial group is more robustly recovered (Felsenstein 1978Huelsenbeck 1997 Bergsten 2005) On the other hand ex-cluding long-branched species from analyses did not lead tosuch correlations In particular support for the monophyly ofPlatyzoa remained low irrespective of the number of align-ment positions (dark gray line in fig 7B)

Additionally we determined for each data set the numberof single-gene trees supporting monophyly or paraphyly ofPlatyzoa Across all data sets the percentage of single genessupporting platyzoan paraphyly ranged from 86 to 117and thus was higher than the percentage supporting mono-phyletic Platyzoa ranging from 05 to 32 (table 2)Interestingly decreasing the degree of missing data (ied01ndashd06) and hence increasing the number of taxa pergene the ratio of the percentage of trees supportingparaphyly relative to the percentage of trees supportingmonophyly strongly increased (black line in fig 7C)

02

Pedicellina

Neomenia

Terebratalia

Crassostrea

Bugula

Brachionus

Aplysia

Euprymna

Priapulus

Brachionus

MacrodasysMegadasys

Tubulanus

Cristatella

Stylochoplana

Capitella

Pedicellina

Lottia

Tubifex

Paraplanocera

Echinoderes

Flustra

Mytilus

Alvinella

Lingula

Biomphalaria

Apis

Chaetopleura

Helobdella

Cephalothrix

Idiosepius

Lumbricus

Daphnia

Lecane63

60

9973

96

95

76 73

86

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

FIG 3 ML tree obtained by analysis of data set d01 with 34 taxa and 82162 amino acid positions All taxa exceeding LB scores gt0 in tree of figure 1were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1837

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Directly addressing biases in the data such as base or branchlength heterogeneity did not have such an effect on the ratioIn the case of LBA only strategies as used herein which areable to attenuate its misleading effect by excluding eitherbiased data or species or by increasing taxon coverage pergene can reveal whether or not an assembly of long-branchedtaxa is artificially grouped together (Bergsten 2005) In con-clusion our analyses support platyzoan paraphyly whereasrecovery of monophyletic ldquoPlatyzoardquo is most probably dueto LBA

Position of Gnathostomulida

In addition to LBA the inference of a stable topology washampered by the inclusion of Gnathostomulida and the twogastrotrichs Lepidodermella and Dactylopodola In order toelucidate the phylogenetic position of Gnathostomulidawithin Spiralia we reincluded the two formerly excludedgnathostomulid species into different data sets Importantlytheir inclusion did not alter the topology with respect toplatyzoan paraphyly in any tree reconstruction (eg cf

figs 4 and 8) Analysis of data set d07 excluding unstabletaxa except Gnathostomulida (ie Lepidodermella andDactylopodola) and of data set d02 excluding all long-branched taxa recovered Gnathostomulida as part of aclade with Gastrotricha and Platyhelminthes (table 3)However all other analyses placed Gnathostomulida assister to Syndermata with BS values of up to 91 eventhough overall BS remained low (fig 8 and table 3)Moreover the Bayesian analysis also recovered a sistergroup-relationship of Gnathostomulida and Syndermatawith strong support (PP = 098 fig 6) This position ofGnathostomulida as sister to Syndermata is consistent withthe Gnathifera hypothesis (Ahlrichs 1997 Herlyn and Ehlers1997) Monophyly of Gnathifera has also been found in pre-vious studies based on ribosomal protein data (Witek et al2009 Hausdorf et al 2010) and is also strongly supported bythe likely homology of gnathostomulidan jaws and rotiferantrophi (Rieger and Tyler 1995 Haszprunar 1996 Ahlrichs1997 Herlyn and Ehlers 1997 Jenner 2004a) For a thoroughanalysis of the phylogenetic relations within Syndermata and

Spirometra

Biomphalaria

Alvinella

Schmidtea

Schistosoma

Lottia

Terebratalia

Symbion

Clonorchis

Lumbricus

Moniezia

Tubulanus

Philodina

Cephalothrix

Echinococcus

Rotaria

Tubifex

Brachionus

Dugesia

Macracanthorhynchus

Apis

Schistosoma

Stylochoplana

HelobdellaCapitella

Chaet

Flustra

Euprymna

Cristatella

Echinoderes

Echinococcus

Neomenia

Lingula

Paraplanocera

Daphnia

Mytilus

Macrostomum

Priapulus

Bugula

Pedicellina

Dugesia

Paratenuisentis

Adineta

Tubulipora

Pomphorhynchus

Macrodasys

Pedicellina

Aplysia

Opisthorchis

Megadasys

Fasciola

Taenia

Nematoplana

Lecane

Echinorhynchus

Crassostrea

Turbanella

Alcyonidium

Seison

Idiosepius

Brachionus

02

86

92

93

99

62

84

96

86

97

89

93

9398

81

62

72

72

98

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha Syndermata

Platyhelm

inthes

FIG 4 ML tree obtained by analysis of data set d02 with 61 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and the four unstable taxa (Lepidodermella squamata Dactylopodola baltica and the two Gnathostomulidaspecies) were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1838

Struck et al doi101093molbevmsu143 MBE

the implication for their evolution we refer to a recent tran-scriptome-based study (Wey-Fabrizius et al 2014)

A Novel View on Spiralian Phylogeny

In summary our analyses support the monophyly ofLophotrochozoa and of a clade combining Gastrotricha andPlatyhelminthes Gnathifera is sister to a clade comprising theaforementioned taxa (fig 9) No morphological apomorphy isknown to date supporting either a monophyletic origin ofPlatyhelminthes and Gastrotricha or of PlatyhelminthesGastrotricha and Lophotrochozoa (Jenner 2004a Rothe andSchmidt-Rhaesa 2009) and hence could be used for namingthese two clades However whereas most of the other spir-alian taxa exhibit additional structures for food gathering intheir ground pattern (eg palps in annelids proboscis in ne-merteans filter feeding apparatuses in lophophorates ento-procts and cycliophorans as well as jaw-like elementsin rotifers gnathostomulids and mollusks) gastrotrichsand most flatworm species ingest food without such extra-structures just by dilating their rather simple pharynx Therespective pharynx simplex is part of the ground pattern ofPlatyhelminthes and enables the swallowing of prey by eithersucking action or engulfment (Doe 1981) Gastrotricha pos-sess a Y-shaped or inverted Y-shaped sucking pharynx(Kieneke et al 2008) Although gathering food by sucking isnot necessarily an autapomorphy of these two taxa this

common characteristic can nonetheless be utilized fornaming the clade We therefore suggest the nameRouphozoa (derived from the Greek word rouphao for ingest-ing by sucking) to define the last common ancestor ofPlatyhelminthes and Gastrotricha and all its descendantsThe clade of Rouphozoa + Lophotrochozoa can be namedPlatytrochozoa reflecting that it comprises Platyhelminthesand taxa with a trochophore larva and all extant descendantsof the last common ancestor of Platyhelminthes andLophotrochozoa Spiralia then comprises Gnathifera(Syndermata + Gnathostomulida) and Platytrochozoa

Implications for bilaterian evolutionThe paraphyly of Platyzoa with respect to Lophotrochozoa ismore in line with the traditional ldquoacoeloidndashplanuloidrdquo hy-pothesis than with the scenario of a last common ancestorof Deuterostomia Ecdysozoa and Spiralia with a segmentedand coelomate body organization resembling an annelidWithin Spiralia the non-coelomate small-sized taxa succes-sively branch off first (fig 9) Both Gnathostomulida andGastrotricha comprise small interstitial organisms with anacoelomate body organization and lt4 or 2 mm of lengthrespectively (Nielsen 2012) Within Syndermata only thehighly modified parasitic Acanthocephala are larger than afew millimeters and all exhibit a pseudocoelomate organiza-tion (Herlyn and Rohrig 2003 Nielsen 2012) Similarly inPlatyhelminthes the ancestral condition is also a small-sized

Terebratalia

IdiosepiusMytilus

Lecane

Alvinella

Crassostrea

Tubulanus

Cristatella

Biomphalaria

PedicellinaCapitella

Brachionus

Cephalothrix

Echinoderes

Euprymna

Tubifex

Paraplanocera

Lingula

Lottia

Daphnia

Flustra

Lumbricus

Bugula

Priapulus

Aplysia

Macrodasys

Helobdella

Megadasys

Pedicellina

Chaetopleura

Stylochoplana

Neomenia

Brachionus

Apis

02

52

76

63

8450

75

99

54

58

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Brachiopoda

FIG 5 ML tree obtained by analysis of data set d08 with 34 taxa and 29133 amino acid positions Only partitions with low degrees of branch lengthheterogeneity were included and all taxa exceeding LB scores gt0 in tree of figure 1 were excluded Only BS 50 are shown at the branches Maximalsupport of 100 Higher taxonomic units are indicated

1839

Platyzoans and Spiralia doi101093molbevmsu143 MBE

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 3: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

two highest optima at an LB score value of 0 (fig 2B) Putativeplatyzoan species had generally higher LB score values thanlophotrochozoan and ecdysozoan species (fig 2 and supple-mentary table S11 Supplementary Material online) Only theLB scores inferred for Stylochoplana and Paraplanocera withinPlatyhelminthes the two Brachionus species and Lecane inSyndermata and Megadasys and Macrodasys in Gastrotrichaapproximated those of most lophotrochozoans and ecdy-sozoans On the other hand Symbion (Cycliophora)Alcyonidium and Tubulipora (Ectoprocta) showed valuesgt0 resembling those of most of the platyzoan species sam-pled (fig 2)

To assess the effect of long branches on tree reconstruc-tion all species with LB scores above 0 were excluded fromdata set d01 (82162 positions) Interestingly monophyly ofPlatyzoa was no longer recovered (fig 3) Gastrotricha nowemerged as sister to Platyhelminthes (BS 96 table 1) and this

clade was sister to monophyletic Lophotrochozoa (BS 95table 1) whereas Syndermata was sister to all other spiraliantaxa Thus exclusion of long-branched species had a tremen-dous effect on the analyses rendering a strongly supportedmonophyly of Platyzoa with BS values gt95 into aparaphyletic assemblage in which the clade consistingof Gastrotricha + Platyhelminthes and Lophotrochozoaobtained strong support with a BS value of 95

Biases Causing Monophyletic Platyzoa

To gain further insights into the issue of mono- versus para-phyletic Platyzoa we analyzed the data with respect to thedifferent properties of individual genes In detail we studiedthe effect of gene-specific proportions of hydrophobic aminoacids and missing data base composition and branch lengthheterogeneity and evolutionary rates on tree reconstructionA common procedure is to choose one of these properties as

02

Aplysia

Taenia

Echinococcus

Cephalothrix

Philodina

Capitella

Chaetopleura

Alvinella

PriapulusEchinoderes

Rotaria

Terebratalia

Macracanthorhynchus

Schmidtea

Opisthorchis

Lingula

Pedicellina

Symbion

Apis

StylochoplanaParaplanocera

Paratenuisentis

Megadasys

Spirometra

Echinorhynchus

Euprymna

Moniezia

Biomphalaria

Nematoplana

Brachionus

Daphnia

Fasciola

Echinococcus

Macrodasys

Lumbricus

Idiosepius

FlustraBugula

Tubifex

Seison

Clonorchis

Lecane

Lepidodermella

Tubulipora

Mytilus

Brachionus

Dactylopodola

Neomenia

Helobdella

Alcyonidium

Gnathostomula

Pomphorhynchus

Turbanella

Dugesia

Gnathostomula

Dugesia

Pedicellina

Macrostomum

Cristatella

Crassostrea

Tubulanus

Adineta

SchistosomaSchistosoma

Lottia

73

68

90

61

90

99

93

99

92

74

99

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Platyzoa

Gastrotricha

62

FIG 1 Maximum-likelihood (ML) tree obtained by analysis of data set d01 with 65 taxa and 82162 amino acid positions Only BS values50 are shownat the branches Maximal support of 100 Higher taxonomic units are indicated

1835

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the most influential one either based a priori on literature or aposteriori on the obtained results (eg Brinkman and Philippe2008 Simmons 2012b Nesnidal et al 2013 Nosenko et al2013 Roure et al 2013 Salichos and Rokas 2013) Herein weused another procedure based on the variability exhibited inthe data itself prior to analyses of alternative data sets reflect-ing different degrees of data reduction According to the prin-cipal component analysis (Alexe et al 2008) the first principalcomponent explained 310 of the variance between thedifferent genes It was mainly derived from the proportionof missing data and base composition heterogeneity witheigenvectors pointing into opposite directions (supplemen-tary fig S3 and table S12 Supplementary Material online)Branch length heterogeneity and evolutionary rate were thelargest factors in the second component which explained268 of the variance Correlation analyses showed that inour case evolutionary rate which is often used as a proxy for

branch length heterogeneity (Brinkman and Philippe 2008)did not correlate with actual measurements of branch lengthheterogeneity (R2 = 00324 and 00635 supplementary fig S4Supplementary Material online)

As we wanted to test for LBA we used the direct measure-ment of branch length heterogeneity instead of evolutionaryrate Thus we generated data sets with either differentdegrees of missing data (d02ndashd06) proportion of low basecomposition heterogeneity (d07) or low branch lengthheterogeneity (d08) as well as genes being part of the 70or 95 confidence intervals of the first two principal compo-nents (d09 and d10) (supplementary tables S9 and S13Supplementary Material online) Based on the results of theprincipal component analysis we present in detail the resultsof three data sets d07 (low base composition heterogeneity)d08 (low branch length heterogeneity) and d02 The lattercombines a low degree of missing data with a high number ofpositions

Analyses of these three data sets excluding the four above-mentioned unstable platyzoan taxa consistently resulted inparaphyletic Platyzoa (fig 4 and table 1) as observed beforewhen excluding long-branched taxa from the large data setd01 Once more Platyhelminthes was sister to Gastrotricha(BS 76 84 and 71 Rouphozoa in table 1) and Lophotrochozoawas recovered as a monophyletic group (BS 98 47 and 95)The clade of GastrotrichaPlatyhelminthes was sister toLophotrochozoa (BS 72 86 and 75 table 1) andSyndermata was sister to the all other spiralian taxa againThus either by increasing the coverage (d02) or decreasingbase composition or branch length heterogeneity (d07and d08) paraphyletic Platyzoa was recovered (table 1) asa clade comprising Gastrotricha Platyhelminthes andLophotrochozoa gained strong branch support exceedingvalues of 70

Additional exclusion of long-branched species (figs 2and 3) reproduced paraphyly of Platyzoa in all analyseseven with maximum BS in some analyses AgainPlatyhelminthes was sister to Gastrotricha (BS 93 97 and54 fig 5 Rouphozoa in table 1) and both were more closelyrelated to the lophotrochozoan taxa than to Syndermata(BS 100 100 and 50 fig 5 Platyzoa Para in table 1)

Table 1 Bootstrap Support (BS) for Monophyly and Paraphyly of Platyzoa as well as Monophyly of Rouphozoa

Data Set Excl Taxa Pos Taxa Platyzoa Rouphozoa

Mono Para Mono

d01 (all data) None 82162 65 99a 0 3Unstable 82162 61 82b 1 1LB 82162 34 3 95a 96a

d02 (high coverage) Unstable 36513 61 3 86b 84b

LB 36513 34 0 100a 93b

d07 (low base frequency heterogeneity) Unstable 37907 61 19 75b 71b

LB 37907 34 0 100a 97a

d08 (low branch length heterogeneity) Unstable 29133 61 18 72b 76b

LB 29133 34 10 50 54

Excl excluded pos number of positions taxa number of taxa LB long-branched taxaaSupport values are part of the 95 confidence setbSupport values are part of the 70 confidence set

Nemert

ea

09

10

-30 0 60

Lea

f st

abili

ty in

dex

LB score

Ecdysozoa

Mollusca

Brachiopoda

PlatyhelminthesStPa

Br Le

SyndermataGastrotricha

Ectopr

octa

EntoproctaCycliophora

Gnathostomulida

Da

Lep

AnnelidaMe Ma

Sy

Tu Al

005

015

025

-25 0 50

freq

uenc

y

LB score B

A

FIG 2 Leaf stability indices and LB scores based on the ML analysis ofdata set d01 with 65 taxa and 82162 amino acid positions (A) Plot ofleaf stability indices against LB scores (B) Distribution of LB scores Thedashed line in A indicates LB score = 0 Shades distinguish ecdysozoanand lophotrochozoan species (light gray) from putative platyzoan spe-cies (dark gray) Some species are also labeled Pa Paraplanocera StStylochoplana Br Brachionus Le Lecane Me Megadasys MaMacrodasys Da Dactylopodola Lep Lepidodermella Sy Symbion TuTubulipora Al Alcyonidium

1836

Struck et al doi101093molbevmsu143 MBE

Moreover comparing the trees without long-branched spe-cies (figs 3 and 5) to the one with all species (fig 1) shows thatnow similar branch lengths lead to the ldquoplatyzoanrdquo and lopho-trochozoan species (figs 3 and 5) Additionally the standarddeviation of the species-specific LB scores for the trees shownin figures 3 and 5 are 103 and 93 respectively andhence lower than the standard deviation of 154 for thetree of figure 1 This means that the latter exhibits muchstronger branch length heterogeneity across all taxa thanthe former two Similarly the standard deviations for theclassical tip-to-root distances are lower for the trees offigures 3 and 5 with 0054 and 0143 than for the tree offigure 1 with 0202

We also used a Bayesian approach with the GTR + CATmodel as this is known to be more robust toward LBA thanclassical ML models such as LG (Lartillot et al 2007) Due tocomputational time restrictions and high memory require-ments we were not able to use the large data set d01 (82162positions) Instead we chose data set d02 (low to medium-low degree of missing data 36513 positions 461 coveragesupplementary table S9 Supplementary Material online)as the principal component analysis indicated coverageas the most influential property in the firstcomponent Importantly the Bayesian approach did notrecover monophyletic Platyzoa but instead a cladeincluding Gastrotricha + Platyhelminthes and monophyleticLophotrochozoa (posterior probability [PP] = 100 fig 6) andagain Gnathostomulida + Syndermata was sister to thisclade (PP = 100 fig6)

Thus combining Bayesian and maximum-likelihood anal-yses with different data and taxa exclusion strategies couldnot recover monophyletic Platyzoa in contrast to analysesusing only large numbers of data (figs 1 3ndash6 and table 1)Considering all 10 data sets (ie d01ndashd10) BS for monophy-letic Platyzoa substantially increased with additional aminoacid positions (dark gray line in fig 7A) whereas support forparaphyly decreased (black line in fig 7A) In contrast supportfor monophyly of Lophotrochozoa was not strongly affectedby the number of positions analyzed (light gray line in fig 7A)It is a well-known phenomenon of LBA that it is positivelymisleading that is with increasing numbers of positions theartificial group is more robustly recovered (Felsenstein 1978Huelsenbeck 1997 Bergsten 2005) On the other hand ex-cluding long-branched species from analyses did not lead tosuch correlations In particular support for the monophyly ofPlatyzoa remained low irrespective of the number of align-ment positions (dark gray line in fig 7B)

Additionally we determined for each data set the numberof single-gene trees supporting monophyly or paraphyly ofPlatyzoa Across all data sets the percentage of single genessupporting platyzoan paraphyly ranged from 86 to 117and thus was higher than the percentage supporting mono-phyletic Platyzoa ranging from 05 to 32 (table 2)Interestingly decreasing the degree of missing data (ied01ndashd06) and hence increasing the number of taxa pergene the ratio of the percentage of trees supportingparaphyly relative to the percentage of trees supportingmonophyly strongly increased (black line in fig 7C)

02

Pedicellina

Neomenia

Terebratalia

Crassostrea

Bugula

Brachionus

Aplysia

Euprymna

Priapulus

Brachionus

MacrodasysMegadasys

Tubulanus

Cristatella

Stylochoplana

Capitella

Pedicellina

Lottia

Tubifex

Paraplanocera

Echinoderes

Flustra

Mytilus

Alvinella

Lingula

Biomphalaria

Apis

Chaetopleura

Helobdella

Cephalothrix

Idiosepius

Lumbricus

Daphnia

Lecane63

60

9973

96

95

76 73

86

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

FIG 3 ML tree obtained by analysis of data set d01 with 34 taxa and 82162 amino acid positions All taxa exceeding LB scores gt0 in tree of figure 1were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1837

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Directly addressing biases in the data such as base or branchlength heterogeneity did not have such an effect on the ratioIn the case of LBA only strategies as used herein which areable to attenuate its misleading effect by excluding eitherbiased data or species or by increasing taxon coverage pergene can reveal whether or not an assembly of long-branchedtaxa is artificially grouped together (Bergsten 2005) In con-clusion our analyses support platyzoan paraphyly whereasrecovery of monophyletic ldquoPlatyzoardquo is most probably dueto LBA

Position of Gnathostomulida

In addition to LBA the inference of a stable topology washampered by the inclusion of Gnathostomulida and the twogastrotrichs Lepidodermella and Dactylopodola In order toelucidate the phylogenetic position of Gnathostomulidawithin Spiralia we reincluded the two formerly excludedgnathostomulid species into different data sets Importantlytheir inclusion did not alter the topology with respect toplatyzoan paraphyly in any tree reconstruction (eg cf

figs 4 and 8) Analysis of data set d07 excluding unstabletaxa except Gnathostomulida (ie Lepidodermella andDactylopodola) and of data set d02 excluding all long-branched taxa recovered Gnathostomulida as part of aclade with Gastrotricha and Platyhelminthes (table 3)However all other analyses placed Gnathostomulida assister to Syndermata with BS values of up to 91 eventhough overall BS remained low (fig 8 and table 3)Moreover the Bayesian analysis also recovered a sistergroup-relationship of Gnathostomulida and Syndermatawith strong support (PP = 098 fig 6) This position ofGnathostomulida as sister to Syndermata is consistent withthe Gnathifera hypothesis (Ahlrichs 1997 Herlyn and Ehlers1997) Monophyly of Gnathifera has also been found in pre-vious studies based on ribosomal protein data (Witek et al2009 Hausdorf et al 2010) and is also strongly supported bythe likely homology of gnathostomulidan jaws and rotiferantrophi (Rieger and Tyler 1995 Haszprunar 1996 Ahlrichs1997 Herlyn and Ehlers 1997 Jenner 2004a) For a thoroughanalysis of the phylogenetic relations within Syndermata and

Spirometra

Biomphalaria

Alvinella

Schmidtea

Schistosoma

Lottia

Terebratalia

Symbion

Clonorchis

Lumbricus

Moniezia

Tubulanus

Philodina

Cephalothrix

Echinococcus

Rotaria

Tubifex

Brachionus

Dugesia

Macracanthorhynchus

Apis

Schistosoma

Stylochoplana

HelobdellaCapitella

Chaet

Flustra

Euprymna

Cristatella

Echinoderes

Echinococcus

Neomenia

Lingula

Paraplanocera

Daphnia

Mytilus

Macrostomum

Priapulus

Bugula

Pedicellina

Dugesia

Paratenuisentis

Adineta

Tubulipora

Pomphorhynchus

Macrodasys

Pedicellina

Aplysia

Opisthorchis

Megadasys

Fasciola

Taenia

Nematoplana

Lecane

Echinorhynchus

Crassostrea

Turbanella

Alcyonidium

Seison

Idiosepius

Brachionus

02

86

92

93

99

62

84

96

86

97

89

93

9398

81

62

72

72

98

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha Syndermata

Platyhelm

inthes

FIG 4 ML tree obtained by analysis of data set d02 with 61 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and the four unstable taxa (Lepidodermella squamata Dactylopodola baltica and the two Gnathostomulidaspecies) were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1838

Struck et al doi101093molbevmsu143 MBE

the implication for their evolution we refer to a recent tran-scriptome-based study (Wey-Fabrizius et al 2014)

A Novel View on Spiralian Phylogeny

In summary our analyses support the monophyly ofLophotrochozoa and of a clade combining Gastrotricha andPlatyhelminthes Gnathifera is sister to a clade comprising theaforementioned taxa (fig 9) No morphological apomorphy isknown to date supporting either a monophyletic origin ofPlatyhelminthes and Gastrotricha or of PlatyhelminthesGastrotricha and Lophotrochozoa (Jenner 2004a Rothe andSchmidt-Rhaesa 2009) and hence could be used for namingthese two clades However whereas most of the other spir-alian taxa exhibit additional structures for food gathering intheir ground pattern (eg palps in annelids proboscis in ne-merteans filter feeding apparatuses in lophophorates ento-procts and cycliophorans as well as jaw-like elementsin rotifers gnathostomulids and mollusks) gastrotrichsand most flatworm species ingest food without such extra-structures just by dilating their rather simple pharynx Therespective pharynx simplex is part of the ground pattern ofPlatyhelminthes and enables the swallowing of prey by eithersucking action or engulfment (Doe 1981) Gastrotricha pos-sess a Y-shaped or inverted Y-shaped sucking pharynx(Kieneke et al 2008) Although gathering food by sucking isnot necessarily an autapomorphy of these two taxa this

common characteristic can nonetheless be utilized fornaming the clade We therefore suggest the nameRouphozoa (derived from the Greek word rouphao for ingest-ing by sucking) to define the last common ancestor ofPlatyhelminthes and Gastrotricha and all its descendantsThe clade of Rouphozoa + Lophotrochozoa can be namedPlatytrochozoa reflecting that it comprises Platyhelminthesand taxa with a trochophore larva and all extant descendantsof the last common ancestor of Platyhelminthes andLophotrochozoa Spiralia then comprises Gnathifera(Syndermata + Gnathostomulida) and Platytrochozoa

Implications for bilaterian evolutionThe paraphyly of Platyzoa with respect to Lophotrochozoa ismore in line with the traditional ldquoacoeloidndashplanuloidrdquo hy-pothesis than with the scenario of a last common ancestorof Deuterostomia Ecdysozoa and Spiralia with a segmentedand coelomate body organization resembling an annelidWithin Spiralia the non-coelomate small-sized taxa succes-sively branch off first (fig 9) Both Gnathostomulida andGastrotricha comprise small interstitial organisms with anacoelomate body organization and lt4 or 2 mm of lengthrespectively (Nielsen 2012) Within Syndermata only thehighly modified parasitic Acanthocephala are larger than afew millimeters and all exhibit a pseudocoelomate organiza-tion (Herlyn and Rohrig 2003 Nielsen 2012) Similarly inPlatyhelminthes the ancestral condition is also a small-sized

Terebratalia

IdiosepiusMytilus

Lecane

Alvinella

Crassostrea

Tubulanus

Cristatella

Biomphalaria

PedicellinaCapitella

Brachionus

Cephalothrix

Echinoderes

Euprymna

Tubifex

Paraplanocera

Lingula

Lottia

Daphnia

Flustra

Lumbricus

Bugula

Priapulus

Aplysia

Macrodasys

Helobdella

Megadasys

Pedicellina

Chaetopleura

Stylochoplana

Neomenia

Brachionus

Apis

02

52

76

63

8450

75

99

54

58

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Brachiopoda

FIG 5 ML tree obtained by analysis of data set d08 with 34 taxa and 29133 amino acid positions Only partitions with low degrees of branch lengthheterogeneity were included and all taxa exceeding LB scores gt0 in tree of figure 1 were excluded Only BS 50 are shown at the branches Maximalsupport of 100 Higher taxonomic units are indicated

1839

Platyzoans and Spiralia doi101093molbevmsu143 MBE

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 4: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

the most influential one either based a priori on literature or aposteriori on the obtained results (eg Brinkman and Philippe2008 Simmons 2012b Nesnidal et al 2013 Nosenko et al2013 Roure et al 2013 Salichos and Rokas 2013) Herein weused another procedure based on the variability exhibited inthe data itself prior to analyses of alternative data sets reflect-ing different degrees of data reduction According to the prin-cipal component analysis (Alexe et al 2008) the first principalcomponent explained 310 of the variance between thedifferent genes It was mainly derived from the proportionof missing data and base composition heterogeneity witheigenvectors pointing into opposite directions (supplemen-tary fig S3 and table S12 Supplementary Material online)Branch length heterogeneity and evolutionary rate were thelargest factors in the second component which explained268 of the variance Correlation analyses showed that inour case evolutionary rate which is often used as a proxy for

branch length heterogeneity (Brinkman and Philippe 2008)did not correlate with actual measurements of branch lengthheterogeneity (R2 = 00324 and 00635 supplementary fig S4Supplementary Material online)

As we wanted to test for LBA we used the direct measure-ment of branch length heterogeneity instead of evolutionaryrate Thus we generated data sets with either differentdegrees of missing data (d02ndashd06) proportion of low basecomposition heterogeneity (d07) or low branch lengthheterogeneity (d08) as well as genes being part of the 70or 95 confidence intervals of the first two principal compo-nents (d09 and d10) (supplementary tables S9 and S13Supplementary Material online) Based on the results of theprincipal component analysis we present in detail the resultsof three data sets d07 (low base composition heterogeneity)d08 (low branch length heterogeneity) and d02 The lattercombines a low degree of missing data with a high number ofpositions

Analyses of these three data sets excluding the four above-mentioned unstable platyzoan taxa consistently resulted inparaphyletic Platyzoa (fig 4 and table 1) as observed beforewhen excluding long-branched taxa from the large data setd01 Once more Platyhelminthes was sister to Gastrotricha(BS 76 84 and 71 Rouphozoa in table 1) and Lophotrochozoawas recovered as a monophyletic group (BS 98 47 and 95)The clade of GastrotrichaPlatyhelminthes was sister toLophotrochozoa (BS 72 86 and 75 table 1) andSyndermata was sister to the all other spiralian taxa againThus either by increasing the coverage (d02) or decreasingbase composition or branch length heterogeneity (d07and d08) paraphyletic Platyzoa was recovered (table 1) asa clade comprising Gastrotricha Platyhelminthes andLophotrochozoa gained strong branch support exceedingvalues of 70

Additional exclusion of long-branched species (figs 2and 3) reproduced paraphyly of Platyzoa in all analyseseven with maximum BS in some analyses AgainPlatyhelminthes was sister to Gastrotricha (BS 93 97 and54 fig 5 Rouphozoa in table 1) and both were more closelyrelated to the lophotrochozoan taxa than to Syndermata(BS 100 100 and 50 fig 5 Platyzoa Para in table 1)

Table 1 Bootstrap Support (BS) for Monophyly and Paraphyly of Platyzoa as well as Monophyly of Rouphozoa

Data Set Excl Taxa Pos Taxa Platyzoa Rouphozoa

Mono Para Mono

d01 (all data) None 82162 65 99a 0 3Unstable 82162 61 82b 1 1LB 82162 34 3 95a 96a

d02 (high coverage) Unstable 36513 61 3 86b 84b

LB 36513 34 0 100a 93b

d07 (low base frequency heterogeneity) Unstable 37907 61 19 75b 71b

LB 37907 34 0 100a 97a

d08 (low branch length heterogeneity) Unstable 29133 61 18 72b 76b

LB 29133 34 10 50 54

Excl excluded pos number of positions taxa number of taxa LB long-branched taxaaSupport values are part of the 95 confidence setbSupport values are part of the 70 confidence set

Nemert

ea

09

10

-30 0 60

Lea

f st

abili

ty in

dex

LB score

Ecdysozoa

Mollusca

Brachiopoda

PlatyhelminthesStPa

Br Le

SyndermataGastrotricha

Ectopr

octa

EntoproctaCycliophora

Gnathostomulida

Da

Lep

AnnelidaMe Ma

Sy

Tu Al

005

015

025

-25 0 50

freq

uenc

y

LB score B

A

FIG 2 Leaf stability indices and LB scores based on the ML analysis ofdata set d01 with 65 taxa and 82162 amino acid positions (A) Plot ofleaf stability indices against LB scores (B) Distribution of LB scores Thedashed line in A indicates LB score = 0 Shades distinguish ecdysozoanand lophotrochozoan species (light gray) from putative platyzoan spe-cies (dark gray) Some species are also labeled Pa Paraplanocera StStylochoplana Br Brachionus Le Lecane Me Megadasys MaMacrodasys Da Dactylopodola Lep Lepidodermella Sy Symbion TuTubulipora Al Alcyonidium

1836

Struck et al doi101093molbevmsu143 MBE

Moreover comparing the trees without long-branched spe-cies (figs 3 and 5) to the one with all species (fig 1) shows thatnow similar branch lengths lead to the ldquoplatyzoanrdquo and lopho-trochozoan species (figs 3 and 5) Additionally the standarddeviation of the species-specific LB scores for the trees shownin figures 3 and 5 are 103 and 93 respectively andhence lower than the standard deviation of 154 for thetree of figure 1 This means that the latter exhibits muchstronger branch length heterogeneity across all taxa thanthe former two Similarly the standard deviations for theclassical tip-to-root distances are lower for the trees offigures 3 and 5 with 0054 and 0143 than for the tree offigure 1 with 0202

We also used a Bayesian approach with the GTR + CATmodel as this is known to be more robust toward LBA thanclassical ML models such as LG (Lartillot et al 2007) Due tocomputational time restrictions and high memory require-ments we were not able to use the large data set d01 (82162positions) Instead we chose data set d02 (low to medium-low degree of missing data 36513 positions 461 coveragesupplementary table S9 Supplementary Material online)as the principal component analysis indicated coverageas the most influential property in the firstcomponent Importantly the Bayesian approach did notrecover monophyletic Platyzoa but instead a cladeincluding Gastrotricha + Platyhelminthes and monophyleticLophotrochozoa (posterior probability [PP] = 100 fig 6) andagain Gnathostomulida + Syndermata was sister to thisclade (PP = 100 fig6)

Thus combining Bayesian and maximum-likelihood anal-yses with different data and taxa exclusion strategies couldnot recover monophyletic Platyzoa in contrast to analysesusing only large numbers of data (figs 1 3ndash6 and table 1)Considering all 10 data sets (ie d01ndashd10) BS for monophy-letic Platyzoa substantially increased with additional aminoacid positions (dark gray line in fig 7A) whereas support forparaphyly decreased (black line in fig 7A) In contrast supportfor monophyly of Lophotrochozoa was not strongly affectedby the number of positions analyzed (light gray line in fig 7A)It is a well-known phenomenon of LBA that it is positivelymisleading that is with increasing numbers of positions theartificial group is more robustly recovered (Felsenstein 1978Huelsenbeck 1997 Bergsten 2005) On the other hand ex-cluding long-branched species from analyses did not lead tosuch correlations In particular support for the monophyly ofPlatyzoa remained low irrespective of the number of align-ment positions (dark gray line in fig 7B)

Additionally we determined for each data set the numberof single-gene trees supporting monophyly or paraphyly ofPlatyzoa Across all data sets the percentage of single genessupporting platyzoan paraphyly ranged from 86 to 117and thus was higher than the percentage supporting mono-phyletic Platyzoa ranging from 05 to 32 (table 2)Interestingly decreasing the degree of missing data (ied01ndashd06) and hence increasing the number of taxa pergene the ratio of the percentage of trees supportingparaphyly relative to the percentage of trees supportingmonophyly strongly increased (black line in fig 7C)

02

Pedicellina

Neomenia

Terebratalia

Crassostrea

Bugula

Brachionus

Aplysia

Euprymna

Priapulus

Brachionus

MacrodasysMegadasys

Tubulanus

Cristatella

Stylochoplana

Capitella

Pedicellina

Lottia

Tubifex

Paraplanocera

Echinoderes

Flustra

Mytilus

Alvinella

Lingula

Biomphalaria

Apis

Chaetopleura

Helobdella

Cephalothrix

Idiosepius

Lumbricus

Daphnia

Lecane63

60

9973

96

95

76 73

86

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

FIG 3 ML tree obtained by analysis of data set d01 with 34 taxa and 82162 amino acid positions All taxa exceeding LB scores gt0 in tree of figure 1were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1837

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Directly addressing biases in the data such as base or branchlength heterogeneity did not have such an effect on the ratioIn the case of LBA only strategies as used herein which areable to attenuate its misleading effect by excluding eitherbiased data or species or by increasing taxon coverage pergene can reveal whether or not an assembly of long-branchedtaxa is artificially grouped together (Bergsten 2005) In con-clusion our analyses support platyzoan paraphyly whereasrecovery of monophyletic ldquoPlatyzoardquo is most probably dueto LBA

Position of Gnathostomulida

In addition to LBA the inference of a stable topology washampered by the inclusion of Gnathostomulida and the twogastrotrichs Lepidodermella and Dactylopodola In order toelucidate the phylogenetic position of Gnathostomulidawithin Spiralia we reincluded the two formerly excludedgnathostomulid species into different data sets Importantlytheir inclusion did not alter the topology with respect toplatyzoan paraphyly in any tree reconstruction (eg cf

figs 4 and 8) Analysis of data set d07 excluding unstabletaxa except Gnathostomulida (ie Lepidodermella andDactylopodola) and of data set d02 excluding all long-branched taxa recovered Gnathostomulida as part of aclade with Gastrotricha and Platyhelminthes (table 3)However all other analyses placed Gnathostomulida assister to Syndermata with BS values of up to 91 eventhough overall BS remained low (fig 8 and table 3)Moreover the Bayesian analysis also recovered a sistergroup-relationship of Gnathostomulida and Syndermatawith strong support (PP = 098 fig 6) This position ofGnathostomulida as sister to Syndermata is consistent withthe Gnathifera hypothesis (Ahlrichs 1997 Herlyn and Ehlers1997) Monophyly of Gnathifera has also been found in pre-vious studies based on ribosomal protein data (Witek et al2009 Hausdorf et al 2010) and is also strongly supported bythe likely homology of gnathostomulidan jaws and rotiferantrophi (Rieger and Tyler 1995 Haszprunar 1996 Ahlrichs1997 Herlyn and Ehlers 1997 Jenner 2004a) For a thoroughanalysis of the phylogenetic relations within Syndermata and

Spirometra

Biomphalaria

Alvinella

Schmidtea

Schistosoma

Lottia

Terebratalia

Symbion

Clonorchis

Lumbricus

Moniezia

Tubulanus

Philodina

Cephalothrix

Echinococcus

Rotaria

Tubifex

Brachionus

Dugesia

Macracanthorhynchus

Apis

Schistosoma

Stylochoplana

HelobdellaCapitella

Chaet

Flustra

Euprymna

Cristatella

Echinoderes

Echinococcus

Neomenia

Lingula

Paraplanocera

Daphnia

Mytilus

Macrostomum

Priapulus

Bugula

Pedicellina

Dugesia

Paratenuisentis

Adineta

Tubulipora

Pomphorhynchus

Macrodasys

Pedicellina

Aplysia

Opisthorchis

Megadasys

Fasciola

Taenia

Nematoplana

Lecane

Echinorhynchus

Crassostrea

Turbanella

Alcyonidium

Seison

Idiosepius

Brachionus

02

86

92

93

99

62

84

96

86

97

89

93

9398

81

62

72

72

98

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha Syndermata

Platyhelm

inthes

FIG 4 ML tree obtained by analysis of data set d02 with 61 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and the four unstable taxa (Lepidodermella squamata Dactylopodola baltica and the two Gnathostomulidaspecies) were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1838

Struck et al doi101093molbevmsu143 MBE

the implication for their evolution we refer to a recent tran-scriptome-based study (Wey-Fabrizius et al 2014)

A Novel View on Spiralian Phylogeny

In summary our analyses support the monophyly ofLophotrochozoa and of a clade combining Gastrotricha andPlatyhelminthes Gnathifera is sister to a clade comprising theaforementioned taxa (fig 9) No morphological apomorphy isknown to date supporting either a monophyletic origin ofPlatyhelminthes and Gastrotricha or of PlatyhelminthesGastrotricha and Lophotrochozoa (Jenner 2004a Rothe andSchmidt-Rhaesa 2009) and hence could be used for namingthese two clades However whereas most of the other spir-alian taxa exhibit additional structures for food gathering intheir ground pattern (eg palps in annelids proboscis in ne-merteans filter feeding apparatuses in lophophorates ento-procts and cycliophorans as well as jaw-like elementsin rotifers gnathostomulids and mollusks) gastrotrichsand most flatworm species ingest food without such extra-structures just by dilating their rather simple pharynx Therespective pharynx simplex is part of the ground pattern ofPlatyhelminthes and enables the swallowing of prey by eithersucking action or engulfment (Doe 1981) Gastrotricha pos-sess a Y-shaped or inverted Y-shaped sucking pharynx(Kieneke et al 2008) Although gathering food by sucking isnot necessarily an autapomorphy of these two taxa this

common characteristic can nonetheless be utilized fornaming the clade We therefore suggest the nameRouphozoa (derived from the Greek word rouphao for ingest-ing by sucking) to define the last common ancestor ofPlatyhelminthes and Gastrotricha and all its descendantsThe clade of Rouphozoa + Lophotrochozoa can be namedPlatytrochozoa reflecting that it comprises Platyhelminthesand taxa with a trochophore larva and all extant descendantsof the last common ancestor of Platyhelminthes andLophotrochozoa Spiralia then comprises Gnathifera(Syndermata + Gnathostomulida) and Platytrochozoa

Implications for bilaterian evolutionThe paraphyly of Platyzoa with respect to Lophotrochozoa ismore in line with the traditional ldquoacoeloidndashplanuloidrdquo hy-pothesis than with the scenario of a last common ancestorof Deuterostomia Ecdysozoa and Spiralia with a segmentedand coelomate body organization resembling an annelidWithin Spiralia the non-coelomate small-sized taxa succes-sively branch off first (fig 9) Both Gnathostomulida andGastrotricha comprise small interstitial organisms with anacoelomate body organization and lt4 or 2 mm of lengthrespectively (Nielsen 2012) Within Syndermata only thehighly modified parasitic Acanthocephala are larger than afew millimeters and all exhibit a pseudocoelomate organiza-tion (Herlyn and Rohrig 2003 Nielsen 2012) Similarly inPlatyhelminthes the ancestral condition is also a small-sized

Terebratalia

IdiosepiusMytilus

Lecane

Alvinella

Crassostrea

Tubulanus

Cristatella

Biomphalaria

PedicellinaCapitella

Brachionus

Cephalothrix

Echinoderes

Euprymna

Tubifex

Paraplanocera

Lingula

Lottia

Daphnia

Flustra

Lumbricus

Bugula

Priapulus

Aplysia

Macrodasys

Helobdella

Megadasys

Pedicellina

Chaetopleura

Stylochoplana

Neomenia

Brachionus

Apis

02

52

76

63

8450

75

99

54

58

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Brachiopoda

FIG 5 ML tree obtained by analysis of data set d08 with 34 taxa and 29133 amino acid positions Only partitions with low degrees of branch lengthheterogeneity were included and all taxa exceeding LB scores gt0 in tree of figure 1 were excluded Only BS 50 are shown at the branches Maximalsupport of 100 Higher taxonomic units are indicated

1839

Platyzoans and Spiralia doi101093molbevmsu143 MBE

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 5: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

Moreover comparing the trees without long-branched spe-cies (figs 3 and 5) to the one with all species (fig 1) shows thatnow similar branch lengths lead to the ldquoplatyzoanrdquo and lopho-trochozoan species (figs 3 and 5) Additionally the standarddeviation of the species-specific LB scores for the trees shownin figures 3 and 5 are 103 and 93 respectively andhence lower than the standard deviation of 154 for thetree of figure 1 This means that the latter exhibits muchstronger branch length heterogeneity across all taxa thanthe former two Similarly the standard deviations for theclassical tip-to-root distances are lower for the trees offigures 3 and 5 with 0054 and 0143 than for the tree offigure 1 with 0202

We also used a Bayesian approach with the GTR + CATmodel as this is known to be more robust toward LBA thanclassical ML models such as LG (Lartillot et al 2007) Due tocomputational time restrictions and high memory require-ments we were not able to use the large data set d01 (82162positions) Instead we chose data set d02 (low to medium-low degree of missing data 36513 positions 461 coveragesupplementary table S9 Supplementary Material online)as the principal component analysis indicated coverageas the most influential property in the firstcomponent Importantly the Bayesian approach did notrecover monophyletic Platyzoa but instead a cladeincluding Gastrotricha + Platyhelminthes and monophyleticLophotrochozoa (posterior probability [PP] = 100 fig 6) andagain Gnathostomulida + Syndermata was sister to thisclade (PP = 100 fig6)

Thus combining Bayesian and maximum-likelihood anal-yses with different data and taxa exclusion strategies couldnot recover monophyletic Platyzoa in contrast to analysesusing only large numbers of data (figs 1 3ndash6 and table 1)Considering all 10 data sets (ie d01ndashd10) BS for monophy-letic Platyzoa substantially increased with additional aminoacid positions (dark gray line in fig 7A) whereas support forparaphyly decreased (black line in fig 7A) In contrast supportfor monophyly of Lophotrochozoa was not strongly affectedby the number of positions analyzed (light gray line in fig 7A)It is a well-known phenomenon of LBA that it is positivelymisleading that is with increasing numbers of positions theartificial group is more robustly recovered (Felsenstein 1978Huelsenbeck 1997 Bergsten 2005) On the other hand ex-cluding long-branched species from analyses did not lead tosuch correlations In particular support for the monophyly ofPlatyzoa remained low irrespective of the number of align-ment positions (dark gray line in fig 7B)

Additionally we determined for each data set the numberof single-gene trees supporting monophyly or paraphyly ofPlatyzoa Across all data sets the percentage of single genessupporting platyzoan paraphyly ranged from 86 to 117and thus was higher than the percentage supporting mono-phyletic Platyzoa ranging from 05 to 32 (table 2)Interestingly decreasing the degree of missing data (ied01ndashd06) and hence increasing the number of taxa pergene the ratio of the percentage of trees supportingparaphyly relative to the percentage of trees supportingmonophyly strongly increased (black line in fig 7C)

02

Pedicellina

Neomenia

Terebratalia

Crassostrea

Bugula

Brachionus

Aplysia

Euprymna

Priapulus

Brachionus

MacrodasysMegadasys

Tubulanus

Cristatella

Stylochoplana

Capitella

Pedicellina

Lottia

Tubifex

Paraplanocera

Echinoderes

Flustra

Mytilus

Alvinella

Lingula

Biomphalaria

Apis

Chaetopleura

Helobdella

Cephalothrix

Idiosepius

Lumbricus

Daphnia

Lecane63

60

9973

96

95

76 73

86

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

FIG 3 ML tree obtained by analysis of data set d01 with 34 taxa and 82162 amino acid positions All taxa exceeding LB scores gt0 in tree of figure 1were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1837

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Directly addressing biases in the data such as base or branchlength heterogeneity did not have such an effect on the ratioIn the case of LBA only strategies as used herein which areable to attenuate its misleading effect by excluding eitherbiased data or species or by increasing taxon coverage pergene can reveal whether or not an assembly of long-branchedtaxa is artificially grouped together (Bergsten 2005) In con-clusion our analyses support platyzoan paraphyly whereasrecovery of monophyletic ldquoPlatyzoardquo is most probably dueto LBA

Position of Gnathostomulida

In addition to LBA the inference of a stable topology washampered by the inclusion of Gnathostomulida and the twogastrotrichs Lepidodermella and Dactylopodola In order toelucidate the phylogenetic position of Gnathostomulidawithin Spiralia we reincluded the two formerly excludedgnathostomulid species into different data sets Importantlytheir inclusion did not alter the topology with respect toplatyzoan paraphyly in any tree reconstruction (eg cf

figs 4 and 8) Analysis of data set d07 excluding unstabletaxa except Gnathostomulida (ie Lepidodermella andDactylopodola) and of data set d02 excluding all long-branched taxa recovered Gnathostomulida as part of aclade with Gastrotricha and Platyhelminthes (table 3)However all other analyses placed Gnathostomulida assister to Syndermata with BS values of up to 91 eventhough overall BS remained low (fig 8 and table 3)Moreover the Bayesian analysis also recovered a sistergroup-relationship of Gnathostomulida and Syndermatawith strong support (PP = 098 fig 6) This position ofGnathostomulida as sister to Syndermata is consistent withthe Gnathifera hypothesis (Ahlrichs 1997 Herlyn and Ehlers1997) Monophyly of Gnathifera has also been found in pre-vious studies based on ribosomal protein data (Witek et al2009 Hausdorf et al 2010) and is also strongly supported bythe likely homology of gnathostomulidan jaws and rotiferantrophi (Rieger and Tyler 1995 Haszprunar 1996 Ahlrichs1997 Herlyn and Ehlers 1997 Jenner 2004a) For a thoroughanalysis of the phylogenetic relations within Syndermata and

Spirometra

Biomphalaria

Alvinella

Schmidtea

Schistosoma

Lottia

Terebratalia

Symbion

Clonorchis

Lumbricus

Moniezia

Tubulanus

Philodina

Cephalothrix

Echinococcus

Rotaria

Tubifex

Brachionus

Dugesia

Macracanthorhynchus

Apis

Schistosoma

Stylochoplana

HelobdellaCapitella

Chaet

Flustra

Euprymna

Cristatella

Echinoderes

Echinococcus

Neomenia

Lingula

Paraplanocera

Daphnia

Mytilus

Macrostomum

Priapulus

Bugula

Pedicellina

Dugesia

Paratenuisentis

Adineta

Tubulipora

Pomphorhynchus

Macrodasys

Pedicellina

Aplysia

Opisthorchis

Megadasys

Fasciola

Taenia

Nematoplana

Lecane

Echinorhynchus

Crassostrea

Turbanella

Alcyonidium

Seison

Idiosepius

Brachionus

02

86

92

93

99

62

84

96

86

97

89

93

9398

81

62

72

72

98

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha Syndermata

Platyhelm

inthes

FIG 4 ML tree obtained by analysis of data set d02 with 61 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and the four unstable taxa (Lepidodermella squamata Dactylopodola baltica and the two Gnathostomulidaspecies) were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1838

Struck et al doi101093molbevmsu143 MBE

the implication for their evolution we refer to a recent tran-scriptome-based study (Wey-Fabrizius et al 2014)

A Novel View on Spiralian Phylogeny

In summary our analyses support the monophyly ofLophotrochozoa and of a clade combining Gastrotricha andPlatyhelminthes Gnathifera is sister to a clade comprising theaforementioned taxa (fig 9) No morphological apomorphy isknown to date supporting either a monophyletic origin ofPlatyhelminthes and Gastrotricha or of PlatyhelminthesGastrotricha and Lophotrochozoa (Jenner 2004a Rothe andSchmidt-Rhaesa 2009) and hence could be used for namingthese two clades However whereas most of the other spir-alian taxa exhibit additional structures for food gathering intheir ground pattern (eg palps in annelids proboscis in ne-merteans filter feeding apparatuses in lophophorates ento-procts and cycliophorans as well as jaw-like elementsin rotifers gnathostomulids and mollusks) gastrotrichsand most flatworm species ingest food without such extra-structures just by dilating their rather simple pharynx Therespective pharynx simplex is part of the ground pattern ofPlatyhelminthes and enables the swallowing of prey by eithersucking action or engulfment (Doe 1981) Gastrotricha pos-sess a Y-shaped or inverted Y-shaped sucking pharynx(Kieneke et al 2008) Although gathering food by sucking isnot necessarily an autapomorphy of these two taxa this

common characteristic can nonetheless be utilized fornaming the clade We therefore suggest the nameRouphozoa (derived from the Greek word rouphao for ingest-ing by sucking) to define the last common ancestor ofPlatyhelminthes and Gastrotricha and all its descendantsThe clade of Rouphozoa + Lophotrochozoa can be namedPlatytrochozoa reflecting that it comprises Platyhelminthesand taxa with a trochophore larva and all extant descendantsof the last common ancestor of Platyhelminthes andLophotrochozoa Spiralia then comprises Gnathifera(Syndermata + Gnathostomulida) and Platytrochozoa

Implications for bilaterian evolutionThe paraphyly of Platyzoa with respect to Lophotrochozoa ismore in line with the traditional ldquoacoeloidndashplanuloidrdquo hy-pothesis than with the scenario of a last common ancestorof Deuterostomia Ecdysozoa and Spiralia with a segmentedand coelomate body organization resembling an annelidWithin Spiralia the non-coelomate small-sized taxa succes-sively branch off first (fig 9) Both Gnathostomulida andGastrotricha comprise small interstitial organisms with anacoelomate body organization and lt4 or 2 mm of lengthrespectively (Nielsen 2012) Within Syndermata only thehighly modified parasitic Acanthocephala are larger than afew millimeters and all exhibit a pseudocoelomate organiza-tion (Herlyn and Rohrig 2003 Nielsen 2012) Similarly inPlatyhelminthes the ancestral condition is also a small-sized

Terebratalia

IdiosepiusMytilus

Lecane

Alvinella

Crassostrea

Tubulanus

Cristatella

Biomphalaria

PedicellinaCapitella

Brachionus

Cephalothrix

Echinoderes

Euprymna

Tubifex

Paraplanocera

Lingula

Lottia

Daphnia

Flustra

Lumbricus

Bugula

Priapulus

Aplysia

Macrodasys

Helobdella

Megadasys

Pedicellina

Chaetopleura

Stylochoplana

Neomenia

Brachionus

Apis

02

52

76

63

8450

75

99

54

58

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Brachiopoda

FIG 5 ML tree obtained by analysis of data set d08 with 34 taxa and 29133 amino acid positions Only partitions with low degrees of branch lengthheterogeneity were included and all taxa exceeding LB scores gt0 in tree of figure 1 were excluded Only BS 50 are shown at the branches Maximalsupport of 100 Higher taxonomic units are indicated

1839

Platyzoans and Spiralia doi101093molbevmsu143 MBE

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 6: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

Directly addressing biases in the data such as base or branchlength heterogeneity did not have such an effect on the ratioIn the case of LBA only strategies as used herein which areable to attenuate its misleading effect by excluding eitherbiased data or species or by increasing taxon coverage pergene can reveal whether or not an assembly of long-branchedtaxa is artificially grouped together (Bergsten 2005) In con-clusion our analyses support platyzoan paraphyly whereasrecovery of monophyletic ldquoPlatyzoardquo is most probably dueto LBA

Position of Gnathostomulida

In addition to LBA the inference of a stable topology washampered by the inclusion of Gnathostomulida and the twogastrotrichs Lepidodermella and Dactylopodola In order toelucidate the phylogenetic position of Gnathostomulidawithin Spiralia we reincluded the two formerly excludedgnathostomulid species into different data sets Importantlytheir inclusion did not alter the topology with respect toplatyzoan paraphyly in any tree reconstruction (eg cf

figs 4 and 8) Analysis of data set d07 excluding unstabletaxa except Gnathostomulida (ie Lepidodermella andDactylopodola) and of data set d02 excluding all long-branched taxa recovered Gnathostomulida as part of aclade with Gastrotricha and Platyhelminthes (table 3)However all other analyses placed Gnathostomulida assister to Syndermata with BS values of up to 91 eventhough overall BS remained low (fig 8 and table 3)Moreover the Bayesian analysis also recovered a sistergroup-relationship of Gnathostomulida and Syndermatawith strong support (PP = 098 fig 6) This position ofGnathostomulida as sister to Syndermata is consistent withthe Gnathifera hypothesis (Ahlrichs 1997 Herlyn and Ehlers1997) Monophyly of Gnathifera has also been found in pre-vious studies based on ribosomal protein data (Witek et al2009 Hausdorf et al 2010) and is also strongly supported bythe likely homology of gnathostomulidan jaws and rotiferantrophi (Rieger and Tyler 1995 Haszprunar 1996 Ahlrichs1997 Herlyn and Ehlers 1997 Jenner 2004a) For a thoroughanalysis of the phylogenetic relations within Syndermata and

Spirometra

Biomphalaria

Alvinella

Schmidtea

Schistosoma

Lottia

Terebratalia

Symbion

Clonorchis

Lumbricus

Moniezia

Tubulanus

Philodina

Cephalothrix

Echinococcus

Rotaria

Tubifex

Brachionus

Dugesia

Macracanthorhynchus

Apis

Schistosoma

Stylochoplana

HelobdellaCapitella

Chaet

Flustra

Euprymna

Cristatella

Echinoderes

Echinococcus

Neomenia

Lingula

Paraplanocera

Daphnia

Mytilus

Macrostomum

Priapulus

Bugula

Pedicellina

Dugesia

Paratenuisentis

Adineta

Tubulipora

Pomphorhynchus

Macrodasys

Pedicellina

Aplysia

Opisthorchis

Megadasys

Fasciola

Taenia

Nematoplana

Lecane

Echinorhynchus

Crassostrea

Turbanella

Alcyonidium

Seison

Idiosepius

Brachionus

02

86

92

93

99

62

84

96

86

97

89

93

9398

81

62

72

72

98

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha Syndermata

Platyhelm

inthes

FIG 4 ML tree obtained by analysis of data set d02 with 61 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and the four unstable taxa (Lepidodermella squamata Dactylopodola baltica and the two Gnathostomulidaspecies) were excluded Only BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1838

Struck et al doi101093molbevmsu143 MBE

the implication for their evolution we refer to a recent tran-scriptome-based study (Wey-Fabrizius et al 2014)

A Novel View on Spiralian Phylogeny

In summary our analyses support the monophyly ofLophotrochozoa and of a clade combining Gastrotricha andPlatyhelminthes Gnathifera is sister to a clade comprising theaforementioned taxa (fig 9) No morphological apomorphy isknown to date supporting either a monophyletic origin ofPlatyhelminthes and Gastrotricha or of PlatyhelminthesGastrotricha and Lophotrochozoa (Jenner 2004a Rothe andSchmidt-Rhaesa 2009) and hence could be used for namingthese two clades However whereas most of the other spir-alian taxa exhibit additional structures for food gathering intheir ground pattern (eg palps in annelids proboscis in ne-merteans filter feeding apparatuses in lophophorates ento-procts and cycliophorans as well as jaw-like elementsin rotifers gnathostomulids and mollusks) gastrotrichsand most flatworm species ingest food without such extra-structures just by dilating their rather simple pharynx Therespective pharynx simplex is part of the ground pattern ofPlatyhelminthes and enables the swallowing of prey by eithersucking action or engulfment (Doe 1981) Gastrotricha pos-sess a Y-shaped or inverted Y-shaped sucking pharynx(Kieneke et al 2008) Although gathering food by sucking isnot necessarily an autapomorphy of these two taxa this

common characteristic can nonetheless be utilized fornaming the clade We therefore suggest the nameRouphozoa (derived from the Greek word rouphao for ingest-ing by sucking) to define the last common ancestor ofPlatyhelminthes and Gastrotricha and all its descendantsThe clade of Rouphozoa + Lophotrochozoa can be namedPlatytrochozoa reflecting that it comprises Platyhelminthesand taxa with a trochophore larva and all extant descendantsof the last common ancestor of Platyhelminthes andLophotrochozoa Spiralia then comprises Gnathifera(Syndermata + Gnathostomulida) and Platytrochozoa

Implications for bilaterian evolutionThe paraphyly of Platyzoa with respect to Lophotrochozoa ismore in line with the traditional ldquoacoeloidndashplanuloidrdquo hy-pothesis than with the scenario of a last common ancestorof Deuterostomia Ecdysozoa and Spiralia with a segmentedand coelomate body organization resembling an annelidWithin Spiralia the non-coelomate small-sized taxa succes-sively branch off first (fig 9) Both Gnathostomulida andGastrotricha comprise small interstitial organisms with anacoelomate body organization and lt4 or 2 mm of lengthrespectively (Nielsen 2012) Within Syndermata only thehighly modified parasitic Acanthocephala are larger than afew millimeters and all exhibit a pseudocoelomate organiza-tion (Herlyn and Rohrig 2003 Nielsen 2012) Similarly inPlatyhelminthes the ancestral condition is also a small-sized

Terebratalia

IdiosepiusMytilus

Lecane

Alvinella

Crassostrea

Tubulanus

Cristatella

Biomphalaria

PedicellinaCapitella

Brachionus

Cephalothrix

Echinoderes

Euprymna

Tubifex

Paraplanocera

Lingula

Lottia

Daphnia

Flustra

Lumbricus

Bugula

Priapulus

Aplysia

Macrodasys

Helobdella

Megadasys

Pedicellina

Chaetopleura

Stylochoplana

Neomenia

Brachionus

Apis

02

52

76

63

8450

75

99

54

58

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Brachiopoda

FIG 5 ML tree obtained by analysis of data set d08 with 34 taxa and 29133 amino acid positions Only partitions with low degrees of branch lengthheterogeneity were included and all taxa exceeding LB scores gt0 in tree of figure 1 were excluded Only BS 50 are shown at the branches Maximalsupport of 100 Higher taxonomic units are indicated

1839

Platyzoans and Spiralia doi101093molbevmsu143 MBE

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 7: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

the implication for their evolution we refer to a recent tran-scriptome-based study (Wey-Fabrizius et al 2014)

A Novel View on Spiralian Phylogeny

In summary our analyses support the monophyly ofLophotrochozoa and of a clade combining Gastrotricha andPlatyhelminthes Gnathifera is sister to a clade comprising theaforementioned taxa (fig 9) No morphological apomorphy isknown to date supporting either a monophyletic origin ofPlatyhelminthes and Gastrotricha or of PlatyhelminthesGastrotricha and Lophotrochozoa (Jenner 2004a Rothe andSchmidt-Rhaesa 2009) and hence could be used for namingthese two clades However whereas most of the other spir-alian taxa exhibit additional structures for food gathering intheir ground pattern (eg palps in annelids proboscis in ne-merteans filter feeding apparatuses in lophophorates ento-procts and cycliophorans as well as jaw-like elementsin rotifers gnathostomulids and mollusks) gastrotrichsand most flatworm species ingest food without such extra-structures just by dilating their rather simple pharynx Therespective pharynx simplex is part of the ground pattern ofPlatyhelminthes and enables the swallowing of prey by eithersucking action or engulfment (Doe 1981) Gastrotricha pos-sess a Y-shaped or inverted Y-shaped sucking pharynx(Kieneke et al 2008) Although gathering food by sucking isnot necessarily an autapomorphy of these two taxa this

common characteristic can nonetheless be utilized fornaming the clade We therefore suggest the nameRouphozoa (derived from the Greek word rouphao for ingest-ing by sucking) to define the last common ancestor ofPlatyhelminthes and Gastrotricha and all its descendantsThe clade of Rouphozoa + Lophotrochozoa can be namedPlatytrochozoa reflecting that it comprises Platyhelminthesand taxa with a trochophore larva and all extant descendantsof the last common ancestor of Platyhelminthes andLophotrochozoa Spiralia then comprises Gnathifera(Syndermata + Gnathostomulida) and Platytrochozoa

Implications for bilaterian evolutionThe paraphyly of Platyzoa with respect to Lophotrochozoa ismore in line with the traditional ldquoacoeloidndashplanuloidrdquo hy-pothesis than with the scenario of a last common ancestorof Deuterostomia Ecdysozoa and Spiralia with a segmentedand coelomate body organization resembling an annelidWithin Spiralia the non-coelomate small-sized taxa succes-sively branch off first (fig 9) Both Gnathostomulida andGastrotricha comprise small interstitial organisms with anacoelomate body organization and lt4 or 2 mm of lengthrespectively (Nielsen 2012) Within Syndermata only thehighly modified parasitic Acanthocephala are larger than afew millimeters and all exhibit a pseudocoelomate organiza-tion (Herlyn and Rohrig 2003 Nielsen 2012) Similarly inPlatyhelminthes the ancestral condition is also a small-sized

Terebratalia

IdiosepiusMytilus

Lecane

Alvinella

Crassostrea

Tubulanus

Cristatella

Biomphalaria

PedicellinaCapitella

Brachionus

Cephalothrix

Echinoderes

Euprymna

Tubifex

Paraplanocera

Lingula

Lottia

Daphnia

Flustra

Lumbricus

Bugula

Priapulus

Aplysia

Macrodasys

Helobdella

Megadasys

Pedicellina

Chaetopleura

Stylochoplana

Neomenia

Brachionus

Apis

02

52

76

63

8450

75

99

54

58

Lophotrochozoa

Ecdysozoa

BrachiopodaEntoprocta

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Brachiopoda

FIG 5 ML tree obtained by analysis of data set d08 with 34 taxa and 29133 amino acid positions Only partitions with low degrees of branch lengthheterogeneity were included and all taxa exceeding LB scores gt0 in tree of figure 1 were excluded Only BS 50 are shown at the branches Maximalsupport of 100 Higher taxonomic units are indicated

1839

Platyzoans and Spiralia doi101093molbevmsu143 MBE

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 8: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

acoelomate organization as seen today in Catenulida andMacrostomorpha which are lt5 mm in length (Nielsen2012) Within Spiralia animals with a coelomate body orga-nization are according to our analyses only found inLophotrochozoa (fig 9) Thus it is epistemologically moreparsimonious to assume that the last common ancestor ofSpiralia was an animal lacking a coelomic body cavityAlthough many relationships within Lophotrochozoa arestill unresolved in our study and warrant further investiga-tions our analyses suggest that within Spiralia coelomic cav-ities with a lining epithelium might have originated at theearliest in the stem lineage of Lophotrochozoa Additionallyrecent investigations of development and formation of coe-lomic cavities using a comparative anatomical approachrevealed considerable differences between Annelida andPanarthropoda already in the earliest steps of coelomogenesis

(for review see Koch et al 2014) Hence segmental coeloms inannelids and arthropods are not necessarily homologousstructures (Koch et al 2014) In addition the developmentalorigins of coelomic cavities in deuterostomes differ fromthose in lophotrochozoans and panarthropods (Nielsen2012) Considering these differences and our results it ismore probable that coelomic cavities evolved independentlywithin the major bilaterian clades Deuterostomia Ecdysozoaand Spiralia Clearly further analyses of the underlying geneticregulatory networks in coelom formation across a wide vari-ety of coelomate and non-coelomate taxa are necessary tosubstantiate or reject this conclusion

The position of coelomate Chaetognatha within Bilateria isalso of interest in this aspect but still enigmatic based on bothmolecular and morphological data Deuterostome as well asprotostome affinities including a sister group relationship to

Adineta

Opisthorchis

Lecane

Schistosoma

Cephalothrix

Philodina

Flustra

Capitella

Euprymna

Echinococcus

Pedicellina

Terebaratalia

Tubulanus

Fasciola

Lingula

Neomenia

Paratenuisentis

Idiosepius

Chaetopleura

Taenia

Pedicellina

Alvinella

Spirometra

Macracanthorhynchus

Helobdella

Macrodasys

Alcyonidium

Gnathostomula

Pomphorhynchus

Tubulipora

Echinorhynchus

Rotaria

Mytilus

Dugesia

Echinoderes

Brachionus

Daphnia

Crassostrea

Seison

Echinococcus

Cristatella

Stylochoplana

Macrostomum

Lottia

Symbion

Lumbricus

Dugesia

Gnathostomula

Schistosoma

Moniezia

Turbanella

Clonorchis

Priapulus

Brachionus

Aplysia

Apis

Bugula

Nematoplana

Paraplanocera

Tubifex

Biomphalaria

Megadasys

Schmidtea

02

086

094

099

059

097

098

099

Lophotrochozoa

Ecdysozoa

Gastrotricha

Gnathostomulida

Syndermata

Platyhelminthes

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

FIG 6 BI tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to low degreesof missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excluded OnlyPPs 050 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

1840

Struck et al doi101093molbevmsu143 MBE

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 9: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

Spiralia have been proposed (Marletaz et al 2006 Matus et al2006 Dunn et al 2008 Perez et al 2014) MoreoverChaetognatha possess a unique type of coelom formationheterocoely which exhibits no strong similarities to the other

types of coelom formation (Kapp 2000 Perez et al 2014) andhence might be indicative of a convergent evolution of coe-lomic cavities in Chaetognatha However ultrastructural stud-ies of coelom formation are lacking at the moment (Perezet al 2014)

The alternative scenario whereupon evolution progressedfrom complex to simple in Bilateria is mainly based on sim-ilarities in segmentation in vertebrates arthropods and an-nelids (De Robertis 2008 Couso 2009 Chesebro et al 2013)However in our analyses Annelida was always deeply nestedwithin Lophotrochozoa Thus similar to the evolution of coe-lomic cavities a segmented ancestry of Spiralia would implyseveral independent losses of this organization which weregard as less parsimonious Moreover Annelida andArthropoda exhibit high plasticity in segmentation and onthe other hand other spiralian and ecdysozoan taxa exhibitvarying degrees of repetitive organization in organsystems This includes Kinorhyncha Monoplacophora andPolyplacophora Eucestoda and other platyhelminths somenematodes and nematomorphs and a nemertean (Hannibaland Patel 2013 Struck 2012) In addition segmentation ismostly restricted to tissue derived from the ectoderm in ar-thropods from the mesoderm in vertebrates and from bothgerm layers in annelids (Nielsen 2012) A possible explanationfor similarities in segment formation including developmentalpathways like the notch oscillation could be that these generegulatory networks have been co-opted from ancestral net-works involved in the organization of repetitive organ systems(Davidson and Erwin 2006 Chipman 2010) However thishypothesis cannot be conclusively proven due to a currentlack of data on developmental gene pathways in taxa withsuch repetitive organ systems (Chesebro et al 2013)Nonetheless the spiralian phylogeny derived herein providesadditional support for the hypothesis that segmentationevolved independently within Deuterostomia Ecdysozoaand Spiralia

Support for a complex bilaterian ancestor also arose fromthe observation of neuronal structures called mushroombodies that were consistently present in arthropods andsome annelids as well as similar gene expression patternsnoted in these bodies and in the vertebrate pallium (Heueret al 2010 Tomer et al 2010) However within annelidsmushroom bodies occur exclusively in five families of thesubgroup Errantia which are all characterized by a high va-gility (Heuer et al 2010 Struck et al 2011) while they are notknown for any other annelid or spiralian taxa (Rothe andSchmidt-Rhaesa 2009 Heuer et al 2010 Nielsen 2012Loesel 2014) Thus if such distinct higher brain centers aretaken as an ancestral condition of a complex last commonspiralian ancestor (Heuer et al 2010) several losses withinSpiralia including even several ones within Annelida haveto be assumed On the other hand the gastrotrich nervoussystem consists of a brain with a solid arch-like dorsal com-missure with laterally positioned cell somata and a fine ven-tral commissure as well as a pair of longitudinal lateroventralnerve cords joining posteriorly (Rothe and Schmidt-Rhaesa2009) This organization is similar to the organization of thenervous system of Acoelomorpha Hence in comparison to

0

20

40

60

80

100

25 35 45 55 65 75 85

Boo

tstr

ap s

uppo

rt

Number of positions (1000)

A

0

20

40

60

80

100

Boo

tstr

ap s

uppo

rt

B

0

4

8

12

16

150 250 350 450 550

rati

o

Number of genes

C

FIG 7 BS and the ratio of single-genes supporting paraphyly overmonophyly relative to the number of alignment positions or genes(A and B) BS for monophyly and paraphyly of Platyzoa as well as mono-phyly of Lophotrochozoa relative to the number of positions(A) Analyses based on 61 taxa from which the four unstable taxa(Lepidodermella squamata Dactylopodola baltica and the twoGnathostomulida species) were excluded (B) Analyses based on 34taxa from which all taxa exceeding LB scores gt0 in tree of figure 1were excluded Light gray = monophyly of Lophotrochzoa darkgray = monophyly of Platyzoa black = paraphyly of Platyzoa Best-fittingtrend lines generated by Excel are also shown in the same colors(C) Ratio of the percentage of single-gene trees supporting paraphylyof Platyzoa to the percentage of single-gene trees supporting mono-phyly of Platyzoa relative to the number of genes Diamonds = data setsd02 and d03 with reduced missing data triangles = data sets d01 d04ndashd06 generated using MARE circle = data set d07 with reduced baseheterogeneity square = data set d08 with reduced branch length het-erogeneity crosses = data sets d09 and d10 based on confidence inter-vals of PCA The best-fitting trend line generated by Excel for the datasets d01ndashd06 with decreasing degrees of missing data is shown in black

1841

Platyzoans and Spiralia doi101093molbevmsu143 MBE

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 10: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

the net-like plexus without a cerebral ganglion in non-bilater-ian animals both Gastrotricha and Acoelomorpha express acertain degree of condensation at the anterior end to form amore or less condensed commissural brain but to a lesserdegree than other bilaterian taxa (Rothe and Schmidt-Rhaesa2009) Thus Gastrotricha might still exhibit the ancestral bila-terian condition indicative that also the last common

ancestor of Spiralia showed that characteristic Moreoveralso for example platyhelminths syndermatans gnathosto-mulids or entoprocts show anterior condensations of thecentral nervous system but not to the same degree as inelaborate brains which can be found in some mollusks orannelids (Northcutt 2012 Loesel 2014) Such a condensationis in general agreement with a small-sized noncoelomate

Pedicellina

Neomenia

Priapulus

Lingula

Macracanthorhynchus

Mytilus

Stylolochoplana

Gnathostomula

Dugesia

Megadasys

Opistorchis

Cristatella

Seison

Rotaria

Chaetopleura

Tubulipora

Terebaratalia

Euprymna

Pedicellina

Schmidtea

Paraplanocera

Philodina

Helobdella

Lottia

Moniezia

Echinorhynchus

Apis

Flustra

Gnathostomula

Clonorchis

Turbanella

CephalothrixTubulanus

Capitella

Aplysia

Echinococcus

Fasciola

Dugesia

Crassostrea

Symbion

Tubifex

Brachionus

Nematoplana

Macrodasys

Lumbricus

Taenia

Schistosoma

Macrostomum

Echinococcus

Idiosepius

Paratatenuisentis

Bugula

Echinoderes

Alvinella

Biomphalaria

Alcyonidium

Spirometra

Brachionus

Schistosoma

Adineta

Daphnia

Pomphorhynchus

Lecane

02

70

99

92

88

98

99

99

5689

61

62

85

93

71

66

99

99

99

66

89

98

94

Lophotrochozoa

Ecdysozoa

Brachiopoda

EntoproctaCycliophora

Ectoprocta

Nemertea

Annelida

Mollusca

Gastrotricha

Syndermata

Platyhelm

inthes

Gnathostomulida

FIG 8 ML tree obtained by analysis of data set d02 with 63 taxa and 36513 amino acid positions Only partitions with low to medium up to lowdegrees of missing data were included and only the two unstable gastrotrich taxa Lepidodermella squamata and Dactylopodola baltica were excludedOnly BS 50 are shown at the branches Maximal support of 100 Higher taxonomic units are indicated

Table 2 Percentage of Single-Genes Supporting Monophyly or Paraphyly of Platyzoa

Degree of Missing Data Heterogeneity PCA

Data Set d01 d02 d03 d04 d05 d06 d07 d08 d09 d10

Genes 559 232 413 340 235 174 217 187 446 537

Mono 21 09 12 15 09 06 32 32 20 20

Para 97 86 94 91 94 86 115 118 101 99

Lack 882 905 893 894 898 908 853 850 879 881

ParaMono 45 10 78 62 11 15 36 37 5 48

Genes number of genes in data set Mono percentage of single-gene trees supporting monophyly of Platyzoa Para percentage of single-gene trees supporting paraphylyof Platyzoa Lack percentage of single-gene trees lacking resolution regarding this question ParaMono ratio of the percentage of single-gene trees supporting paraphyly ofPlatyzoa to the percentage of single-gene trees supporting monophyly of Platyzoa PCA principal component analysis

1842

Struck et al doi101093molbevmsu143 MBE

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 11: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

ancestor for Spiralia showing no complex body organizationOn the other hand the observed similarities in the expressionprofiles of mushroom bodies of arthropods and annelids aswell as the vertebrate pallium support the view that the evo-lution of more complex brain centers occurred early on inBilateria (Heuer et al 2010 Tomer et al 2010) However allthree organs are part of the olfaction system Analyses ofthese expression profiles in the brains of other bilateriantaxa are lacking in the moment Hence instead of being in-dicative of elaborative morphological structures the observedsimilar expression profiles could be part of ancestral gene reg-ulatory networks involved in the integration of chemosensory

input in clusters of cells of more simply organized brainsHowever developmental biological studies of the olfactionsystem of other bilaterian taxa such as Gastrotricha orPlatyhelminthes are required to substantiate eitherhypothesis

In conclusion paraphyly of ldquoPlatyzoardquo with respect toLophotrochozoa and the spiralian phylogeny presentedherein provide support for the view that the last commonancestor of Spiralia was an organism without coelomic cavitysegmentation and elaborate brain structures which probablyinhabited the marine interstitial realm This implies that evo-lution in Bilateria progressed most likely from a simple ances-tor to more complex descendants independently within thethree major bilaterian clades However we cannot rule outthat miniaturization or a progenetic origin of the discussedtaxa lead to loss of their morphological complexity Severalsuch examples are known from annelids and arthropods as inthese cases it was more parsimonious to assume secondarysimplification than convergent evolution (Jenner 2004bBleidorn 2007) However the above discussion also showsthat besides a robust phylogeny of Spiralia and Bilateria de-velopmental biological studies of gene regulatory networksand expression profiles beyond the few standard model or-ganisms are necessary to understand the evolution of Spiralia

Material and Methods

Data Generation

Supplementary table S1 Supplementary Material online listsspecies (four gastrotrich two flatworms two wheel animalsone acanthocephalan one gnathostomulid as well as twonemertean species) collected for this study As deeply se-quenced transcriptome libraries were lacking for nemerteanswe additionally constructed them for representatives of thistaxon Upon collection samples were either snap-frozen at80 C or stored in RNAlater Total RNA was isolated usingthe NucleoSpin RNA XS Kit (Macherey-Nagel) for Rotariarotatoria and Lecane inermis (both Syndermata classicalRotifera) the peqGOLD MicroSpin Total RNA kit (peqlab)for Gnathostomula paradoxa (Gnathostomulida) Megadasyssp Macrodasys sp Dac baltica and Lep squamata(all Gastrotricha) or the peqGOLD Total RNA kit(peqlab) for Tubulanus polymorphus Cephalothrixlinearis (both Nemertea) Nematoplana coelogynoporoidesand Stylochoplana maculata (both Platyhelminthes)and Macracanthorhynchus hirudinaceus (SyndermataAcanthocephala)

For all species except the nemerteans total RNA was re-verse-transcribed to double-stranded cDNA with the MINTUNIVERSAL cDNA synthesis kit (Evrogen) to produce ampli-fied cDNA libraries For R rotatoria Gnathostomulida andGastrotricha a modified amplification protocol which in-cluded an in-vitro transcription step had been used Forthis protocol the cDNA synthesis was modified to contain1 mM T7-PlugOligo (50-C AATT GTAA TAC GAC TCA CTATAGG GAGAACGGGGG-30) comprising a T7 promotor se-quence instead of 1 mM PlugOligo-3 M in combination withCDS-3 M adapter for the first strand synthesis and 01 mM

p

a

a

Gastrotricha

Gnathostomulida

Syndermata Gnathifera

Spiralia

Platyhelminthes

Rouphozoa

Platytrochozoa

Mollusca Annelida Brachiopoda Phoronida Ectoprocta Nemertea ()

Lophotrochozoa

a

aEntoprocta Cycliophorap

c

FIG 9 Proposed phylogeny of Spiralia Higher taxonomic units andnames are given Drawings depict the acoelomate (=a) pseudocoelom-ate (=p) and coelomate (=c) body organization Picture of Rotarianeptunoida (Syndermata) was courtesy of Michael Plewka () meansthat it is still discussed if the lateral vessels of the nemertean circulatorysystem are homologous to coelomic cavities of other lophotrochozoantaxa (Turbeville 1986)

Table 3 BS for Monophyly of Gnathifera

Data Set Excl Taxa Taxa Gnathifera

d01 (all data) None 65 61Unstable 63 91a

LB 36 67

d02 (high coverage) Unstable 63 71a

LB 36 10b

d07 (low base frequency heterogeneity) Unstable 63 48b

LB 36 86a

d08 (low branch length heterogeneity) Unstable 63 24LB 36 12

Excl excluded (same as in table 1 except for Gnathostomulida) Taxa number oftaxa LB long-branched taxaaSupport values are part of the 70 confidence setbGnathostomulida not placed as sister to Syndermata in the ML tree but in a cladewith Gastrotricha and Platyhelminthes

1843

Platyzoans and Spiralia doi101093molbevmsu143 MBE

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 12: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

T7-primer (50-AATT GTAA TAC GAC TCA CTA TAGG-30)plus 01 mM M1-primer instead of 02 mM M1-primer for thesecond strand synthesis Amplified cDNA was purified usingthe peqGOLD Cycle-Pure Kit (peqlab) digested with SfiI andsize-fractioned using CHROMA SPIN-1000 (Clontech)Purified cDNA was vacuum-concentrated to 155ml and13ml was used for the generation of mRNA by in vitro tran-scription (over night 37 C) employing T7 RNA polymerase(reaction conditions 40ml with 0075 mM of each NTP 1 umlRNase inhibitor 05 mM DTT and 5 uml T7 RNA polymerase[Invitrogen]) Messenger RNA was purified using peqGOLDTotal RNA kit (peqlab)

The amplified cDNA libraries prepared from platyhel-minths and Lec inermis were sequenced by GENterpriseGmbH (Mainz) or the Max Planck Institute for MolecularGenetics (Berlin) by 454 pyrosequencing using standardprotocols Illumina sequencing libraries for NemerteaGnathostomulida and Gastrotricha were prepared withdouble indices following the protocol described by Meyerand Kircher (2010) and Kircher et al (2011) starting ei-ther with totalRNA (Nemertea) or amplified mRNA(Gnathostomulida and Gastrotricha) as described by Heringet al (2012) The libraries were sequenced at the Max PlanckInstitute of Evolutionary Anthropology (Leipzig) using anIllumina Genome Analyzer IIx (GAIIx) with 76 cycles pairedend Total RNA of M hirudinaceus and amplified mRNA ofR rotatoria were sequenced using an Illumina HiSeq 2000(100 bp paired end) at the Institute of Molecular GeneticsJohannes Gutenberg University (Mainz) The sequencing li-brary of M hirudinaceus was additionally run on an IlluminaMiSeq machine (150 bp paired end) by GENterprise GmbH(Mainz) Publically available transcriptomes (ESTs and RNA-Seq) and genomic data from 49 spiralian species comple-mented these data (supplementary table S2 SupplementaryMaterial online) For the choice of outgroup taxa differentconsiderations have to be taken into account given that pla-tyzoan taxa are eventually affected by LBA First of all theoutgroup taxa should not introduce additional long branchesthemselves (Bergsten 2005) Hence distantly related out-group taxa should be avoided as well as outgroups exhibitingincreased substitution rates (Milinkovitch et al 1996 Philippeet al 2011) Therefore we used only representatives ofEcdysozoa the sister group of Spiralia and did not considernematodes and nematomorphs which are known to possesslong branches themselves Moreover more than a single out-group taxon should be used and the diversity of outgrouptaxa should be reflected (Milinkovitch et al 1996 Bergsten2005) Thus we chose representative species of priapulidskinorhynchs and pancrustaceans as it has been previouslyshown that three to four outgroup taxa are sufficient to re-solve difficult phylogenies when one also takes into accountthe computational limitations of phylogenomic studies(Rota-Stabelli and Telford 2008) Finally the properties ofthe outgroup taxa sequence data should be similar to theones of the ingroup taxa (Rota-Stabelli and Telford 2008) andin the case of LBA being more similar to short-branchedingroup taxa than to the long-branched ones The LB scoresshow that the chosen ecdsyozoan species are similar to the

short-branched spiralian taxa (fig 2) For other propertiessuch as proportion of missing data and especially basecomposition heterogeneity ecdysozoan taxa are similar tothe ingroup taxa (supplementary fig S5 SupplementaryMaterial online)

Data Assembly

Processing of M hirudinaceus and R rotatoria data wasperformed using the FastX toolkit and included trimmingof (I) 12 bp at the 50-end (II) adapter sequences and (III)low-quality bases (cutoff 25) Reads longer than 20 bp aftertrimming were sorted into intact pairs and singletons using acustom perl script and were subsequently assembled usingthe CLC Genomics Workbench 55 (CLC Bio)

For the GAIIx databases were called with IBIS 112 (Kircheret al 2009) adaptor and primer sequences removed andreads with low complexity as well as mispaired indices dis-carded Raw data of all libraries were trimmed discarding allreads with more than 5 bases below a quality score of 15 For454 pyrosequencing data sequences were thinned and qual-ity filtered as implemented by Roche In contrast to thosedata that were retrieved from the NCBI nr database (ieMoniezia expansa) as well as the genomic data present inthe lophotrochozoan core ortholog set of HaMStR (ieSchistosoma mansoni Lottia gigantea Helobdella robustaCapitella teleta and Apis mellifera) the other data were fur-ther trimmed quality-filtered and assembled as described ineither Hausdorf et al (2007) or in Riesgo et al (2012) using theCLC Genomics Workbench with 005 as the limit for thinningand the scaffolding option in the assembly

Sets of orthologous genes were determined using a profilehidden Markov model-based reciprocal hit triangulationsearch using a modified version of HaMStR version 8(Ebersberger et al 2009) (called HaMStRad and the modifiedfiles are available at httpsgithubcommptrsenHaMStRadlast accessed April 24 2014) As a core set we used theLophotrochozoa set of 1253 genes derived from theInparanoid database (httpinparanoid51sbcsuse lastaccessed April 24 2014) for the primer-taxa Cap teleta Hrobusta Lo gigantea S mansoni Daphnia pulex Ap melliferaand Caenorhabditis elegans Modifications of HaMStR in-cluded the usage of Exonerate (Slater and Birney 2005)instead of Genewise (Birney et al 2004) to provide frame-shift-corrected corresponding nucleotide sequences Weused the representative option with all primer taxa the re-laxed option and a cutoff e value of e05 Using the represen-tative option might result in the assignment of the samesequence into different sets of orthologous genes Such re-dundantly assigned sequences were removed using customperl scripts and the responsible bug in HaMStR fixed forfuture analyses Each set of orthologous genes was individuallyaligned using MAFFT-Linsi (Katoh et al 2005) followed by thedetermination of questionably aligned positions with AliScore(Kuck et al 2010) and masking with AliCut using defaultparameters The 1253 genes were concatenated into asuper-matrix using FASconCAT (Kuck and Meusemann2010) and the super-matrix was reduced based on the

1844

Struck et al doi101093molbevmsu143 MBE

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 13: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

phylogenetic signal in a gene by assessing the tree-likeness byquartet-mapping using extended geometry mapping as im-plemented in MARE (Meusemann et al 2010) We excludedthe species of the core ortholog set S mansoni Lo gigantea Hrobusta Cap teleta Dap pulex and Ap mellifera prior tomatrix reduction and used a d value of 05 generating thelarge data set d01 (supplementary fig S6 SupplementaryMaterial online)

Paralogy and Contamination Screening

The 559 genes present in data set d01 were further screenedfor paralogous sequences and contamination within single-gene data sets For this purpose a screening based on boot-strap maximum-likelihood (ML) analyses of the individualgenes (Philippe et al 2011 Struck 2013) was conductedusing TreSpEx (wwwannelidade last accessed April 242014) Initially ML analyses were conducted for the unmaskedindividual genes (supplementary fig S6 SupplementaryMaterial online) All bipartitions supported by a bootstrapvalue 95 were extracted from the resulting topologies Asa first step all bipartitions congruent with clades for whichindependent a priori evidence of monophyly exist weremasked for the following steps (Struck 2013) The columnsldquogrouprdquo and ldquosubgrouprdquo in supplementary table S2Supplementary Material online as well as genera with morethan one representative indicate these a priori clades To beconservative only sequences of bipartitions that exhibited aconflict with these a priori clades were pruned (supplemen-tary tables S4 and S5 Supplementary Material online) A con-flict in this case meant that species of an a priori clade as wellas other species were present in both groups of the biparti-tion For example Platyhelminthes was such an a priori cladeand if in a bipartition platyhelminth as well as other spiralianandor ecdysozoan species were present in both clades of thebipartition this was regarded as a strong conflict Thus therewas a strong conflict in these cases regarding the monophylyof a clade with a priori independent evidence of monophylyAt the group level all but one clade fulfilled this criterion thatis showed strong conflicts The single exception was a cladecomprising only all gnathiferan species in that data set even-tually reflecting true phylogenetic signal Previous studieshave shown that such a pattern is characteristic for phylog-enies of paralogous sequences reflecting the gene tree ratherthan the species tree (Rodrıguez-Ezpeleta et al 2007 Philippeet al 2009 2011 Struck 2013) However other sources ofartificial signal like shared missing data compositionalbiases contamination or LBA (Bergsten 2005 Lemmonet al 2009 Simmons and Freudenstein 2011 Simmons2012a 2012b Struck 2013) can also result in such a patternIn any case potentially strong misleading signal with signifi-cant BS in single gene analyses has been masked by thisprocedure

The paralogy screening was followed by a screening pro-cedure for contamination in the libraries of our studyTherefore the 18 S rRNA sequence of Lineus bilineatus(DQ279932) was blasted against each assembled library (sup-plementary fig S6 Supplementary Material online) using

BlastN and a cutoff value of e20 All detected contigs werethen blasted against the NCBI nr database using BlastN If thebest hit represented a species from a different supra-specifictaxon with the traditional rank of a phylum than the queryspecies this was taken as an indication of possible contami-nation (supplementary table S6 Supplementary Materialonline) For example for some of the contigs of theAlvinella pompejana (Annelida) library blast searches resultedin best hits linking the query sequence to the nematodTripylella sp the arthropod Ptinus fur or an uncultured acau-losporan fungus To prune eventually contaminated se-quences from the sets of 559 genes reference databaseswere specifically generated for each affected species basedon the blast results against the NCBI database For theAlvinella example a reference database consisted ofthe non-redundant proteome information retrieved fromthe genomes of Ap mellifera (Arthropoda) Cae elegans(Nematoda) Schizosaccharomyces cerevisiae (Fungi) andthe transcriptome of Dap pulex (Arthropoda) as negativereferences as well as from the genomes of Cap teleta H ro-busta (Annelida) Lo gigantea (Mollusca) and Schmidteamediterranea (Platyhelminthes) as positive references Eachof the 559 genes present for that species (eg Al pompejana)was blasted against this species-specific reference databaseThree pruning strategies were tested a sequence was prunedwhen (I) the best hit was a negative reference sequence (II)the best hit was a negative reference sequence and in additionthe E value was at least one order of a magnitude better thanthat of the best hit for a positive reference or (III) the best hitwas a negative reference sequence and in addition the E valuewas at least four orders better than that of the best hit for apositive reference As ML analyses of the data set d01 with 65taxa and 82162 amino acid positions using the three differentpruning strategies resulted in no significant differences of thetopologies inferred we chose the most conservative firstpruning strategy for subsequent analyses Custom Perl scriptswere written for all these steps

Phylogenetic Analyses

The most appropriate substitution model was LG + I + asdetermined using the ProteinModelSelection script forRAxML (Stamatakis 2006) Before the time-consumingBayesian Inference (BI) we conducted a series of ML analysesas part of the sensitivity analyses and screening procedures(see supplementary fig S6 Supplementary Material online) Intotal 1129 ML analyses were conducted with RAxML 73(Stamatakis 2006) using 300 and 100 bootstrap replicatesearches for concatenated and individual gene data sets re-spectively The bootstrap searches were followed by a searchof the best tree Preliminary analyses using the automaticbootstopping option (Pattengale et al 2009) (- autoMRE)in RAxML obtained a maximum of 240 bootstrap replicatesfor different tested concatenated data sets and hence weused 300 replicates for all analyses for reasons of comparabil-ity Moreover these preliminary analyses showed that a boot-strap search followed by a best tree search always found a treewith an equal or better likelihood score than independent

1845

Platyzoans and Spiralia doi101093molbevmsu143 MBE

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 14: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

searches for the best tree using 100 replicate searches startingfrom randomized maximum-parsimony trees

For the BI analysis we used PhyloBayes MPI 14f (Lartillotand Philippe 2004 Lartillot et al 2013) using the GTR + CATmodel and the data set d02 generated by excluding geneswith high degrees of missing data (see sensitivity analysesbelow) For the analysis four chains ran in parallel for13669 cycles on average (ranging from 12164 to 14217)Convergence of likelihood values alpha parameter and treelength of the four chains was assessed using Tracer v15(httptreebioedacuksoftwaretracer last accessed April24 2014) Upon convergence the average standard deviationof split frequencies was lt01 with a value of 0055 The first6000 cycles of each chain were discarded as burnin and themajority rule consensus tree containing the posterior proba-bilities was calculated from the remaining trees of the chainwith the best average likelihood score sampling every secondtree

Sensitivity Analyses

Leaf stability indices of species were determined usingPhyutility (Smith and Dunn 2008) and the bootstrap treesof ML analyses of the data set d01 comprising all speciessampled To assess the branch length heterogeneity weused the herein newly developed LB score using TreSpEx(wwwannelidade last accessed April 24 2014) which wealso used to calculate classical tip-to-root distances Taxawere excluded from the data sets d01ndashd10 in accordancewith these results and the phylogenetic reconstructionsrepeated

To objectively assess the branch length heterogeneity in atree we developed a new tree-based measurement which wecall the LB score The score utilizes patristic distances (PDs)that is the distance between two taxa based on the connect-ing branches and is based on the mean pairwise PD of a taxoni to all other taxa in the tree relative to the average pairwisePD over all taxa (a)

LBi frac14PDi

PDa

1

100

In specific the score measures for each taxon the percent-age deviation from the average and is independent of the rootof the tree The latter is also the reason for not using thetraditional tip-to-root distance (Bergsten 2005) When usingtip-to-root distances the recognition of long-branched taxaheavily depends on the root of the tree For example in thereconstruction of the individual gene with the ID 111427 inour analyses below the ecdysozoan outgroup species are notmonophyletic Whereas tip-to-root distances based on anApis-rooted tree and LB scores indicate the same taxa aslong-branched rooting the tree with either Echinoderes orPriapulus some of these species would be indicated asshort-branched (supplementary fig S1 SupplementaryMaterial online) Given the automatic process pipelines inphylogenomic analyses due to the vast amount of genes de-tection of long-branched taxa should be robust againstchanges in the root of the tree Moreover in the search for

the best tree in phylogenetic reconstructions only unrootedtrees are used and rooting is an a posteriori procedure Thusnotwithstanding that outgroup species might be long-branched the artificial grouping of species due to LBA inphylogenetic reconstructions is not directly due to the rootby itself (Bergsten 2005) Hence detection of LBA should beindependent of the root Fortunately either using the largedata set d01 in our analyses below or the 559 individual genesof this data set LB scores and tip-to-root distances are highlyand positively correlated with a R2 value of 091543 or anaverage R2 of 085684 respectively (supplementary fig S2Supplementary Material online)

For data partitioning we analyzed the 559 genes of thedata set d01 generated with the MARE setting ldquoall taxa in-cludedrdquo and a d value of 05 We determined both alignment-and tree-based properties Using BaCoCa (Kuck and Struck2014) the proportion of hydrophobic and polar amino acidsthe proportion of missing data as well as the compositionalheterogeneity as measured by the RCFV values (Zhong et al2011) were determined from the pruned and masked align-ments across all species in each gene (supplementary fig S6Supplementary Material online) ML trees from these align-ments were used to determine the evolutionary rate for eachgene calculated as the average pairwise PD between twospecies in the tree as well as the mean of the upper quartileof LB scores (ie the upper 25 of all LB scores) and thestandard deviation of all LB scores as measurements ofbranch length heterogeneity with the aid of TreSpEx (wwwannelidade) Correlation studies of these properties wereconducted in Excel (supplementary figs S2 and S4Supplementary Material online) Principal component analy-ses were conducted in R with scaled values (supplementaryfig S3 Supplementary Material online) The determination ofthe branch length heterogeneity within a gene was based oneither the mean of the upper quartile of LB scores or the stan-dard deviation of all LB scores within a gene However bothapproaches led to a strong linear correlation (R2 = 08363supplementary fig S4 Supplementary Material online) andthus we used solely the standard deviation of LB scores as ameasure of branch length heterogeneity in the principal com-ponent analysis Similarly the proportions of hydrophobicand polar amino acids were also strongly correlated(R2 = 06481 supplementary fig S4 Supplementary Materialonline) and hence we excluded the proportion of polaramino acids

Genes with either high degrees of missing data or high basecomposition heterogeneity were excluded based on the re-sults of heat map analyses in combination with hierarchicalclustering without scaling the values in R (Bapteste et al 2005Susko et al 2006) Four clusters of proportion of missing datawere found ranging from low to high degrees of missing data(supplementary fig S7 Supplementary Material online) Fromdata set d01 genes belonging to the groups with medium-high to high degrees of missing data were excluded to gen-erate data set d02 characterized by only low degrees of miss-ing data We also generated a data set d03 where we excludedonly high degrees of missing data from data set d01Alternatively the data set d01 was condensed using MARE

1846

Struck et al doi101093molbevmsu143 MBE

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 15: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

(Meusemann et al 2010) with d values of 10 15 or 20instead of 05 used above resulting in the data sets d04d05 and d06 respectively

For compositional heterogeneity the heatmap revealedthree clusters with low medium and high compositionalheterogeneity (supplementary fig S8 SupplementaryMaterial online) Only genes which were part of the clusterwith low compositional heterogeneity were kept for data setd07 The tree-based property branch length heterogeneitywas ranked and divided into three equal parts To generatedata set d08 only the genes from the third with the lowestheterogeneity values were not excluded (supplementaryfig S6 and table S9 Supplementary Material online)Moreover we excluded all genes from data set d01 whichwere not part of the 70 or 95 confidence interval of thefirst two principal components (supplementary fig S3Supplementary Material online) resulting in data sets d09and d10 respectively Finally for each data set we determinedthe number of single-gene trees which found a monophyleticor paraphyletic Platyzoa using custom perl scripts For thelatter at least one platyzoan taxon (ie PlatyhelminthesGastrotricha Gnathostomulida or Syndermata) had to beplaced more closely to the outgroup than at least oneother platyzoan taxon

Supplementary MaterialSupplementary tables S1ndashS13 and figures S1ndashS8 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by the priority programldquoDeep Metazoan Phylogenyrdquo of the DeutscheForschungsgemeinschaft DFG-STR 6835-2 and DFG-STR6838-1 to THS DFG-Ha21034 to TH DFG-HA 27635to BH and DFG-BL 7875-1 to CB TH wishes to thankthe Center for Computational Sciences Mainz (CSM) for ad-ditional financial support THS also acknowledges the sup-port of the Regional Computing Centre of Cologne (RRZK)by among others providing access to the HPC clusterCHEOPS for the parallel PhyloBayes analyses The authorsgratefully acknowledge Edyta Fialkowska (Institute ofEnvironmental Sciences Jagiellonian University Poland) forproviding Lecane specimens Laszlo Sugar (Faculty ofAnimal Science Kaposvar University Hungary) for collectingand providing Macracanthorhynchus specimens SanjaRamljak (Institute of Clinical Research and DevelopmentMainz Germany) for helping with collecting Seison specimensand Michael Plewka (httpplingfactoryde) for the picture ofRotaria neptunoida They also thank Birgit Nickel andMatthias Meyer (Max Planck Institute for EvolutionaryAnthropology Leipzig Germany) for their assistance in se-quencing using Illumina GAIIx and Steffen Rapp (Instituteof Molecular Genetics Johannes Gutenberg UniversityMainz) for operating the Illumina HiSeq 2000 Sequencedata have been deposited in the NCBI short read archive

All other data and trees used in this study are available atDataDryad (doi105061dryadn435p)

ReferencesAhlrichs WH 1997 Epidermal ultrastructure of Seison nebaliae and

Seison annulatus and a comparison of epidermal structureswithin Gnathifera Zoomorphology 11741ndash48

Alexe G Vijaya Satya R Seiler M Platt D Bhanot T Hui S Tanaka MLevine AJ Bhanot G 2008 PCA and clustering reveal alternatemtDNA phylogeny of N and M clades J Mol Evol 67465ndash487

Bapteste E Susko E Leigh J MacLeod D Charlebois RL Doolittle WF2005 Do orthologous gene phylogenies really support tree-thinkingBMC Evol Biol 533

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Bernt M Bleidorn C Braband A Dambach J Donath A Fritzsch GGolombek A Hadrys H Juhling F Meusemann K et al 2013 Acomprehensive analysis of bilaterian mitochondrial genomes andphylogeny Mol Phylogenet Evol 69352ndash364

Birney E Clamp M Durbin R 2004 Genewise and genomewise GenomeRes 14988ndash995

Bleidorn C 2007 The role of character loss in phylogenetic reconstruc-tion as exemplified for the Annelida J Zool Syst Evol Res 45299ndash307

Brinkman H Philippe H 2008 Animal phylogeny and large-scalesequencing progress and pitfalls J Syst Evol 46274ndash286

Cavalier-Smith T 1998 A revised six-kingdom system of life Biol Rev 73203ndash266

Chesebro JE Pueyo JI Couso JP 2013 Interplay between a Wnt-dependent organiser and the Notch segmentation clock regulatesposterior development in Periplaneta americana Biol Open 2227ndash237

Chipman AD 2010 Parallel evolution of segmentation by co-option ofancestral gene regulatory networks Bioessays 3260ndash70

Couso JP 2009 Segmentation metamerism and the Cambrian explo-sion Int J Dev Biol 531305ndash1316

Davidson EH Erwin DH 2006 Gene regulatory networks and the evo-lution of animal body plans Science 311796ndash800

De Robertis EM 2008 The molecular ancestry of segmentation mech-anisms Proc Natl Acad Sci U S A 10516411ndash16412

Doe D 1981 Comparative ultrastructure of the pharynx simplex inturbellaria Zoomorphology 97133ndash193

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash750

Ebersberger I Strauss S von Haeseler A 2009 HaMStR profile hiddenmarkov model based search for orthologs in ESTs BMC Evol Biol 9157

Edgecombe G Giribet G Dunn C Hejnol A Kristensen R Neves RRouse G Worsaae K Soslashrensen M 2011 Higher-level metazoanrelationships recent progress and remaining questions Org DiversEvol 11151ndash172

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Zool 27401ndash410

Giribet G 2008 Assembling the lophotrochozoan (=spiralian) tree oflife Philos Trans R Soc Lond B Biol Sci 3631513ndash1522

Halanych KM 2004 The new view of animal phylogeny Annu Rev EcolEvol Syst 35229ndash256

Hannibal R Patel N 2013 What is a segment Evo Devo 435Haszprunar G 1996 Plathelminthes and Plathelminthomorphamdash

paraphyletic taxa J Zool Syst Evol Res 3441ndash48Hausdorf B Helmkampf M Meyer A Witek A Herlyn H Bruchhaus I

Hankeln T Struck TH Lieb B 2007 Spiralian phylogenomics sup-ports the resurrection of Bryozoa comprising Ectoprocta andEntoprocta Mol Biol Evol 242723ndash2729

Hausdorf B Helmkampf M Nesnidal MP Bruchhaus I 2010Phylogenetic relationships within the lophophorate lineages

1847

Platyzoans and Spiralia doi101093molbevmsu143 MBE

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 16: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

(Ectoprocta Brachiopoda and Phoronida) Mol Phylogenet Evol 551121ndash1127

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Hering L Henze MJ Kohler M Kelber A Bleidorn C Leschke M Nickel BMeyer M Kircher M Sunnucks P et al 2012 Opsins inOnychophora (velvet worms) suggest a single origin and subsequentdiversification of visual pigments in arthropods Mol Biol Evol 293451ndash3458

Herlyn H Ehlers U 1997 Ultrastructure and function of the pharynxof Gnathostomula paradoxa (Gnathostomulida) Zoomorphology117135

Herlyn H Rohrig H 2003 Ultrastructure and overall organization ofligament sac uterine bell uterus and vagina in Paratenuisentis ambi-guus (Acanthocephala Eoacanthocephala)mdashthe character evolu-tion within the Acanthocephala Acta Zool 84239ndash247

Heuer C Muller C Todt C Loesel R 2010 Comparative neuroanatomysuggests repeated reduction of neuroarchitectural complexity inAnnelida Front Zool 713

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hyman LH 1951 The invertebrates Vol 2 Platyhelminthes andRhynchocoela the Acoelomate bilateria New York McGraw-Hill

Jenner RA 2004a Towards a phylogeny of the Metazoa evaluatingalternative phylogenetic positions of Platyhelminthes Nemerteaand Gnathostomulida with a critical reappraisal of cladistic charac-ters Cont Zool 733ndash163

Jenner RA 2004b When molecules and morphology clash reconcilingconflicting phylogenies of the Metazoa by considering secondarycharacter loss Evol Dev 6372ndash8

Kapp H 2000 The unique embryology of Chaetognatha Zool Anz 239263ndash266

Katoh K Kuma K-I Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518

Kieneke A Riemann O Ahlrichs WH 2008 Novel implications for thebasal internal relationships of Gastrotricha revealed by an analysis ofmorphological characters Zool Scr 37429ndash460

Kircher M Sawyer S Meyer M 2011 Double indexing overcomes inac-curacies in multiplex sequencing on the Illumina platform NucleicAcids Res 20111ndash8

Kircher M Stenzel U Kelso J 2009 Improved base calling for theIllumina Genome Analyzer using machine learning strategiesGenome Biol 10R83

Koch M Quast B Bartolomaeus T 2014 Coeloms and nephridia inannelids and arthropods In Wagele JW Bartolomaeus T editorsDeep metazoan phylogeny the backbone of the tree of lifemdashnewinsights from analyses of molecules morphology and theory of dataanalysis Berlin (Germany) De Gruyter p 173ndash284

Kuck P Mayer C Wagele J-W Misof B 2012 Long branch effects distortMaximum Likelihood phylogenies in simulations despite selection ofthe correct model PLoS One 7e36593

Kuck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuck P Meusemann K Dambach J Thormann B von Reumont BMWagele JW Misof B 2010 Parametric and non-parametric maskingof randomness in sequence alignments can be improved and leadsto better resolved trees Front Zool 710

Kuck P Struck TH 2014 BaCoCamdasha heuristic software tool for theparallel assessment of sequence biases in hundreds of gene andtaxon partitions Mol Phylogenet Evol 7094ndash98

Kvist S Siddall ME 2013 Phylogenomics of Annelida revisited a cladisticapproach using genome-wide expressed sequence tag data miningand examining the effects of missing data Cladistics 29435ndash448

Lartillot N Brinkmann H Philippe H 2007 Suppression of long-branchattraction artefacts in the animal phylogeny using a site-heterogeneous model BMC Evol Biol 7S4

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained byMaximum Likelihood and Bayesian Inference Syst Biol 58130ndash145

Loesel R 2014 Brain complexity in protostomes In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 79ndash91

Marletaz F Martin E Perez Y Papillon D Caubit X Lowe CJ Freeman BFasano L Dossat C Wincker P 2006 Chaetognath phylogenomics aprotostome with deuterostome-like development Curr Biol 16R577ndashR578

Matus DQ Copley RR Dunn CW Hejnol A Eccleston H Halanych KMMartindale MQ Telford MJ 2006 Broad taxon and gene samplingindicate that chaetognaths are protostomes Curr Biol 16R575ndashR576

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Meyer M Kircher M 2010 Illumina sequencing library preparation forhighly multiplexed target capture and sequencing Cold Spring HarbProtoc 2010pdbprot5448

Milinkovitch MC LeDuc RG Adachi J Farnir F Georges M Hasegawa M1996 Effects of character weighting and species sampling on phy-logeny reconstruction a case study based on DNA sequence data incetaceans Genetics 1441817ndash1833

Nesnidal M Helmkampf M Meyer A Witek A Bruchhaus I EbersbergerI Hankeln T Lieb B Struck TH Hausdorf B 2013 New phyloge-nomic data support the monophyly of Lophophorata and anEctoproctndashPhoronid clade and indicate that Polyzoa andKryptrochozoa are caused by systematic bias BMC Evol Biol 13253

Nielsen C 2012 Animal evolutionmdashinterrelationships of the living phylaNew York Oxford University Press Inc

Northcutt RG 2012 Evolution of centralized nervous systems twoschools of evolutionary thought Proc Natl Acad Sci U S A 10910626ndash10633

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WEG Nickel M Schierwater B et al 2013Deep metazoan phylogeny when different genes tell different stor-ies Mol Phylogenet Evol 67223ndash233

Paps J Baguna J Riutort M 2009a Bilaterian phylogeny a broad sam-pling of 13 nuclear genes provides a new Lophotrochozoa phylogenyand supports a paraphyletic basal Acoelomorpha Mol Biol Evol 262397ndash2406

Paps J Baguna J Riutort M 2009b Lophotrochozoa internal phylogenynew insights from an up-to-date analysis of nuclear ribosomal genesProc R Soc B Biol Sci 2761245ndash1254

Pattengale ND Alipour M Bininda-Emonds ORP Moret BMEStamatakis A 2009 How many bootstrap replicates are necessaryIn Batzoglou S editor RECOMB 2009 LNCS 5541 Berlin (Germany)Springer-Verlag p 184ndash200

Perez Y Muller CHG Harzsch S 2014 The Chaetognatha an anarchistictaxon between Protostomia and Deuterostomia In Wagele JWBartolomaeus T editors Deep metazoan phylogeny the backboneof the tree of lifemdashnew insights from analyses of molecules mor-phology and theory of data analysis Berlin (Germany) De Gruyterp 49ndash77

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 93

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009

1848

Struck et al doi101093molbevmsu143 MBE

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE

Page 17: Platyzoan Paraphyly Based on Phylogenomic Data Supports …...Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa

Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Rieger RM Tyler S 1995 Sister-group relationship of Gnathostomulidaand Rotifera-Acanthocephala Invertebr Biol 114186ndash188

Riesgo A Andrade SC Sharma P Novo M Perez-Porro A Vahtera VGonzalez V Kawauchi G Giribet G 2012 Comparative descriptionof ten transcriptomes of newly sequenced invertebrates and effi-ciency estimation of genomic sampling in non-model taxa FrontZool 933

Rodrıguez-Ezpeleta N Brinkmann H Burger G Roger AJ Gray MWPhilippe H Lang BF 2007 Toward resolving the eukaryotic treethe phylogenetic positions of jakobids and cercozoans Curr Biol171420ndash1425

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Rothe B Schmidt-Rhaesa A 2009 Architecture of the nervous system intwo Dactylopodola species (Gastrotricha Macrodasyida)Zoomorphology 128227ndash246

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Simmons MP 2012a Misleading results of likelihood-based phylogeneticanalyses in the presence of missing data Cladistics 28208ndash222

Simmons MP 2012b Radical instability and spurious branch support bylikelihood when applied to matrices with non-random distributionsof missing data Mol Phylogenet Evol 62472ndash484

Simmons MP Freudenstein JV 2011 Spurious 99 bootstrap and jack-knife support for unsupported clades Mol Phylogenet Evol 61177ndash191

Slater G Birney E 2005 Automated generation of heuristics for biolog-ical sequence comparison BMC Bioinformatics 631

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Struck TH 2013 The impact of paralogy on phylogenomic studiesmdashacase study on annelid relationships PLoS One 8e62892

Struck TH 2012 Phylogeny of Annelida Handbook of ZoologyOnline [Internet] Berlin (Germany) DeGruyter [cited 2014Apr 24] Available from httpwwwdegruytercomviewZoologybp_029147-6_1

Struck TH Fisse F 2008 Phylogenetic position of Nemertea derived fromphylogenomic data Mol Biol Evol 25728ndash736

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Susko E Leigh J Doolittle WF Bapteste E 2006 Visualizing and assessingphylogenetic congruence of core gene sets a case study of thegamma-proteobacteria Mol Biol Evol 231019ndash1030

Tomer R Denes AS Tessmar-Raible K Arendt D 2010 Profiling byimage registration reveals common origin of annelid mushroombodies and vertebrate pallium Cell 142800ndash809

Turbeville JM 1986 An ultrastructural analysis of coelomogenesis in thehoplonemertine Prosorhochmus americanus and the polychaeteMagelona sp J Morphol 18751ndash60

Wey-Fabrizius AR Herlyn H Rieger B Rosenkranz D Witek A WelchDBM Ebersberger I Hankeln T 2014 Transcriptome data revealsyndermatan relationships and suggest the evolution of endopara-sitism in Acanthocephala via an epizoic stage PLoS One 9e88618

Witek A Herlyn H Ebersberger I Mark Welch DB Hankeln T 2009Support for the monophyletic origin of Gnathifera from phyloge-nomics Mol Phylogenet Evol 531037ndash1041

Zhong M Hansen B Nesnidal MP Golombek A Halanych KM StruckTH 2011 Detecting the symplesiomorphy trap a multigenephylogenetic analysis for terebelliform annelids BMC Evol Biol 11369

1849

Platyzoans and Spiralia doi101093molbevmsu143 MBE