phylogenomic interrogation of arachnida reveals systemic

22
Article Phylogenomic Interrogation of Arachnida Reveals Systemic Conflicts in Phylogenetic Signal Prashant P. Sharma,* ,1 Stefan T. Kaluziak, 2 Alicia R. P erez-Porro, 3 Vanessa L. Gonz alez, 3 Gustavo Hormiga, 4 Ward C. Wheeler, 1 and Gonzalo Giribet 3 1 Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 2 Marine Science Center, Northeastern University 3 Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University 4 Department of Biological Sciences, George Washington University *Corresponding author: E-mail: [email protected]. Associate editor: Nicolas Vidal Abstract Chelicerata represents one of the oldest groups of arthropods, with a fossil record extending to the Cambrian, and is sister group to the remaining extant arthropods, the mandibulates. Attempts to resolve the internal phylogeny of chelicerates have achieved little consensus, due to marked discord in both morphological and molecular hypotheses of chelicerate phylogeny. The monophyly of Arachnida, the terrestrial chelicerates, is generally accepted, but has garnered little support from molecular data, which have been limited either in breadth of taxonomic sampling or in depth of sequencing. To address the internal phylogeny of this group, we employed a phylogenomic approach, generating transcriptomic data for 17 species in combination with existing data, including two complete genomes. We analyzed multiple data sets contain- ing up to 1,235,912 sites across 3,644 loci, using alternative approaches to optimization of matrix composition. Here, we show that phylogenetic signal for the monophyly of Arachnida is restricted to the 500 slowest-evolving genes in the data set. Accelerated evolutionary rates in Acariformes, Pseudoscorpiones, and Parasitiformes potentially engender long- branch attraction artifacts, yielding nonmonophyly of Arachnida with increasing support upon incrementing the number of concatenated genes. Mutually exclusive hypotheses are supported by locus groups of variable evolutionary rate, revealing significant conflicts in phylogenetic signal. Analyses of gene-tree discordance indicate marked incongru- ence in relationships among chelicerate orders, whereas derived relationships are demonstrably robust. Consistently recovered and supported relationships include the monophyly of Chelicerata, Euchelicerata, Tetrapulmonata, and all orders represented by multiple terminals. Relationships supported by subsets of slow-evolving genes include Ricinulei + Solifugae; a clade comprised of Ricinulei, Opiliones, and Solifugae; and a clade comprised of Tetrapulmonata, Scorpiones, and Pseudoscorpiones. We demonstrate that outgroup selection without regard for branch length distribution exacerbates long-branch attraction artifacts and does not mitigate gene-tree discordance, regardless of high gene representation for outgroups that are model organisms. Arachnopulmonata (new name) is proposed for the clade comprising Scorpiones + Tetrapulmonata (previously named Pulmonata). Key words: Arthropoda, arachnids, concatenation, topological conflict, transcriptomics, orthology prediction, ancient rapid radiation. Introduction Subsequent to decades of pitched contention over the inter- relationships of Arthropoda, a consensus has begun to emerge regarding the shape of the arthropod tree of life. With the advent of transcriptome- and genome-scale data, the monophyly of Arthropoda and its three major rami— Chelicerata, Myriapoda, and Pancrustacea (alternatively, “Tetraconata”)—is strongly supported, together with a sister group relationship of arthropods and velvet worms (e.g., Hejnol et al. 2009; Campbell et al. 2011; Rota-Stabelli et al. 2011). Multiple phylogenomic efforts have been directed toward resolving internal relationships of Pancrustacea, which unite hexapods and the paraphyletic “crustaceans,” and have contributed to the identification of the hexapod sister group (Regier et al. 2010; Meusemann et al. 2010; Simon et al. 2012; von Reumont et al. 2012). The interrelationships of Chelicerata remain among the last major challenges for ar- thropod systematics. Chelicerates constitute the second larg- est branch of the arthropod tree of life and include several iconic lineages, such as horseshoe crabs, spiders, and scorpions. Morphological phylogenetic analyses have divided extant Chelicerata into three groups: Pycnogonida (sea spiders), Xiphosura (horseshoe crabs), and Arachnida (terrestrial che- licerates) (e.g., Weygoldt and Paulus 1979; Shultz 1990, 2007). Implicit in this schema is the minimization of terrestrialization events, namely by the arachnid common ancestor. The mutual monophyly of Pycnogonida and Euchelicerata (=Xiphosura + Arachnida) is presently supported by a number of morphological characters, embryological evidence, ß The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] Mol. Biol. Evol. 31(11):2963–2984 doi:10.1093/molbev/msu235 Advance Access publication August 8, 2014 2963 at American Museum of Natural History on February 17, 2015 http://mbe.oxfordjournals.org/ Downloaded from

Upload: others

Post on 12-Jan-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Phylogenomic Interrogation of Arachnida Reveals Systemic

Article

Phylogenomic Interrogation of Arachnida Reveals SystemicConflicts in Phylogenetic SignalPrashant P Sharma1 Stefan T Kaluziak2 Alicia R Perez-Porro3 Vanessa L Gonzalez3

Gustavo Hormiga4 Ward C Wheeler1 and Gonzalo Giribet3

1Division of Invertebrate Zoology American Museum of Natural History New York NY2Marine Science Center Northeastern University3Department of Organismic and Evolutionary Biology Museum of Comparative Zoology Harvard University4Department of Biological Sciences George Washington University

Corresponding author E-mail psharmaamnhorg

Associate editor Nicolas Vidal

Abstract

Chelicerata represents one of the oldest groups of arthropods with a fossil record extending to the Cambrian and is sistergroup to the remaining extant arthropods the mandibulates Attempts to resolve the internal phylogeny of chelicerateshave achieved little consensus due to marked discord in both morphological and molecular hypotheses of cheliceratephylogeny The monophyly of Arachnida the terrestrial chelicerates is generally accepted but has garnered little supportfrom molecular data which have been limited either in breadth of taxonomic sampling or in depth of sequencing Toaddress the internal phylogeny of this group we employed a phylogenomic approach generating transcriptomic data for17 species in combination with existing data including two complete genomes We analyzed multiple data sets contain-ing up to 1235912 sites across 3644 loci using alternative approaches to optimization of matrix composition Here weshow that phylogenetic signal for the monophyly of Arachnida is restricted to the 500 slowest-evolving genes in the dataset Accelerated evolutionary rates in Acariformes Pseudoscorpiones and Parasitiformes potentially engender long-branch attraction artifacts yielding nonmonophyly of Arachnida with increasing support upon incrementing thenumber of concatenated genes Mutually exclusive hypotheses are supported by locus groups of variable evolutionaryrate revealing significant conflicts in phylogenetic signal Analyses of gene-tree discordance indicate marked incongru-ence in relationships among chelicerate orders whereas derived relationships are demonstrably robust Consistentlyrecovered and supported relationships include the monophyly of Chelicerata Euchelicerata Tetrapulmonata and allorders represented by multiple terminals Relationships supported by subsets of slow-evolving genes includeRicinulei + Solifugae a clade comprised of Ricinulei Opiliones and Solifugae and a clade comprised ofTetrapulmonata Scorpiones and Pseudoscorpiones We demonstrate that outgroup selection without regard forbranch length distribution exacerbates long-branch attraction artifacts and does not mitigate gene-tree discordanceregardless of high gene representation for outgroups that are model organisms Arachnopulmonata (new name) isproposed for the clade comprising Scorpiones + Tetrapulmonata (previously named Pulmonata)

Key words Arthropoda arachnids concatenation topological conflict transcriptomics orthology prediction ancientrapid radiation

IntroductionSubsequent to decades of pitched contention over the inter-relationships of Arthropoda a consensus has begun toemerge regarding the shape of the arthropod tree of lifeWith the advent of transcriptome- and genome-scale datathe monophyly of Arthropoda and its three major ramimdashChelicerata Myriapoda and Pancrustacea (alternativelyldquoTetraconatardquo)mdashis strongly supported together with asister group relationship of arthropods and velvet worms(eg Hejnol et al 2009 Campbell et al 2011 Rota-Stabelliet al 2011) Multiple phylogenomic efforts have been directedtoward resolving internal relationships of Pancrustacea whichunite hexapods and the paraphyletic ldquocrustaceansrdquo and havecontributed to the identification of the hexapod sister group(Regier et al 2010 Meusemann et al 2010 Simon et al 2012

von Reumont et al 2012) The interrelationships ofChelicerata remain among the last major challenges for ar-thropod systematics Chelicerates constitute the second larg-est branch of the arthropod tree of life and include severaliconic lineages such as horseshoe crabs spiders andscorpions

Morphological phylogenetic analyses have divided extantChelicerata into three groups Pycnogonida (sea spiders)Xiphosura (horseshoe crabs) and Arachnida (terrestrial che-licerates) (eg Weygoldt and Paulus 1979 Shultz 1990 2007)Implicit in this schema is the minimization of terrestrializationevents namely by the arachnid common ancestor Themutual monophyly of Pycnogonida and Euchelicerata(=Xiphosura + Arachnida) is presently supported by anumber of morphological characters embryological evidence

The Author 2014 Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution All rights reserved For permissions pleasee-mail journalspermissionsoupcom

Mol Biol Evol 31(11)2963ndash2984 doi101093molbevmsu235 Advance Access publication August 8 2014 2963

at Am

erican Museum

of Natural H

istory on February 17 2015httpm

beoxfordjournalsorgD

ownloaded from

and molecular sequence data (Wheeler et al 1993 Wheelerand Hayashi 1998 Mallatt et al 2004 Regier et al 2010) in-cluding phylogenomic approaches (eg Roeding et al 2009Meusemann et al 2010) In contrast due to extensive mor-phological character conflict basal relationships of a puta-tively monophyletic Arachnida have remained recalcitrantto resolution and many hypotheses have been proposedfor these interrelationships (eg Shultz 1990 2007 Wheelerand Hayashi 1998 Giribet et al 2002) (fig 1) For examplemites (Acariformes) and ticks (Parasitiformes) are tradition-ally united as Acari by such characters as the subcapitulumand the ldquohexapod larvardquo (a six-legged postembryonic stagewhich also occurs in Ricinulei) this relationship conflicts withthe presence of a prominent sejugal furrow in camel spiders(Solifugae) and mites (Shultz 2007 Dunlop et al 2012) as wellas the fundamentally divergent morphology of the two con-stituent lineages of Acari (Van der Hammen 1989) Someauthors disputed even the monophyly of Arachnida on thebasis of ldquoprimitiverdquo characters occurring in scorpions andOpiliones (harvestmen) (eg Van der Hammen 1989)

Paralleling morphology molecular data have not resolvedrelationships within Arachnida either much less affirmed thevalidity of this group (fig 1) Analyses of nucleotide sequencedata infrequently recover the monophyly of Arachnida al-though very few studies have comprehensively sampled allextant arachnid orders (Wheeler and Hayashi 1998 Giribetet al 2002 Masta et al 2009 2010 Roeding et al 2009

Meusemann et al 2010 Pepato et al 2010 Regier et al2010 Arabi et al 2012 von Reumont et al 2012) The largestphylogenetic effort sampling all extant arachnid lineagesbased on 62 Sanger-sequenced loci recovered strong supportonly for the monophyly of Euchelicerata Tetrapulmonata(Araneae Amblypygi and Uropygi) and relationshipswithin the tetrapulmonates Arachnida was modestly sup-ported in that study by only two of four analyses and basalrelationships of arachnids were highly unstable (Regier et al2010)

Multiple aspects of Arachnida make the internal resolutionof this group prone to recalcitrance First the origins of che-licerates are presumed to be ancient occurring during or priorto the Cambrian with rapid diversification of extant arachnidorders (Dunlop 2010 Rota-Stabelli et al 2013) A considerablenumber of arachnid lineages equal in rank to orders has goneextinct such as trigonotarbids (early arachnids present by theSilurian) and phalangiotarbids (arachnids of unknown affinitypresent in the Devonian and Carboniferous) in addition tonon-arachnid chelicerate groups such as eurypterids andchasmataspidids (sea scorpions) and synziphosurines(stem-group horseshoe crabs or stem-group euchelicerates)(Dunlop 2010 Briggs et al 2012 Lamsdell 2013) Many extantorders also demonstrate the signature of prolonged historicextinction that is marked spans of time between origin anddiversification and the retention of few extant species Thisphenomenon is epitomized by horseshoe crabs which

FIG 1 (AndashJ) Exemplars of chelicerate diversity (A) Anoplodactylus evansi (Pycnogonida) (B) Limulus polyphemus (Xiphosura) spawning (C) Pachylicusacutus (Opiliones) (D) Neoscona arabesca (Araneae) (E) Centruroides sculpturatus (Scorpiones) (F) Ricinoides atewa (Ricinulei) (G) Eremobates sp(Solifugae) consuming a smaller conspecific (H) Synsphyronus apimelus (Pseudoscorpiones) (I) Mastigoproctus giganteus (Thelyphonida) (J) Damonvariegatus (Amblypygi) (KndashN) Hypotheses of chelicerate ordinal relationships from (K) Wheeler and Hayashi (1998) (morphology and ribosomalsequence data) (L) Giribet et al (2002) (two ribosomal genes) (M) Shultz (2007) (morphology) and (N) Regier et al (2010) (62 nuclear genes analysiswith degenerated codons) Image of Anoplodactylus evansi courtesy of Mick Harris Image of Limulus polyphemus courtesy of Peter Funch

2964

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

appeared in the fossil record in the Ordovician and are sur-vived by just four extant species that diverged in theCretaceous (Obst et al 2012) The combination of ancientdiversification and subsequent extinctions has presumablyproduced short internodes in extant chelicerate phylogenya documented affliction for phylogenetic reconstruction(Rokas and Carroll 2006 Salichos and Rokas 2013)

Second difficulty of phylogenetic resolution is com-pounded by accelerated evolution in certain arachnidorders Specifically mitochondrial genes of PseudoscorpionesAcariformes and Parasitiformes (or at least Mesostigmata)have demonstrably higher rates of evolution and larger var-iability in nucleotide composition than other Euchelicerata(Arabi et al 2012) Similarly branch lengths inferred fromnuclear ribosomal gene alignments suggest heterotachy inthe same groups (as well as Palpigradi and some araneo-morph spiders) even upon accounting for secondarystructure (Pepato et al 2010) Significantly accelerated ratesof evolution in certain lineages underlie long-branch attrac-tion (LBA) the artifactual close relationship of lineages withlong-branch lengths (Felsenstein 1978 Huelsenbeck 1997Bergsten 2005 Philippe et al 2005) Because outgroup taxaare often undersampled the placement of long-branch taxasister to or part of a grade with outgroup terminals is symp-tomatic of LBA artifacts Application of sequence data atgenomic scales compounded by inclusion of relatively fewterminals can result in maximal support values for spuriousplacement of long-branch terminals (reviewed by Bergsten2005)

A third and final aspect of arachnid biology hindering phy-logenetic resolution may be variability of genome size At oneextreme the genome of the mite Tetranychus urticae isamong the smallest arthropod genomes comprising approx-imately 18224 coding genes (Grbic et al 2011) In contrastthe genome of the scorpion Mesobuthus martensii containsapproximately 32016 genes more than any other animalgenome (Cao et al 2013) For reference approximately15771 genes occur in the fruit fly Drosophila melanogaster(FlyBase Consortium Release 5) and approximately 21000ndash25000 in Homo sapiens (Genome Reference Consortium)Furthermore whereas genome compaction in mites is asso-ciated with gene loss (Grbic et al 2011) scorpions appear tohave undergone high gene family turnover as well as geneduplications and neofunctionalization of resulting paralogs(Cao et al 2013 Sharma et al 2014) Such marked historicaldisparity in the evolution of gene families across arachnidsmay have hindered accurate assessments of orthology in datasets composed of Sanger-sequenced nuclear genes particu-larly some commonly utilized phylogenetic markers assumedto be single-copy orthologs throughout arthropods (egHedin et al 2010) but subsequently shown to contain nu-merous paralogs using cloning techniques andor transcrip-tomic resources (Riesgo et al 2012 Clouse et al 2013)

Toward improving resolution within Arachnida we com-piled a data set of 48 taxa including new transcriptomiclibraries for 17 species published herein two complete ge-nomes (the mite T urticae and the tick Ixodes scapularis)and previously published expressed sequence tag (EST) and

Illumina platform libraries Sampled outgroup taxa toArachnida consisted of Xiphosura and Pycnogonida as wellas basally branching Mandibulata (myriapods) andOnychophora We used recently developed tools for orthol-ogy prediction and optimization of matrix composition Thelargest data set of 3644 genes and greater than 12 millionamino acid sites (at 25 gene occupancy) included exemplarsof all extant arachnid orders except for Schizomida (the sistergroup of Thelyphonida and part of Uropygi) Here we showthat phylogenetic analysis of the 3644-ortholog data set yieldsthe nonmonophyly of Arachnida with 100 bootstrap sup-port due to suspect placement of PseudoscorpionesAcariformes and Parasitiformes We demonstrate that thisresult is an LBA artifact by examining nodal support as afunction of concatenation by evolutionary rate Intriguinglya set of genes supporting the monophyly of Arachnidastrongly conflicts with other such sets which support otherhypotheses also consistent with morphology (eg the place-ment of Pseudoscorpiones) indicating nonuniformity of phy-logenetic signal Significant incongruence among gene treescorroborates the pervasiveness of phylogenetic conflict at thebase of Arachnida Nevertheless we demonstrate consistentsupport for the monophyly of Chelicerata Euchelicerata andseveral derived clades of arachnids

Results

Orthology Assignment and Supermatrix Assembly

We sequenced cDNA from 16 arachnid and one xiphosuranspecies using the Illumina GAII and HiSeq 2500 platforms andcombined these with published transcriptomes and genomesfrom 30 other species including some libraries sequenced inour laboratories (Riesgo et al 2012 Fernandez et al 2014)Assembly statistics for new libraries as well as published tran-scriptomes assembled from raw reads are provided in table 1Subsequent to open reading frame (ORF) prediction and se-lection of the longest ORFs per unigene 3055ndash18814 peptidesequences were retained for Illumina libraries and completegenomes 815ndash1127 for 454 Life Science platform librariesand 54ndash10646 for Sanger-sequenced EST librariesOrthology assessment of this 48-taxon data set with theOrthologous MAtrix (OMA) stand-alone algorithm recovered77774 orthogroups Subsequent to alignment and culling ofambiguously aligned positions a supermatrix was con-structed by retaining a minimum gene occupancy thresholdof 13 taxa (ie occupancy of 425 of the data set) resultingin 3644 concatenated orthologs (1235912 aligned sites3623 occupied) The number of orthologs per taxonvaried from 6 (the palpigrade Prokoenenia wheeleri basedon published Sanger data) to 3215 (the harvestmanSitalcina lobata) Due to the paucity of data for the palpigradeits phylogenetic placement was analyzed separately

Maximum-Likelihood Analyses

Phylogenetic analysis of the 3644-ortholog matrix under max-imum likelihood (ML) recovered a fully resolved tree topologywith 100 bootstrap support for the monophyly ofChelicerata and Euchelicerata (fig 2) Contrary to the

2965

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Table 1 Species Sequenced and Analyzed in This Study

Order Source Raw Reads ContigsESTsGenes Peptides Filtered Peptides

Onychophora

Epiperipatus sp GenBank (Sanger ESTs) mdash 1868 698 480

Peripatopsis sedgwicki GenBank (454) mdash 10476 5489 2552

Peripatopsis capensis De novo (Illumina HiSeq) 40774042 92516 17452 12846

Myriapoda

Archispirostreptus gigas Diplopoda GenBank (454) mdash 4008 1237 815

Alipes grandidieri Chilopoda GenBank (Illumina GAII) 32332034 147732 34505 18814

Scutigera coleoptrata Chilopoda GenBank (Sanger ESTs) mdash 2400 961 570

Pycnogonida

Anoplodactylus eroticus Pycnogonida GenBank (454) mdash 3744 2291 1127

Endeis spinosa Pycnogonida GenBank (454) mdash 4069 3302 2656

Xiphosura

Limulus polyphemus Xiphosura De novo (Illumina HiSeq) 65099444 110362 33041 17824

Carcinoscorpius rotundicauda Xiphosura GenBank (Sanger ESTs) mdash 575 202 199

Arachnida

Damon variegatus Amplypygi De novo (Illumina GAII) 64733221 64715 19686 11823

Acanthoscurria gomesiana Araneae GenBank (Sanger ESTs) mdash 6790 2203 1567

Liphistius malayanus Araneae De novo (Illumina HiSeq) 62897982 61775 16526 11221

Frontinella communis Araneae De novo (Illumina HiSeq) 66183160 155930 53936 18978

Leucauge venusta Araneae De novo (Illumina HiSeq) 49301974 189630 62318 17591

Neoscona arabesca Araneae De novo (Illumina HiSeq) 57103328 150856 43252 16594

Blomia tropicalis Acariformes GenBank (Sanger ESTs) mdash 1432 925 888

Glycyphagus domesticus Acariformes GenBank (Sanger ESTs) mdash 2589 1309 1303

Suidasia medadensis Acariformes GenBank (Sanger ESTs) mdash 3585 2323 1766

Tetranychus urticae Acariformes GenBank (whole genome) mdash 18313 18313 14276

Amblyomma americanum Parasitiformes GenBank (Sanger ESTs) mdash 6502 1266 971

Amblyomma variegatum Parasitiformes GenBank (Sanger ESTs) mdash 3992 2248 1419

Dermacentor andersoni Parasitiformes GenBank (Sanger ESTs) mdash 3994 614 452

Ixodes scapularis Parasitiformes GenBank (whole genome) mdash 20473 20473 15288

Rhipicephalus appendiculatus Parasitiformes GenBank (Sanger ESTs) mdash 19123 10778 6303

Rhipicephalus microplus Parasitiformes GenBank (Sanger ESTs) mdash 52902 20939 10646

Larifuga sp Opiliones De novo (Illumina HiSeq) 38354659 115648 23750 10510

Metabiantes sp Opiliones De novo (Illumina HiSeq) 72266554 85203 30754 12843

Metasiro americanus Opiliones GenBank (Illumina GAII) mdash 83792 30041 16556

Pachylicus acutus Opiliones De novo (Illumina HiSeq) 18681026 69803 23379 14202

Phalangium opilio Opiliones De novo (Illumina GAII) 16225145 131889 40234 15277

Vonones ornata Opiliones De novo (Illumina HiSeq) 66096788 114248 37076 19208

Leiobunum verrucosum Opiliones GenBank (Illumina HiSeq) mdash 42636 26407 13329

Protolophus singularis Opiliones GenBank (Illumina HiSeq) mdash 287150 63477 13987

Ortholasma coronadense Opiliones GenBank (Illumina HiSeq) mdash 45807 23846 10780

Trogulus martensi Opiliones GenBank (Illumina HiSeq) mdash 55077 23233 12765

Hesperonemastoma modestum Opiliones GenBank (Illumina HiSeq) mdash 55565 32409 8845

Sclerobunus nondimorphicus Opiliones GenBank (Illumina HiSeq) mdash 56727 26599 11518

Sitalcina lobata Opiliones GenBank (Illumina HiSeq) mdash 64104 26410 14828

Siro boyerae Opiliones GenBank (Illumina HiSeq) mdash 38618 18657 11387

Prokoenenia wheeleri Palpigradi GenBank (Sanger ESTs) mdash 56 54 54

Synsphyronus apimelus Pseudoscorpiones De novo (Illumina HiSeq) 65443655 103556 31567 17820

Pseudocellus pearsei Ricinulei De novo (Illumina HiSeq) 91403481 29077 7687 5922

Ricinoides atewa Ricinulei De novo (Illumina GAII) 52766585 97327 31331 14324

Centruroides vittatus Scorpiones De novo (Illumina HiSeq) 45691843 14215 3859 3055

Pandinus imperator Scorpiones GenBank (454) mdash 17253 2219 2209

Eremobates sp Solifugae De novo (Illumina GAII) 86794767 94687 25474 11765

Mastigoproctus giganteus Uropygi De novo (Illumina GAII) 25983006 116600 32827 17674

2966

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

traditional basal split within Euchelicerata between Xiphosuraand the terrestrial Arachnida the ML analysis supported theparaphyly of Arachnida with Xiphosura sister to RicinuleiThe clade Pseudoscorpiones + Acariformes was recoveredsister to the remaining Euchelicerata this clade in turnformed part of a grade with Acariformes with respect tothe remaining Euchelicerata and Xiphosura All orders repre-sented by multiple terminals were recovered as monophyleticwith 100 bootstrap support often in spite of marked dis-parities in gene representation The greatest such disparity isexemplified by the Illumina-sequenced library of Limulus poly-phemus which has over 2 orders of magnitude greater rep-resentation than the Sanger-sequenced EST library ofCarcinoscorpius rotundicauda (2431 vs 14 orthologs respec-tively) support for Xiphosura was nevertheless maximal Allnodes in this topology were fully supported excepting thetwo corresponding to sister relationships of two sets of con-generic ticks (Amblyomma and Rhipicephalus) Longerbranch lengths were observed for the orders AcariformesParasitiformes and Pseudoscorpiones with respect to theremaining taxa

Matrix Reduction and Concatenation by GeneOccupancy

To assess the impact of heterogeneous levels of missing datawe generated three supermatrices Two of these weregenerated using the MARE algorithm retaining either all 47terminals (516 orthologs 122436 sites 3999 occupiedcall handle ldquoMARE1rdquo) or only the 30 librariescomprising Illumina data sets and whole genomes (1237orthologs 373349 sites 6249 occupied call handleldquoMARE2rdquo) A third matrix was generated by removing fromour original 3644-ortholog alignment all terminals with smal-ler libraries that is the same number of taxa as MARE2 (3644orthologs 1235912 sites 5534 occupied call handle ldquoIL-GENrdquo)

In spite of limited topological differences none ofthese three matrices recovered the monophyly ofArachnida (fig 3 and supplementary fig S1 SupplementaryMaterial online) In the two 30-taxon matrices (MARE2and IL-GEN) L polyphemus was consistently recoveredas nested within a paraphyletic Arachnida with strongsupport (bootstrap resampling frequency [BS] 98)

FIG 2 Relationships of Arachnida inferred from ML analysis of 3644 orthologs Numbers on nodes indicate bootstrap resampling frequencies (asterisksindicate a value of 100) Uppercase text indicates terminals represented by complete genomes boldface lowercase text indicates terminals with noveltranscriptomic data from this study Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

2967

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

In all three topologies Parasitiformes and the cladePseudoscorpiones + Acariformes formed a grade sister tothe remaining Euchelicerata and together with Solifugae inIL-GEN (a topology congruent with the supermatrix topologyin fig 2)

To infer the impact of gene occupancy on phylogeneticsignal another six matrices were constructed wherein thethreshold of gene occupancy was serially increased by fivetaxa up to representation for a minimum of 35 out of 47taxa A summary of matrix composition and amount of miss-ing data for these six matrices is provided in table 2 Bootstrapresampling frequency for various topological hypotheses ofinterest showed no consistent effect of missing data (fig 4)

Support for Arachnida was equal to or near zero for all ma-trices except for one which enforced gene occupancy as rep-resentation in at least 30 taxa (table 2 fig 4) The ML treetopology based on this exceptional matrix (98 orthologs16594 sites 6820 occupancy call handle ldquoAL30rdquo) recoveredthe monophyly of Arachnida with moderate support(BS = 79 supplementary fig S2 Supplementary Materialonline)

Concatenation by Percent Pairwise Identity

To assess how nodal support for alternative hypotheses ofchelicerate relationships is affected by rate of molecular evo-lution a series of 15 matrices was assembled wherein

FIG 3 Left Cladogram of arachnid relationships inferred from ML analysis post-matrix reduction retaining only terminals with Illumina-sequencedtranscriptomes or complete genomes Right Cladogram of chelicerate relationships inferred from ML analysis upon manual exclusion of Sanger-sequenced EST and 454 library data from the 3644-ortholog supermatrix Orders of interest are indicated with shading Numbers on nodes indicatebootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

Table 2 Nodal Support for Three Ingroup Clades as a Function of Gene Occupancy Threshold

Bootstrap Resampling Frequency

Gene Occupancy Threshold ProportionMissingGaps

Number ofOrthologs (sites)

Chelicerata Euchelicerata Arachnida

At least 13 taxa 0638 3644 (1235912) 100 100 0

At least 15 taxa 0606 2806 (896210) 100 100 0

At least 20 taxa 0520 1152 (312646) 100 100 0

At least 25 taxa 0448 352 (77516) 100 100 0

At least 30 taxa 0318 98 (16594) 99 100 79

At least 35 taxa 0249 23 (4021) 72 100 1

2968

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs were concatenated in order of percent pairwiseidentity beginning with the most conserved (ie similar)We added genes at increments of 100 until 1000 genes hadbeen concatenated and increments of 500 subsequentlyuntil 3500 genes had been concatenated We scoredthe bootstrap resampling frequency for various topologicalhypotheses For such nodes as Chelicerata and Eucheliceratawe observed a trajectory of nodal support theoreticallyexpected with accruing data monotonic or nearlymonotonic increase in bootstrap resampling frequency untilfixation at the maximum value (fig 5) A similar result wasobtained for some derived relationships that are consistentwith morphology such as Pedipalpi (Amblypygi + Uropygi)Tetrapulmonata (Araneae + Pedipalpi) and a clade weterm Arachnopulmonata new name which comprisesScorpiones + Tetrapulmonata Albeit recovered with moder-ate support in the majority of their analyses Regier et al(2010) used the term ldquoPulmonatardquo for this clade aname introduced by Firstman (1973) for a clade of pulmonatearachnids but which had already been in use for decades

for a group of gastropods To facilitate systematic discoursemore broadly we rename this group Arachnopul-monata

In contrast the trajectory corresponding to Arachnidashowed initial increase to maximum bootstrap frequencyupon concatenation of the 500 slowest-evolving orthologsThereafter nodal support for Arachnida decreased to zero bythe concatenation of the 1000th slowest-evolving orthologAs illustrated in figure 6 the topology inferred from the 500slowest-evolving orthologs recovers three separate cladeswithin Arachnida Arachnopulmonata a clade comprised ofOpiliones Solifugae and Ricinulei (a trio of orders with acompletely segmented opisthosoma and a tracheal tubulesystem for respiration) and a clade comprised of Pseudoscor-piones Acariformes and Parasitiformes

Like Arachnida a peak of support is observed for the place-ment of Pseudoscorpiones as part of a clade withArachnopulmonata Specifically much of that nodal supportplaces Pseudoscorpiones as sister group to Scorpiones (figs 5and 6) Unlike Arachnida nodal support for these hypothesesis more restricted to the 200 slowest-evolving orthologs

FIG 4 (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy (B) Global view ofsequence alignment of largest matrix (13 taxa) (C) Global view of sequence alignment of densest matrix (35 taxa)

2969

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 2: Phylogenomic Interrogation of Arachnida Reveals Systemic

and molecular sequence data (Wheeler et al 1993 Wheelerand Hayashi 1998 Mallatt et al 2004 Regier et al 2010) in-cluding phylogenomic approaches (eg Roeding et al 2009Meusemann et al 2010) In contrast due to extensive mor-phological character conflict basal relationships of a puta-tively monophyletic Arachnida have remained recalcitrantto resolution and many hypotheses have been proposedfor these interrelationships (eg Shultz 1990 2007 Wheelerand Hayashi 1998 Giribet et al 2002) (fig 1) For examplemites (Acariformes) and ticks (Parasitiformes) are tradition-ally united as Acari by such characters as the subcapitulumand the ldquohexapod larvardquo (a six-legged postembryonic stagewhich also occurs in Ricinulei) this relationship conflicts withthe presence of a prominent sejugal furrow in camel spiders(Solifugae) and mites (Shultz 2007 Dunlop et al 2012) as wellas the fundamentally divergent morphology of the two con-stituent lineages of Acari (Van der Hammen 1989) Someauthors disputed even the monophyly of Arachnida on thebasis of ldquoprimitiverdquo characters occurring in scorpions andOpiliones (harvestmen) (eg Van der Hammen 1989)

Paralleling morphology molecular data have not resolvedrelationships within Arachnida either much less affirmed thevalidity of this group (fig 1) Analyses of nucleotide sequencedata infrequently recover the monophyly of Arachnida al-though very few studies have comprehensively sampled allextant arachnid orders (Wheeler and Hayashi 1998 Giribetet al 2002 Masta et al 2009 2010 Roeding et al 2009

Meusemann et al 2010 Pepato et al 2010 Regier et al2010 Arabi et al 2012 von Reumont et al 2012) The largestphylogenetic effort sampling all extant arachnid lineagesbased on 62 Sanger-sequenced loci recovered strong supportonly for the monophyly of Euchelicerata Tetrapulmonata(Araneae Amblypygi and Uropygi) and relationshipswithin the tetrapulmonates Arachnida was modestly sup-ported in that study by only two of four analyses and basalrelationships of arachnids were highly unstable (Regier et al2010)

Multiple aspects of Arachnida make the internal resolutionof this group prone to recalcitrance First the origins of che-licerates are presumed to be ancient occurring during or priorto the Cambrian with rapid diversification of extant arachnidorders (Dunlop 2010 Rota-Stabelli et al 2013) A considerablenumber of arachnid lineages equal in rank to orders has goneextinct such as trigonotarbids (early arachnids present by theSilurian) and phalangiotarbids (arachnids of unknown affinitypresent in the Devonian and Carboniferous) in addition tonon-arachnid chelicerate groups such as eurypterids andchasmataspidids (sea scorpions) and synziphosurines(stem-group horseshoe crabs or stem-group euchelicerates)(Dunlop 2010 Briggs et al 2012 Lamsdell 2013) Many extantorders also demonstrate the signature of prolonged historicextinction that is marked spans of time between origin anddiversification and the retention of few extant species Thisphenomenon is epitomized by horseshoe crabs which

FIG 1 (AndashJ) Exemplars of chelicerate diversity (A) Anoplodactylus evansi (Pycnogonida) (B) Limulus polyphemus (Xiphosura) spawning (C) Pachylicusacutus (Opiliones) (D) Neoscona arabesca (Araneae) (E) Centruroides sculpturatus (Scorpiones) (F) Ricinoides atewa (Ricinulei) (G) Eremobates sp(Solifugae) consuming a smaller conspecific (H) Synsphyronus apimelus (Pseudoscorpiones) (I) Mastigoproctus giganteus (Thelyphonida) (J) Damonvariegatus (Amblypygi) (KndashN) Hypotheses of chelicerate ordinal relationships from (K) Wheeler and Hayashi (1998) (morphology and ribosomalsequence data) (L) Giribet et al (2002) (two ribosomal genes) (M) Shultz (2007) (morphology) and (N) Regier et al (2010) (62 nuclear genes analysiswith degenerated codons) Image of Anoplodactylus evansi courtesy of Mick Harris Image of Limulus polyphemus courtesy of Peter Funch

2964

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

appeared in the fossil record in the Ordovician and are sur-vived by just four extant species that diverged in theCretaceous (Obst et al 2012) The combination of ancientdiversification and subsequent extinctions has presumablyproduced short internodes in extant chelicerate phylogenya documented affliction for phylogenetic reconstruction(Rokas and Carroll 2006 Salichos and Rokas 2013)

Second difficulty of phylogenetic resolution is com-pounded by accelerated evolution in certain arachnidorders Specifically mitochondrial genes of PseudoscorpionesAcariformes and Parasitiformes (or at least Mesostigmata)have demonstrably higher rates of evolution and larger var-iability in nucleotide composition than other Euchelicerata(Arabi et al 2012) Similarly branch lengths inferred fromnuclear ribosomal gene alignments suggest heterotachy inthe same groups (as well as Palpigradi and some araneo-morph spiders) even upon accounting for secondarystructure (Pepato et al 2010) Significantly accelerated ratesof evolution in certain lineages underlie long-branch attrac-tion (LBA) the artifactual close relationship of lineages withlong-branch lengths (Felsenstein 1978 Huelsenbeck 1997Bergsten 2005 Philippe et al 2005) Because outgroup taxaare often undersampled the placement of long-branch taxasister to or part of a grade with outgroup terminals is symp-tomatic of LBA artifacts Application of sequence data atgenomic scales compounded by inclusion of relatively fewterminals can result in maximal support values for spuriousplacement of long-branch terminals (reviewed by Bergsten2005)

A third and final aspect of arachnid biology hindering phy-logenetic resolution may be variability of genome size At oneextreme the genome of the mite Tetranychus urticae isamong the smallest arthropod genomes comprising approx-imately 18224 coding genes (Grbic et al 2011) In contrastthe genome of the scorpion Mesobuthus martensii containsapproximately 32016 genes more than any other animalgenome (Cao et al 2013) For reference approximately15771 genes occur in the fruit fly Drosophila melanogaster(FlyBase Consortium Release 5) and approximately 21000ndash25000 in Homo sapiens (Genome Reference Consortium)Furthermore whereas genome compaction in mites is asso-ciated with gene loss (Grbic et al 2011) scorpions appear tohave undergone high gene family turnover as well as geneduplications and neofunctionalization of resulting paralogs(Cao et al 2013 Sharma et al 2014) Such marked historicaldisparity in the evolution of gene families across arachnidsmay have hindered accurate assessments of orthology in datasets composed of Sanger-sequenced nuclear genes particu-larly some commonly utilized phylogenetic markers assumedto be single-copy orthologs throughout arthropods (egHedin et al 2010) but subsequently shown to contain nu-merous paralogs using cloning techniques andor transcrip-tomic resources (Riesgo et al 2012 Clouse et al 2013)

Toward improving resolution within Arachnida we com-piled a data set of 48 taxa including new transcriptomiclibraries for 17 species published herein two complete ge-nomes (the mite T urticae and the tick Ixodes scapularis)and previously published expressed sequence tag (EST) and

Illumina platform libraries Sampled outgroup taxa toArachnida consisted of Xiphosura and Pycnogonida as wellas basally branching Mandibulata (myriapods) andOnychophora We used recently developed tools for orthol-ogy prediction and optimization of matrix composition Thelargest data set of 3644 genes and greater than 12 millionamino acid sites (at 25 gene occupancy) included exemplarsof all extant arachnid orders except for Schizomida (the sistergroup of Thelyphonida and part of Uropygi) Here we showthat phylogenetic analysis of the 3644-ortholog data set yieldsthe nonmonophyly of Arachnida with 100 bootstrap sup-port due to suspect placement of PseudoscorpionesAcariformes and Parasitiformes We demonstrate that thisresult is an LBA artifact by examining nodal support as afunction of concatenation by evolutionary rate Intriguinglya set of genes supporting the monophyly of Arachnidastrongly conflicts with other such sets which support otherhypotheses also consistent with morphology (eg the place-ment of Pseudoscorpiones) indicating nonuniformity of phy-logenetic signal Significant incongruence among gene treescorroborates the pervasiveness of phylogenetic conflict at thebase of Arachnida Nevertheless we demonstrate consistentsupport for the monophyly of Chelicerata Euchelicerata andseveral derived clades of arachnids

Results

Orthology Assignment and Supermatrix Assembly

We sequenced cDNA from 16 arachnid and one xiphosuranspecies using the Illumina GAII and HiSeq 2500 platforms andcombined these with published transcriptomes and genomesfrom 30 other species including some libraries sequenced inour laboratories (Riesgo et al 2012 Fernandez et al 2014)Assembly statistics for new libraries as well as published tran-scriptomes assembled from raw reads are provided in table 1Subsequent to open reading frame (ORF) prediction and se-lection of the longest ORFs per unigene 3055ndash18814 peptidesequences were retained for Illumina libraries and completegenomes 815ndash1127 for 454 Life Science platform librariesand 54ndash10646 for Sanger-sequenced EST librariesOrthology assessment of this 48-taxon data set with theOrthologous MAtrix (OMA) stand-alone algorithm recovered77774 orthogroups Subsequent to alignment and culling ofambiguously aligned positions a supermatrix was con-structed by retaining a minimum gene occupancy thresholdof 13 taxa (ie occupancy of 425 of the data set) resultingin 3644 concatenated orthologs (1235912 aligned sites3623 occupied) The number of orthologs per taxonvaried from 6 (the palpigrade Prokoenenia wheeleri basedon published Sanger data) to 3215 (the harvestmanSitalcina lobata) Due to the paucity of data for the palpigradeits phylogenetic placement was analyzed separately

Maximum-Likelihood Analyses

Phylogenetic analysis of the 3644-ortholog matrix under max-imum likelihood (ML) recovered a fully resolved tree topologywith 100 bootstrap support for the monophyly ofChelicerata and Euchelicerata (fig 2) Contrary to the

2965

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Table 1 Species Sequenced and Analyzed in This Study

Order Source Raw Reads ContigsESTsGenes Peptides Filtered Peptides

Onychophora

Epiperipatus sp GenBank (Sanger ESTs) mdash 1868 698 480

Peripatopsis sedgwicki GenBank (454) mdash 10476 5489 2552

Peripatopsis capensis De novo (Illumina HiSeq) 40774042 92516 17452 12846

Myriapoda

Archispirostreptus gigas Diplopoda GenBank (454) mdash 4008 1237 815

Alipes grandidieri Chilopoda GenBank (Illumina GAII) 32332034 147732 34505 18814

Scutigera coleoptrata Chilopoda GenBank (Sanger ESTs) mdash 2400 961 570

Pycnogonida

Anoplodactylus eroticus Pycnogonida GenBank (454) mdash 3744 2291 1127

Endeis spinosa Pycnogonida GenBank (454) mdash 4069 3302 2656

Xiphosura

Limulus polyphemus Xiphosura De novo (Illumina HiSeq) 65099444 110362 33041 17824

Carcinoscorpius rotundicauda Xiphosura GenBank (Sanger ESTs) mdash 575 202 199

Arachnida

Damon variegatus Amplypygi De novo (Illumina GAII) 64733221 64715 19686 11823

Acanthoscurria gomesiana Araneae GenBank (Sanger ESTs) mdash 6790 2203 1567

Liphistius malayanus Araneae De novo (Illumina HiSeq) 62897982 61775 16526 11221

Frontinella communis Araneae De novo (Illumina HiSeq) 66183160 155930 53936 18978

Leucauge venusta Araneae De novo (Illumina HiSeq) 49301974 189630 62318 17591

Neoscona arabesca Araneae De novo (Illumina HiSeq) 57103328 150856 43252 16594

Blomia tropicalis Acariformes GenBank (Sanger ESTs) mdash 1432 925 888

Glycyphagus domesticus Acariformes GenBank (Sanger ESTs) mdash 2589 1309 1303

Suidasia medadensis Acariformes GenBank (Sanger ESTs) mdash 3585 2323 1766

Tetranychus urticae Acariformes GenBank (whole genome) mdash 18313 18313 14276

Amblyomma americanum Parasitiformes GenBank (Sanger ESTs) mdash 6502 1266 971

Amblyomma variegatum Parasitiformes GenBank (Sanger ESTs) mdash 3992 2248 1419

Dermacentor andersoni Parasitiformes GenBank (Sanger ESTs) mdash 3994 614 452

Ixodes scapularis Parasitiformes GenBank (whole genome) mdash 20473 20473 15288

Rhipicephalus appendiculatus Parasitiformes GenBank (Sanger ESTs) mdash 19123 10778 6303

Rhipicephalus microplus Parasitiformes GenBank (Sanger ESTs) mdash 52902 20939 10646

Larifuga sp Opiliones De novo (Illumina HiSeq) 38354659 115648 23750 10510

Metabiantes sp Opiliones De novo (Illumina HiSeq) 72266554 85203 30754 12843

Metasiro americanus Opiliones GenBank (Illumina GAII) mdash 83792 30041 16556

Pachylicus acutus Opiliones De novo (Illumina HiSeq) 18681026 69803 23379 14202

Phalangium opilio Opiliones De novo (Illumina GAII) 16225145 131889 40234 15277

Vonones ornata Opiliones De novo (Illumina HiSeq) 66096788 114248 37076 19208

Leiobunum verrucosum Opiliones GenBank (Illumina HiSeq) mdash 42636 26407 13329

Protolophus singularis Opiliones GenBank (Illumina HiSeq) mdash 287150 63477 13987

Ortholasma coronadense Opiliones GenBank (Illumina HiSeq) mdash 45807 23846 10780

Trogulus martensi Opiliones GenBank (Illumina HiSeq) mdash 55077 23233 12765

Hesperonemastoma modestum Opiliones GenBank (Illumina HiSeq) mdash 55565 32409 8845

Sclerobunus nondimorphicus Opiliones GenBank (Illumina HiSeq) mdash 56727 26599 11518

Sitalcina lobata Opiliones GenBank (Illumina HiSeq) mdash 64104 26410 14828

Siro boyerae Opiliones GenBank (Illumina HiSeq) mdash 38618 18657 11387

Prokoenenia wheeleri Palpigradi GenBank (Sanger ESTs) mdash 56 54 54

Synsphyronus apimelus Pseudoscorpiones De novo (Illumina HiSeq) 65443655 103556 31567 17820

Pseudocellus pearsei Ricinulei De novo (Illumina HiSeq) 91403481 29077 7687 5922

Ricinoides atewa Ricinulei De novo (Illumina GAII) 52766585 97327 31331 14324

Centruroides vittatus Scorpiones De novo (Illumina HiSeq) 45691843 14215 3859 3055

Pandinus imperator Scorpiones GenBank (454) mdash 17253 2219 2209

Eremobates sp Solifugae De novo (Illumina GAII) 86794767 94687 25474 11765

Mastigoproctus giganteus Uropygi De novo (Illumina GAII) 25983006 116600 32827 17674

2966

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

traditional basal split within Euchelicerata between Xiphosuraand the terrestrial Arachnida the ML analysis supported theparaphyly of Arachnida with Xiphosura sister to RicinuleiThe clade Pseudoscorpiones + Acariformes was recoveredsister to the remaining Euchelicerata this clade in turnformed part of a grade with Acariformes with respect tothe remaining Euchelicerata and Xiphosura All orders repre-sented by multiple terminals were recovered as monophyleticwith 100 bootstrap support often in spite of marked dis-parities in gene representation The greatest such disparity isexemplified by the Illumina-sequenced library of Limulus poly-phemus which has over 2 orders of magnitude greater rep-resentation than the Sanger-sequenced EST library ofCarcinoscorpius rotundicauda (2431 vs 14 orthologs respec-tively) support for Xiphosura was nevertheless maximal Allnodes in this topology were fully supported excepting thetwo corresponding to sister relationships of two sets of con-generic ticks (Amblyomma and Rhipicephalus) Longerbranch lengths were observed for the orders AcariformesParasitiformes and Pseudoscorpiones with respect to theremaining taxa

Matrix Reduction and Concatenation by GeneOccupancy

To assess the impact of heterogeneous levels of missing datawe generated three supermatrices Two of these weregenerated using the MARE algorithm retaining either all 47terminals (516 orthologs 122436 sites 3999 occupiedcall handle ldquoMARE1rdquo) or only the 30 librariescomprising Illumina data sets and whole genomes (1237orthologs 373349 sites 6249 occupied call handleldquoMARE2rdquo) A third matrix was generated by removing fromour original 3644-ortholog alignment all terminals with smal-ler libraries that is the same number of taxa as MARE2 (3644orthologs 1235912 sites 5534 occupied call handle ldquoIL-GENrdquo)

In spite of limited topological differences none ofthese three matrices recovered the monophyly ofArachnida (fig 3 and supplementary fig S1 SupplementaryMaterial online) In the two 30-taxon matrices (MARE2and IL-GEN) L polyphemus was consistently recoveredas nested within a paraphyletic Arachnida with strongsupport (bootstrap resampling frequency [BS] 98)

FIG 2 Relationships of Arachnida inferred from ML analysis of 3644 orthologs Numbers on nodes indicate bootstrap resampling frequencies (asterisksindicate a value of 100) Uppercase text indicates terminals represented by complete genomes boldface lowercase text indicates terminals with noveltranscriptomic data from this study Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

2967

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

In all three topologies Parasitiformes and the cladePseudoscorpiones + Acariformes formed a grade sister tothe remaining Euchelicerata and together with Solifugae inIL-GEN (a topology congruent with the supermatrix topologyin fig 2)

To infer the impact of gene occupancy on phylogeneticsignal another six matrices were constructed wherein thethreshold of gene occupancy was serially increased by fivetaxa up to representation for a minimum of 35 out of 47taxa A summary of matrix composition and amount of miss-ing data for these six matrices is provided in table 2 Bootstrapresampling frequency for various topological hypotheses ofinterest showed no consistent effect of missing data (fig 4)

Support for Arachnida was equal to or near zero for all ma-trices except for one which enforced gene occupancy as rep-resentation in at least 30 taxa (table 2 fig 4) The ML treetopology based on this exceptional matrix (98 orthologs16594 sites 6820 occupancy call handle ldquoAL30rdquo) recoveredthe monophyly of Arachnida with moderate support(BS = 79 supplementary fig S2 Supplementary Materialonline)

Concatenation by Percent Pairwise Identity

To assess how nodal support for alternative hypotheses ofchelicerate relationships is affected by rate of molecular evo-lution a series of 15 matrices was assembled wherein

FIG 3 Left Cladogram of arachnid relationships inferred from ML analysis post-matrix reduction retaining only terminals with Illumina-sequencedtranscriptomes or complete genomes Right Cladogram of chelicerate relationships inferred from ML analysis upon manual exclusion of Sanger-sequenced EST and 454 library data from the 3644-ortholog supermatrix Orders of interest are indicated with shading Numbers on nodes indicatebootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

Table 2 Nodal Support for Three Ingroup Clades as a Function of Gene Occupancy Threshold

Bootstrap Resampling Frequency

Gene Occupancy Threshold ProportionMissingGaps

Number ofOrthologs (sites)

Chelicerata Euchelicerata Arachnida

At least 13 taxa 0638 3644 (1235912) 100 100 0

At least 15 taxa 0606 2806 (896210) 100 100 0

At least 20 taxa 0520 1152 (312646) 100 100 0

At least 25 taxa 0448 352 (77516) 100 100 0

At least 30 taxa 0318 98 (16594) 99 100 79

At least 35 taxa 0249 23 (4021) 72 100 1

2968

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs were concatenated in order of percent pairwiseidentity beginning with the most conserved (ie similar)We added genes at increments of 100 until 1000 genes hadbeen concatenated and increments of 500 subsequentlyuntil 3500 genes had been concatenated We scoredthe bootstrap resampling frequency for various topologicalhypotheses For such nodes as Chelicerata and Eucheliceratawe observed a trajectory of nodal support theoreticallyexpected with accruing data monotonic or nearlymonotonic increase in bootstrap resampling frequency untilfixation at the maximum value (fig 5) A similar result wasobtained for some derived relationships that are consistentwith morphology such as Pedipalpi (Amblypygi + Uropygi)Tetrapulmonata (Araneae + Pedipalpi) and a clade weterm Arachnopulmonata new name which comprisesScorpiones + Tetrapulmonata Albeit recovered with moder-ate support in the majority of their analyses Regier et al(2010) used the term ldquoPulmonatardquo for this clade aname introduced by Firstman (1973) for a clade of pulmonatearachnids but which had already been in use for decades

for a group of gastropods To facilitate systematic discoursemore broadly we rename this group Arachnopul-monata

In contrast the trajectory corresponding to Arachnidashowed initial increase to maximum bootstrap frequencyupon concatenation of the 500 slowest-evolving orthologsThereafter nodal support for Arachnida decreased to zero bythe concatenation of the 1000th slowest-evolving orthologAs illustrated in figure 6 the topology inferred from the 500slowest-evolving orthologs recovers three separate cladeswithin Arachnida Arachnopulmonata a clade comprised ofOpiliones Solifugae and Ricinulei (a trio of orders with acompletely segmented opisthosoma and a tracheal tubulesystem for respiration) and a clade comprised of Pseudoscor-piones Acariformes and Parasitiformes

Like Arachnida a peak of support is observed for the place-ment of Pseudoscorpiones as part of a clade withArachnopulmonata Specifically much of that nodal supportplaces Pseudoscorpiones as sister group to Scorpiones (figs 5and 6) Unlike Arachnida nodal support for these hypothesesis more restricted to the 200 slowest-evolving orthologs

FIG 4 (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy (B) Global view ofsequence alignment of largest matrix (13 taxa) (C) Global view of sequence alignment of densest matrix (35 taxa)

2969

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 3: Phylogenomic Interrogation of Arachnida Reveals Systemic

appeared in the fossil record in the Ordovician and are sur-vived by just four extant species that diverged in theCretaceous (Obst et al 2012) The combination of ancientdiversification and subsequent extinctions has presumablyproduced short internodes in extant chelicerate phylogenya documented affliction for phylogenetic reconstruction(Rokas and Carroll 2006 Salichos and Rokas 2013)

Second difficulty of phylogenetic resolution is com-pounded by accelerated evolution in certain arachnidorders Specifically mitochondrial genes of PseudoscorpionesAcariformes and Parasitiformes (or at least Mesostigmata)have demonstrably higher rates of evolution and larger var-iability in nucleotide composition than other Euchelicerata(Arabi et al 2012) Similarly branch lengths inferred fromnuclear ribosomal gene alignments suggest heterotachy inthe same groups (as well as Palpigradi and some araneo-morph spiders) even upon accounting for secondarystructure (Pepato et al 2010) Significantly accelerated ratesof evolution in certain lineages underlie long-branch attrac-tion (LBA) the artifactual close relationship of lineages withlong-branch lengths (Felsenstein 1978 Huelsenbeck 1997Bergsten 2005 Philippe et al 2005) Because outgroup taxaare often undersampled the placement of long-branch taxasister to or part of a grade with outgroup terminals is symp-tomatic of LBA artifacts Application of sequence data atgenomic scales compounded by inclusion of relatively fewterminals can result in maximal support values for spuriousplacement of long-branch terminals (reviewed by Bergsten2005)

A third and final aspect of arachnid biology hindering phy-logenetic resolution may be variability of genome size At oneextreme the genome of the mite Tetranychus urticae isamong the smallest arthropod genomes comprising approx-imately 18224 coding genes (Grbic et al 2011) In contrastthe genome of the scorpion Mesobuthus martensii containsapproximately 32016 genes more than any other animalgenome (Cao et al 2013) For reference approximately15771 genes occur in the fruit fly Drosophila melanogaster(FlyBase Consortium Release 5) and approximately 21000ndash25000 in Homo sapiens (Genome Reference Consortium)Furthermore whereas genome compaction in mites is asso-ciated with gene loss (Grbic et al 2011) scorpions appear tohave undergone high gene family turnover as well as geneduplications and neofunctionalization of resulting paralogs(Cao et al 2013 Sharma et al 2014) Such marked historicaldisparity in the evolution of gene families across arachnidsmay have hindered accurate assessments of orthology in datasets composed of Sanger-sequenced nuclear genes particu-larly some commonly utilized phylogenetic markers assumedto be single-copy orthologs throughout arthropods (egHedin et al 2010) but subsequently shown to contain nu-merous paralogs using cloning techniques andor transcrip-tomic resources (Riesgo et al 2012 Clouse et al 2013)

Toward improving resolution within Arachnida we com-piled a data set of 48 taxa including new transcriptomiclibraries for 17 species published herein two complete ge-nomes (the mite T urticae and the tick Ixodes scapularis)and previously published expressed sequence tag (EST) and

Illumina platform libraries Sampled outgroup taxa toArachnida consisted of Xiphosura and Pycnogonida as wellas basally branching Mandibulata (myriapods) andOnychophora We used recently developed tools for orthol-ogy prediction and optimization of matrix composition Thelargest data set of 3644 genes and greater than 12 millionamino acid sites (at 25 gene occupancy) included exemplarsof all extant arachnid orders except for Schizomida (the sistergroup of Thelyphonida and part of Uropygi) Here we showthat phylogenetic analysis of the 3644-ortholog data set yieldsthe nonmonophyly of Arachnida with 100 bootstrap sup-port due to suspect placement of PseudoscorpionesAcariformes and Parasitiformes We demonstrate that thisresult is an LBA artifact by examining nodal support as afunction of concatenation by evolutionary rate Intriguinglya set of genes supporting the monophyly of Arachnidastrongly conflicts with other such sets which support otherhypotheses also consistent with morphology (eg the place-ment of Pseudoscorpiones) indicating nonuniformity of phy-logenetic signal Significant incongruence among gene treescorroborates the pervasiveness of phylogenetic conflict at thebase of Arachnida Nevertheless we demonstrate consistentsupport for the monophyly of Chelicerata Euchelicerata andseveral derived clades of arachnids

Results

Orthology Assignment and Supermatrix Assembly

We sequenced cDNA from 16 arachnid and one xiphosuranspecies using the Illumina GAII and HiSeq 2500 platforms andcombined these with published transcriptomes and genomesfrom 30 other species including some libraries sequenced inour laboratories (Riesgo et al 2012 Fernandez et al 2014)Assembly statistics for new libraries as well as published tran-scriptomes assembled from raw reads are provided in table 1Subsequent to open reading frame (ORF) prediction and se-lection of the longest ORFs per unigene 3055ndash18814 peptidesequences were retained for Illumina libraries and completegenomes 815ndash1127 for 454 Life Science platform librariesand 54ndash10646 for Sanger-sequenced EST librariesOrthology assessment of this 48-taxon data set with theOrthologous MAtrix (OMA) stand-alone algorithm recovered77774 orthogroups Subsequent to alignment and culling ofambiguously aligned positions a supermatrix was con-structed by retaining a minimum gene occupancy thresholdof 13 taxa (ie occupancy of 425 of the data set) resultingin 3644 concatenated orthologs (1235912 aligned sites3623 occupied) The number of orthologs per taxonvaried from 6 (the palpigrade Prokoenenia wheeleri basedon published Sanger data) to 3215 (the harvestmanSitalcina lobata) Due to the paucity of data for the palpigradeits phylogenetic placement was analyzed separately

Maximum-Likelihood Analyses

Phylogenetic analysis of the 3644-ortholog matrix under max-imum likelihood (ML) recovered a fully resolved tree topologywith 100 bootstrap support for the monophyly ofChelicerata and Euchelicerata (fig 2) Contrary to the

2965

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Table 1 Species Sequenced and Analyzed in This Study

Order Source Raw Reads ContigsESTsGenes Peptides Filtered Peptides

Onychophora

Epiperipatus sp GenBank (Sanger ESTs) mdash 1868 698 480

Peripatopsis sedgwicki GenBank (454) mdash 10476 5489 2552

Peripatopsis capensis De novo (Illumina HiSeq) 40774042 92516 17452 12846

Myriapoda

Archispirostreptus gigas Diplopoda GenBank (454) mdash 4008 1237 815

Alipes grandidieri Chilopoda GenBank (Illumina GAII) 32332034 147732 34505 18814

Scutigera coleoptrata Chilopoda GenBank (Sanger ESTs) mdash 2400 961 570

Pycnogonida

Anoplodactylus eroticus Pycnogonida GenBank (454) mdash 3744 2291 1127

Endeis spinosa Pycnogonida GenBank (454) mdash 4069 3302 2656

Xiphosura

Limulus polyphemus Xiphosura De novo (Illumina HiSeq) 65099444 110362 33041 17824

Carcinoscorpius rotundicauda Xiphosura GenBank (Sanger ESTs) mdash 575 202 199

Arachnida

Damon variegatus Amplypygi De novo (Illumina GAII) 64733221 64715 19686 11823

Acanthoscurria gomesiana Araneae GenBank (Sanger ESTs) mdash 6790 2203 1567

Liphistius malayanus Araneae De novo (Illumina HiSeq) 62897982 61775 16526 11221

Frontinella communis Araneae De novo (Illumina HiSeq) 66183160 155930 53936 18978

Leucauge venusta Araneae De novo (Illumina HiSeq) 49301974 189630 62318 17591

Neoscona arabesca Araneae De novo (Illumina HiSeq) 57103328 150856 43252 16594

Blomia tropicalis Acariformes GenBank (Sanger ESTs) mdash 1432 925 888

Glycyphagus domesticus Acariformes GenBank (Sanger ESTs) mdash 2589 1309 1303

Suidasia medadensis Acariformes GenBank (Sanger ESTs) mdash 3585 2323 1766

Tetranychus urticae Acariformes GenBank (whole genome) mdash 18313 18313 14276

Amblyomma americanum Parasitiformes GenBank (Sanger ESTs) mdash 6502 1266 971

Amblyomma variegatum Parasitiformes GenBank (Sanger ESTs) mdash 3992 2248 1419

Dermacentor andersoni Parasitiformes GenBank (Sanger ESTs) mdash 3994 614 452

Ixodes scapularis Parasitiformes GenBank (whole genome) mdash 20473 20473 15288

Rhipicephalus appendiculatus Parasitiformes GenBank (Sanger ESTs) mdash 19123 10778 6303

Rhipicephalus microplus Parasitiformes GenBank (Sanger ESTs) mdash 52902 20939 10646

Larifuga sp Opiliones De novo (Illumina HiSeq) 38354659 115648 23750 10510

Metabiantes sp Opiliones De novo (Illumina HiSeq) 72266554 85203 30754 12843

Metasiro americanus Opiliones GenBank (Illumina GAII) mdash 83792 30041 16556

Pachylicus acutus Opiliones De novo (Illumina HiSeq) 18681026 69803 23379 14202

Phalangium opilio Opiliones De novo (Illumina GAII) 16225145 131889 40234 15277

Vonones ornata Opiliones De novo (Illumina HiSeq) 66096788 114248 37076 19208

Leiobunum verrucosum Opiliones GenBank (Illumina HiSeq) mdash 42636 26407 13329

Protolophus singularis Opiliones GenBank (Illumina HiSeq) mdash 287150 63477 13987

Ortholasma coronadense Opiliones GenBank (Illumina HiSeq) mdash 45807 23846 10780

Trogulus martensi Opiliones GenBank (Illumina HiSeq) mdash 55077 23233 12765

Hesperonemastoma modestum Opiliones GenBank (Illumina HiSeq) mdash 55565 32409 8845

Sclerobunus nondimorphicus Opiliones GenBank (Illumina HiSeq) mdash 56727 26599 11518

Sitalcina lobata Opiliones GenBank (Illumina HiSeq) mdash 64104 26410 14828

Siro boyerae Opiliones GenBank (Illumina HiSeq) mdash 38618 18657 11387

Prokoenenia wheeleri Palpigradi GenBank (Sanger ESTs) mdash 56 54 54

Synsphyronus apimelus Pseudoscorpiones De novo (Illumina HiSeq) 65443655 103556 31567 17820

Pseudocellus pearsei Ricinulei De novo (Illumina HiSeq) 91403481 29077 7687 5922

Ricinoides atewa Ricinulei De novo (Illumina GAII) 52766585 97327 31331 14324

Centruroides vittatus Scorpiones De novo (Illumina HiSeq) 45691843 14215 3859 3055

Pandinus imperator Scorpiones GenBank (454) mdash 17253 2219 2209

Eremobates sp Solifugae De novo (Illumina GAII) 86794767 94687 25474 11765

Mastigoproctus giganteus Uropygi De novo (Illumina GAII) 25983006 116600 32827 17674

2966

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

traditional basal split within Euchelicerata between Xiphosuraand the terrestrial Arachnida the ML analysis supported theparaphyly of Arachnida with Xiphosura sister to RicinuleiThe clade Pseudoscorpiones + Acariformes was recoveredsister to the remaining Euchelicerata this clade in turnformed part of a grade with Acariformes with respect tothe remaining Euchelicerata and Xiphosura All orders repre-sented by multiple terminals were recovered as monophyleticwith 100 bootstrap support often in spite of marked dis-parities in gene representation The greatest such disparity isexemplified by the Illumina-sequenced library of Limulus poly-phemus which has over 2 orders of magnitude greater rep-resentation than the Sanger-sequenced EST library ofCarcinoscorpius rotundicauda (2431 vs 14 orthologs respec-tively) support for Xiphosura was nevertheless maximal Allnodes in this topology were fully supported excepting thetwo corresponding to sister relationships of two sets of con-generic ticks (Amblyomma and Rhipicephalus) Longerbranch lengths were observed for the orders AcariformesParasitiformes and Pseudoscorpiones with respect to theremaining taxa

Matrix Reduction and Concatenation by GeneOccupancy

To assess the impact of heterogeneous levels of missing datawe generated three supermatrices Two of these weregenerated using the MARE algorithm retaining either all 47terminals (516 orthologs 122436 sites 3999 occupiedcall handle ldquoMARE1rdquo) or only the 30 librariescomprising Illumina data sets and whole genomes (1237orthologs 373349 sites 6249 occupied call handleldquoMARE2rdquo) A third matrix was generated by removing fromour original 3644-ortholog alignment all terminals with smal-ler libraries that is the same number of taxa as MARE2 (3644orthologs 1235912 sites 5534 occupied call handle ldquoIL-GENrdquo)

In spite of limited topological differences none ofthese three matrices recovered the monophyly ofArachnida (fig 3 and supplementary fig S1 SupplementaryMaterial online) In the two 30-taxon matrices (MARE2and IL-GEN) L polyphemus was consistently recoveredas nested within a paraphyletic Arachnida with strongsupport (bootstrap resampling frequency [BS] 98)

FIG 2 Relationships of Arachnida inferred from ML analysis of 3644 orthologs Numbers on nodes indicate bootstrap resampling frequencies (asterisksindicate a value of 100) Uppercase text indicates terminals represented by complete genomes boldface lowercase text indicates terminals with noveltranscriptomic data from this study Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

2967

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

In all three topologies Parasitiformes and the cladePseudoscorpiones + Acariformes formed a grade sister tothe remaining Euchelicerata and together with Solifugae inIL-GEN (a topology congruent with the supermatrix topologyin fig 2)

To infer the impact of gene occupancy on phylogeneticsignal another six matrices were constructed wherein thethreshold of gene occupancy was serially increased by fivetaxa up to representation for a minimum of 35 out of 47taxa A summary of matrix composition and amount of miss-ing data for these six matrices is provided in table 2 Bootstrapresampling frequency for various topological hypotheses ofinterest showed no consistent effect of missing data (fig 4)

Support for Arachnida was equal to or near zero for all ma-trices except for one which enforced gene occupancy as rep-resentation in at least 30 taxa (table 2 fig 4) The ML treetopology based on this exceptional matrix (98 orthologs16594 sites 6820 occupancy call handle ldquoAL30rdquo) recoveredthe monophyly of Arachnida with moderate support(BS = 79 supplementary fig S2 Supplementary Materialonline)

Concatenation by Percent Pairwise Identity

To assess how nodal support for alternative hypotheses ofchelicerate relationships is affected by rate of molecular evo-lution a series of 15 matrices was assembled wherein

FIG 3 Left Cladogram of arachnid relationships inferred from ML analysis post-matrix reduction retaining only terminals with Illumina-sequencedtranscriptomes or complete genomes Right Cladogram of chelicerate relationships inferred from ML analysis upon manual exclusion of Sanger-sequenced EST and 454 library data from the 3644-ortholog supermatrix Orders of interest are indicated with shading Numbers on nodes indicatebootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

Table 2 Nodal Support for Three Ingroup Clades as a Function of Gene Occupancy Threshold

Bootstrap Resampling Frequency

Gene Occupancy Threshold ProportionMissingGaps

Number ofOrthologs (sites)

Chelicerata Euchelicerata Arachnida

At least 13 taxa 0638 3644 (1235912) 100 100 0

At least 15 taxa 0606 2806 (896210) 100 100 0

At least 20 taxa 0520 1152 (312646) 100 100 0

At least 25 taxa 0448 352 (77516) 100 100 0

At least 30 taxa 0318 98 (16594) 99 100 79

At least 35 taxa 0249 23 (4021) 72 100 1

2968

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs were concatenated in order of percent pairwiseidentity beginning with the most conserved (ie similar)We added genes at increments of 100 until 1000 genes hadbeen concatenated and increments of 500 subsequentlyuntil 3500 genes had been concatenated We scoredthe bootstrap resampling frequency for various topologicalhypotheses For such nodes as Chelicerata and Eucheliceratawe observed a trajectory of nodal support theoreticallyexpected with accruing data monotonic or nearlymonotonic increase in bootstrap resampling frequency untilfixation at the maximum value (fig 5) A similar result wasobtained for some derived relationships that are consistentwith morphology such as Pedipalpi (Amblypygi + Uropygi)Tetrapulmonata (Araneae + Pedipalpi) and a clade weterm Arachnopulmonata new name which comprisesScorpiones + Tetrapulmonata Albeit recovered with moder-ate support in the majority of their analyses Regier et al(2010) used the term ldquoPulmonatardquo for this clade aname introduced by Firstman (1973) for a clade of pulmonatearachnids but which had already been in use for decades

for a group of gastropods To facilitate systematic discoursemore broadly we rename this group Arachnopul-monata

In contrast the trajectory corresponding to Arachnidashowed initial increase to maximum bootstrap frequencyupon concatenation of the 500 slowest-evolving orthologsThereafter nodal support for Arachnida decreased to zero bythe concatenation of the 1000th slowest-evolving orthologAs illustrated in figure 6 the topology inferred from the 500slowest-evolving orthologs recovers three separate cladeswithin Arachnida Arachnopulmonata a clade comprised ofOpiliones Solifugae and Ricinulei (a trio of orders with acompletely segmented opisthosoma and a tracheal tubulesystem for respiration) and a clade comprised of Pseudoscor-piones Acariformes and Parasitiformes

Like Arachnida a peak of support is observed for the place-ment of Pseudoscorpiones as part of a clade withArachnopulmonata Specifically much of that nodal supportplaces Pseudoscorpiones as sister group to Scorpiones (figs 5and 6) Unlike Arachnida nodal support for these hypothesesis more restricted to the 200 slowest-evolving orthologs

FIG 4 (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy (B) Global view ofsequence alignment of largest matrix (13 taxa) (C) Global view of sequence alignment of densest matrix (35 taxa)

2969

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 4: Phylogenomic Interrogation of Arachnida Reveals Systemic

Table 1 Species Sequenced and Analyzed in This Study

Order Source Raw Reads ContigsESTsGenes Peptides Filtered Peptides

Onychophora

Epiperipatus sp GenBank (Sanger ESTs) mdash 1868 698 480

Peripatopsis sedgwicki GenBank (454) mdash 10476 5489 2552

Peripatopsis capensis De novo (Illumina HiSeq) 40774042 92516 17452 12846

Myriapoda

Archispirostreptus gigas Diplopoda GenBank (454) mdash 4008 1237 815

Alipes grandidieri Chilopoda GenBank (Illumina GAII) 32332034 147732 34505 18814

Scutigera coleoptrata Chilopoda GenBank (Sanger ESTs) mdash 2400 961 570

Pycnogonida

Anoplodactylus eroticus Pycnogonida GenBank (454) mdash 3744 2291 1127

Endeis spinosa Pycnogonida GenBank (454) mdash 4069 3302 2656

Xiphosura

Limulus polyphemus Xiphosura De novo (Illumina HiSeq) 65099444 110362 33041 17824

Carcinoscorpius rotundicauda Xiphosura GenBank (Sanger ESTs) mdash 575 202 199

Arachnida

Damon variegatus Amplypygi De novo (Illumina GAII) 64733221 64715 19686 11823

Acanthoscurria gomesiana Araneae GenBank (Sanger ESTs) mdash 6790 2203 1567

Liphistius malayanus Araneae De novo (Illumina HiSeq) 62897982 61775 16526 11221

Frontinella communis Araneae De novo (Illumina HiSeq) 66183160 155930 53936 18978

Leucauge venusta Araneae De novo (Illumina HiSeq) 49301974 189630 62318 17591

Neoscona arabesca Araneae De novo (Illumina HiSeq) 57103328 150856 43252 16594

Blomia tropicalis Acariformes GenBank (Sanger ESTs) mdash 1432 925 888

Glycyphagus domesticus Acariformes GenBank (Sanger ESTs) mdash 2589 1309 1303

Suidasia medadensis Acariformes GenBank (Sanger ESTs) mdash 3585 2323 1766

Tetranychus urticae Acariformes GenBank (whole genome) mdash 18313 18313 14276

Amblyomma americanum Parasitiformes GenBank (Sanger ESTs) mdash 6502 1266 971

Amblyomma variegatum Parasitiformes GenBank (Sanger ESTs) mdash 3992 2248 1419

Dermacentor andersoni Parasitiformes GenBank (Sanger ESTs) mdash 3994 614 452

Ixodes scapularis Parasitiformes GenBank (whole genome) mdash 20473 20473 15288

Rhipicephalus appendiculatus Parasitiformes GenBank (Sanger ESTs) mdash 19123 10778 6303

Rhipicephalus microplus Parasitiformes GenBank (Sanger ESTs) mdash 52902 20939 10646

Larifuga sp Opiliones De novo (Illumina HiSeq) 38354659 115648 23750 10510

Metabiantes sp Opiliones De novo (Illumina HiSeq) 72266554 85203 30754 12843

Metasiro americanus Opiliones GenBank (Illumina GAII) mdash 83792 30041 16556

Pachylicus acutus Opiliones De novo (Illumina HiSeq) 18681026 69803 23379 14202

Phalangium opilio Opiliones De novo (Illumina GAII) 16225145 131889 40234 15277

Vonones ornata Opiliones De novo (Illumina HiSeq) 66096788 114248 37076 19208

Leiobunum verrucosum Opiliones GenBank (Illumina HiSeq) mdash 42636 26407 13329

Protolophus singularis Opiliones GenBank (Illumina HiSeq) mdash 287150 63477 13987

Ortholasma coronadense Opiliones GenBank (Illumina HiSeq) mdash 45807 23846 10780

Trogulus martensi Opiliones GenBank (Illumina HiSeq) mdash 55077 23233 12765

Hesperonemastoma modestum Opiliones GenBank (Illumina HiSeq) mdash 55565 32409 8845

Sclerobunus nondimorphicus Opiliones GenBank (Illumina HiSeq) mdash 56727 26599 11518

Sitalcina lobata Opiliones GenBank (Illumina HiSeq) mdash 64104 26410 14828

Siro boyerae Opiliones GenBank (Illumina HiSeq) mdash 38618 18657 11387

Prokoenenia wheeleri Palpigradi GenBank (Sanger ESTs) mdash 56 54 54

Synsphyronus apimelus Pseudoscorpiones De novo (Illumina HiSeq) 65443655 103556 31567 17820

Pseudocellus pearsei Ricinulei De novo (Illumina HiSeq) 91403481 29077 7687 5922

Ricinoides atewa Ricinulei De novo (Illumina GAII) 52766585 97327 31331 14324

Centruroides vittatus Scorpiones De novo (Illumina HiSeq) 45691843 14215 3859 3055

Pandinus imperator Scorpiones GenBank (454) mdash 17253 2219 2209

Eremobates sp Solifugae De novo (Illumina GAII) 86794767 94687 25474 11765

Mastigoproctus giganteus Uropygi De novo (Illumina GAII) 25983006 116600 32827 17674

2966

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

traditional basal split within Euchelicerata between Xiphosuraand the terrestrial Arachnida the ML analysis supported theparaphyly of Arachnida with Xiphosura sister to RicinuleiThe clade Pseudoscorpiones + Acariformes was recoveredsister to the remaining Euchelicerata this clade in turnformed part of a grade with Acariformes with respect tothe remaining Euchelicerata and Xiphosura All orders repre-sented by multiple terminals were recovered as monophyleticwith 100 bootstrap support often in spite of marked dis-parities in gene representation The greatest such disparity isexemplified by the Illumina-sequenced library of Limulus poly-phemus which has over 2 orders of magnitude greater rep-resentation than the Sanger-sequenced EST library ofCarcinoscorpius rotundicauda (2431 vs 14 orthologs respec-tively) support for Xiphosura was nevertheless maximal Allnodes in this topology were fully supported excepting thetwo corresponding to sister relationships of two sets of con-generic ticks (Amblyomma and Rhipicephalus) Longerbranch lengths were observed for the orders AcariformesParasitiformes and Pseudoscorpiones with respect to theremaining taxa

Matrix Reduction and Concatenation by GeneOccupancy

To assess the impact of heterogeneous levels of missing datawe generated three supermatrices Two of these weregenerated using the MARE algorithm retaining either all 47terminals (516 orthologs 122436 sites 3999 occupiedcall handle ldquoMARE1rdquo) or only the 30 librariescomprising Illumina data sets and whole genomes (1237orthologs 373349 sites 6249 occupied call handleldquoMARE2rdquo) A third matrix was generated by removing fromour original 3644-ortholog alignment all terminals with smal-ler libraries that is the same number of taxa as MARE2 (3644orthologs 1235912 sites 5534 occupied call handle ldquoIL-GENrdquo)

In spite of limited topological differences none ofthese three matrices recovered the monophyly ofArachnida (fig 3 and supplementary fig S1 SupplementaryMaterial online) In the two 30-taxon matrices (MARE2and IL-GEN) L polyphemus was consistently recoveredas nested within a paraphyletic Arachnida with strongsupport (bootstrap resampling frequency [BS] 98)

FIG 2 Relationships of Arachnida inferred from ML analysis of 3644 orthologs Numbers on nodes indicate bootstrap resampling frequencies (asterisksindicate a value of 100) Uppercase text indicates terminals represented by complete genomes boldface lowercase text indicates terminals with noveltranscriptomic data from this study Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

2967

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

In all three topologies Parasitiformes and the cladePseudoscorpiones + Acariformes formed a grade sister tothe remaining Euchelicerata and together with Solifugae inIL-GEN (a topology congruent with the supermatrix topologyin fig 2)

To infer the impact of gene occupancy on phylogeneticsignal another six matrices were constructed wherein thethreshold of gene occupancy was serially increased by fivetaxa up to representation for a minimum of 35 out of 47taxa A summary of matrix composition and amount of miss-ing data for these six matrices is provided in table 2 Bootstrapresampling frequency for various topological hypotheses ofinterest showed no consistent effect of missing data (fig 4)

Support for Arachnida was equal to or near zero for all ma-trices except for one which enforced gene occupancy as rep-resentation in at least 30 taxa (table 2 fig 4) The ML treetopology based on this exceptional matrix (98 orthologs16594 sites 6820 occupancy call handle ldquoAL30rdquo) recoveredthe monophyly of Arachnida with moderate support(BS = 79 supplementary fig S2 Supplementary Materialonline)

Concatenation by Percent Pairwise Identity

To assess how nodal support for alternative hypotheses ofchelicerate relationships is affected by rate of molecular evo-lution a series of 15 matrices was assembled wherein

FIG 3 Left Cladogram of arachnid relationships inferred from ML analysis post-matrix reduction retaining only terminals with Illumina-sequencedtranscriptomes or complete genomes Right Cladogram of chelicerate relationships inferred from ML analysis upon manual exclusion of Sanger-sequenced EST and 454 library data from the 3644-ortholog supermatrix Orders of interest are indicated with shading Numbers on nodes indicatebootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

Table 2 Nodal Support for Three Ingroup Clades as a Function of Gene Occupancy Threshold

Bootstrap Resampling Frequency

Gene Occupancy Threshold ProportionMissingGaps

Number ofOrthologs (sites)

Chelicerata Euchelicerata Arachnida

At least 13 taxa 0638 3644 (1235912) 100 100 0

At least 15 taxa 0606 2806 (896210) 100 100 0

At least 20 taxa 0520 1152 (312646) 100 100 0

At least 25 taxa 0448 352 (77516) 100 100 0

At least 30 taxa 0318 98 (16594) 99 100 79

At least 35 taxa 0249 23 (4021) 72 100 1

2968

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs were concatenated in order of percent pairwiseidentity beginning with the most conserved (ie similar)We added genes at increments of 100 until 1000 genes hadbeen concatenated and increments of 500 subsequentlyuntil 3500 genes had been concatenated We scoredthe bootstrap resampling frequency for various topologicalhypotheses For such nodes as Chelicerata and Eucheliceratawe observed a trajectory of nodal support theoreticallyexpected with accruing data monotonic or nearlymonotonic increase in bootstrap resampling frequency untilfixation at the maximum value (fig 5) A similar result wasobtained for some derived relationships that are consistentwith morphology such as Pedipalpi (Amblypygi + Uropygi)Tetrapulmonata (Araneae + Pedipalpi) and a clade weterm Arachnopulmonata new name which comprisesScorpiones + Tetrapulmonata Albeit recovered with moder-ate support in the majority of their analyses Regier et al(2010) used the term ldquoPulmonatardquo for this clade aname introduced by Firstman (1973) for a clade of pulmonatearachnids but which had already been in use for decades

for a group of gastropods To facilitate systematic discoursemore broadly we rename this group Arachnopul-monata

In contrast the trajectory corresponding to Arachnidashowed initial increase to maximum bootstrap frequencyupon concatenation of the 500 slowest-evolving orthologsThereafter nodal support for Arachnida decreased to zero bythe concatenation of the 1000th slowest-evolving orthologAs illustrated in figure 6 the topology inferred from the 500slowest-evolving orthologs recovers three separate cladeswithin Arachnida Arachnopulmonata a clade comprised ofOpiliones Solifugae and Ricinulei (a trio of orders with acompletely segmented opisthosoma and a tracheal tubulesystem for respiration) and a clade comprised of Pseudoscor-piones Acariformes and Parasitiformes

Like Arachnida a peak of support is observed for the place-ment of Pseudoscorpiones as part of a clade withArachnopulmonata Specifically much of that nodal supportplaces Pseudoscorpiones as sister group to Scorpiones (figs 5and 6) Unlike Arachnida nodal support for these hypothesesis more restricted to the 200 slowest-evolving orthologs

FIG 4 (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy (B) Global view ofsequence alignment of largest matrix (13 taxa) (C) Global view of sequence alignment of densest matrix (35 taxa)

2969

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 5: Phylogenomic Interrogation of Arachnida Reveals Systemic

traditional basal split within Euchelicerata between Xiphosuraand the terrestrial Arachnida the ML analysis supported theparaphyly of Arachnida with Xiphosura sister to RicinuleiThe clade Pseudoscorpiones + Acariformes was recoveredsister to the remaining Euchelicerata this clade in turnformed part of a grade with Acariformes with respect tothe remaining Euchelicerata and Xiphosura All orders repre-sented by multiple terminals were recovered as monophyleticwith 100 bootstrap support often in spite of marked dis-parities in gene representation The greatest such disparity isexemplified by the Illumina-sequenced library of Limulus poly-phemus which has over 2 orders of magnitude greater rep-resentation than the Sanger-sequenced EST library ofCarcinoscorpius rotundicauda (2431 vs 14 orthologs respec-tively) support for Xiphosura was nevertheless maximal Allnodes in this topology were fully supported excepting thetwo corresponding to sister relationships of two sets of con-generic ticks (Amblyomma and Rhipicephalus) Longerbranch lengths were observed for the orders AcariformesParasitiformes and Pseudoscorpiones with respect to theremaining taxa

Matrix Reduction and Concatenation by GeneOccupancy

To assess the impact of heterogeneous levels of missing datawe generated three supermatrices Two of these weregenerated using the MARE algorithm retaining either all 47terminals (516 orthologs 122436 sites 3999 occupiedcall handle ldquoMARE1rdquo) or only the 30 librariescomprising Illumina data sets and whole genomes (1237orthologs 373349 sites 6249 occupied call handleldquoMARE2rdquo) A third matrix was generated by removing fromour original 3644-ortholog alignment all terminals with smal-ler libraries that is the same number of taxa as MARE2 (3644orthologs 1235912 sites 5534 occupied call handle ldquoIL-GENrdquo)

In spite of limited topological differences none ofthese three matrices recovered the monophyly ofArachnida (fig 3 and supplementary fig S1 SupplementaryMaterial online) In the two 30-taxon matrices (MARE2and IL-GEN) L polyphemus was consistently recoveredas nested within a paraphyletic Arachnida with strongsupport (bootstrap resampling frequency [BS] 98)

FIG 2 Relationships of Arachnida inferred from ML analysis of 3644 orthologs Numbers on nodes indicate bootstrap resampling frequencies (asterisksindicate a value of 100) Uppercase text indicates terminals represented by complete genomes boldface lowercase text indicates terminals with noveltranscriptomic data from this study Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

2967

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

In all three topologies Parasitiformes and the cladePseudoscorpiones + Acariformes formed a grade sister tothe remaining Euchelicerata and together with Solifugae inIL-GEN (a topology congruent with the supermatrix topologyin fig 2)

To infer the impact of gene occupancy on phylogeneticsignal another six matrices were constructed wherein thethreshold of gene occupancy was serially increased by fivetaxa up to representation for a minimum of 35 out of 47taxa A summary of matrix composition and amount of miss-ing data for these six matrices is provided in table 2 Bootstrapresampling frequency for various topological hypotheses ofinterest showed no consistent effect of missing data (fig 4)

Support for Arachnida was equal to or near zero for all ma-trices except for one which enforced gene occupancy as rep-resentation in at least 30 taxa (table 2 fig 4) The ML treetopology based on this exceptional matrix (98 orthologs16594 sites 6820 occupancy call handle ldquoAL30rdquo) recoveredthe monophyly of Arachnida with moderate support(BS = 79 supplementary fig S2 Supplementary Materialonline)

Concatenation by Percent Pairwise Identity

To assess how nodal support for alternative hypotheses ofchelicerate relationships is affected by rate of molecular evo-lution a series of 15 matrices was assembled wherein

FIG 3 Left Cladogram of arachnid relationships inferred from ML analysis post-matrix reduction retaining only terminals with Illumina-sequencedtranscriptomes or complete genomes Right Cladogram of chelicerate relationships inferred from ML analysis upon manual exclusion of Sanger-sequenced EST and 454 library data from the 3644-ortholog supermatrix Orders of interest are indicated with shading Numbers on nodes indicatebootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

Table 2 Nodal Support for Three Ingroup Clades as a Function of Gene Occupancy Threshold

Bootstrap Resampling Frequency

Gene Occupancy Threshold ProportionMissingGaps

Number ofOrthologs (sites)

Chelicerata Euchelicerata Arachnida

At least 13 taxa 0638 3644 (1235912) 100 100 0

At least 15 taxa 0606 2806 (896210) 100 100 0

At least 20 taxa 0520 1152 (312646) 100 100 0

At least 25 taxa 0448 352 (77516) 100 100 0

At least 30 taxa 0318 98 (16594) 99 100 79

At least 35 taxa 0249 23 (4021) 72 100 1

2968

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs were concatenated in order of percent pairwiseidentity beginning with the most conserved (ie similar)We added genes at increments of 100 until 1000 genes hadbeen concatenated and increments of 500 subsequentlyuntil 3500 genes had been concatenated We scoredthe bootstrap resampling frequency for various topologicalhypotheses For such nodes as Chelicerata and Eucheliceratawe observed a trajectory of nodal support theoreticallyexpected with accruing data monotonic or nearlymonotonic increase in bootstrap resampling frequency untilfixation at the maximum value (fig 5) A similar result wasobtained for some derived relationships that are consistentwith morphology such as Pedipalpi (Amblypygi + Uropygi)Tetrapulmonata (Araneae + Pedipalpi) and a clade weterm Arachnopulmonata new name which comprisesScorpiones + Tetrapulmonata Albeit recovered with moder-ate support in the majority of their analyses Regier et al(2010) used the term ldquoPulmonatardquo for this clade aname introduced by Firstman (1973) for a clade of pulmonatearachnids but which had already been in use for decades

for a group of gastropods To facilitate systematic discoursemore broadly we rename this group Arachnopul-monata

In contrast the trajectory corresponding to Arachnidashowed initial increase to maximum bootstrap frequencyupon concatenation of the 500 slowest-evolving orthologsThereafter nodal support for Arachnida decreased to zero bythe concatenation of the 1000th slowest-evolving orthologAs illustrated in figure 6 the topology inferred from the 500slowest-evolving orthologs recovers three separate cladeswithin Arachnida Arachnopulmonata a clade comprised ofOpiliones Solifugae and Ricinulei (a trio of orders with acompletely segmented opisthosoma and a tracheal tubulesystem for respiration) and a clade comprised of Pseudoscor-piones Acariformes and Parasitiformes

Like Arachnida a peak of support is observed for the place-ment of Pseudoscorpiones as part of a clade withArachnopulmonata Specifically much of that nodal supportplaces Pseudoscorpiones as sister group to Scorpiones (figs 5and 6) Unlike Arachnida nodal support for these hypothesesis more restricted to the 200 slowest-evolving orthologs

FIG 4 (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy (B) Global view ofsequence alignment of largest matrix (13 taxa) (C) Global view of sequence alignment of densest matrix (35 taxa)

2969

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 6: Phylogenomic Interrogation of Arachnida Reveals Systemic

In all three topologies Parasitiformes and the cladePseudoscorpiones + Acariformes formed a grade sister tothe remaining Euchelicerata and together with Solifugae inIL-GEN (a topology congruent with the supermatrix topologyin fig 2)

To infer the impact of gene occupancy on phylogeneticsignal another six matrices were constructed wherein thethreshold of gene occupancy was serially increased by fivetaxa up to representation for a minimum of 35 out of 47taxa A summary of matrix composition and amount of miss-ing data for these six matrices is provided in table 2 Bootstrapresampling frequency for various topological hypotheses ofinterest showed no consistent effect of missing data (fig 4)

Support for Arachnida was equal to or near zero for all ma-trices except for one which enforced gene occupancy as rep-resentation in at least 30 taxa (table 2 fig 4) The ML treetopology based on this exceptional matrix (98 orthologs16594 sites 6820 occupancy call handle ldquoAL30rdquo) recoveredthe monophyly of Arachnida with moderate support(BS = 79 supplementary fig S2 Supplementary Materialonline)

Concatenation by Percent Pairwise Identity

To assess how nodal support for alternative hypotheses ofchelicerate relationships is affected by rate of molecular evo-lution a series of 15 matrices was assembled wherein

FIG 3 Left Cladogram of arachnid relationships inferred from ML analysis post-matrix reduction retaining only terminals with Illumina-sequencedtranscriptomes or complete genomes Right Cladogram of chelicerate relationships inferred from ML analysis upon manual exclusion of Sanger-sequenced EST and 454 library data from the 3644-ortholog supermatrix Orders of interest are indicated with shading Numbers on nodes indicatebootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global view of the sequence alignment

Table 2 Nodal Support for Three Ingroup Clades as a Function of Gene Occupancy Threshold

Bootstrap Resampling Frequency

Gene Occupancy Threshold ProportionMissingGaps

Number ofOrthologs (sites)

Chelicerata Euchelicerata Arachnida

At least 13 taxa 0638 3644 (1235912) 100 100 0

At least 15 taxa 0606 2806 (896210) 100 100 0

At least 20 taxa 0520 1152 (312646) 100 100 0

At least 25 taxa 0448 352 (77516) 100 100 0

At least 30 taxa 0318 98 (16594) 99 100 79

At least 35 taxa 0249 23 (4021) 72 100 1

2968

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs were concatenated in order of percent pairwiseidentity beginning with the most conserved (ie similar)We added genes at increments of 100 until 1000 genes hadbeen concatenated and increments of 500 subsequentlyuntil 3500 genes had been concatenated We scoredthe bootstrap resampling frequency for various topologicalhypotheses For such nodes as Chelicerata and Eucheliceratawe observed a trajectory of nodal support theoreticallyexpected with accruing data monotonic or nearlymonotonic increase in bootstrap resampling frequency untilfixation at the maximum value (fig 5) A similar result wasobtained for some derived relationships that are consistentwith morphology such as Pedipalpi (Amblypygi + Uropygi)Tetrapulmonata (Araneae + Pedipalpi) and a clade weterm Arachnopulmonata new name which comprisesScorpiones + Tetrapulmonata Albeit recovered with moder-ate support in the majority of their analyses Regier et al(2010) used the term ldquoPulmonatardquo for this clade aname introduced by Firstman (1973) for a clade of pulmonatearachnids but which had already been in use for decades

for a group of gastropods To facilitate systematic discoursemore broadly we rename this group Arachnopul-monata

In contrast the trajectory corresponding to Arachnidashowed initial increase to maximum bootstrap frequencyupon concatenation of the 500 slowest-evolving orthologsThereafter nodal support for Arachnida decreased to zero bythe concatenation of the 1000th slowest-evolving orthologAs illustrated in figure 6 the topology inferred from the 500slowest-evolving orthologs recovers three separate cladeswithin Arachnida Arachnopulmonata a clade comprised ofOpiliones Solifugae and Ricinulei (a trio of orders with acompletely segmented opisthosoma and a tracheal tubulesystem for respiration) and a clade comprised of Pseudoscor-piones Acariformes and Parasitiformes

Like Arachnida a peak of support is observed for the place-ment of Pseudoscorpiones as part of a clade withArachnopulmonata Specifically much of that nodal supportplaces Pseudoscorpiones as sister group to Scorpiones (figs 5and 6) Unlike Arachnida nodal support for these hypothesesis more restricted to the 200 slowest-evolving orthologs

FIG 4 (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy (B) Global view ofsequence alignment of largest matrix (13 taxa) (C) Global view of sequence alignment of densest matrix (35 taxa)

2969

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 7: Phylogenomic Interrogation of Arachnida Reveals Systemic

orthologs were concatenated in order of percent pairwiseidentity beginning with the most conserved (ie similar)We added genes at increments of 100 until 1000 genes hadbeen concatenated and increments of 500 subsequentlyuntil 3500 genes had been concatenated We scoredthe bootstrap resampling frequency for various topologicalhypotheses For such nodes as Chelicerata and Eucheliceratawe observed a trajectory of nodal support theoreticallyexpected with accruing data monotonic or nearlymonotonic increase in bootstrap resampling frequency untilfixation at the maximum value (fig 5) A similar result wasobtained for some derived relationships that are consistentwith morphology such as Pedipalpi (Amblypygi + Uropygi)Tetrapulmonata (Araneae + Pedipalpi) and a clade weterm Arachnopulmonata new name which comprisesScorpiones + Tetrapulmonata Albeit recovered with moder-ate support in the majority of their analyses Regier et al(2010) used the term ldquoPulmonatardquo for this clade aname introduced by Firstman (1973) for a clade of pulmonatearachnids but which had already been in use for decades

for a group of gastropods To facilitate systematic discoursemore broadly we rename this group Arachnopul-monata

In contrast the trajectory corresponding to Arachnidashowed initial increase to maximum bootstrap frequencyupon concatenation of the 500 slowest-evolving orthologsThereafter nodal support for Arachnida decreased to zero bythe concatenation of the 1000th slowest-evolving orthologAs illustrated in figure 6 the topology inferred from the 500slowest-evolving orthologs recovers three separate cladeswithin Arachnida Arachnopulmonata a clade comprised ofOpiliones Solifugae and Ricinulei (a trio of orders with acompletely segmented opisthosoma and a tracheal tubulesystem for respiration) and a clade comprised of Pseudoscor-piones Acariformes and Parasitiformes

Like Arachnida a peak of support is observed for the place-ment of Pseudoscorpiones as part of a clade withArachnopulmonata Specifically much of that nodal supportplaces Pseudoscorpiones as sister group to Scorpiones (figs 5and 6) Unlike Arachnida nodal support for these hypothesesis more restricted to the 200 slowest-evolving orthologs

FIG 4 (A) Bootstrap resampling frequency for phylogenetic hypotheses inferred from concatenation of orthologs by gene occupancy (B) Global view ofsequence alignment of largest matrix (13 taxa) (C) Global view of sequence alignment of densest matrix (35 taxa)

2969

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 8: Phylogenomic Interrogation of Arachnida Reveals Systemic

which are insufficient to support the monophyly ofArachnida (fig 6)

To ensure that results from this method of concatenationwere not driven by missing data we ranked the 3644 ortho-logs in order of percent pairwise identity and mapped theoldest most recent common ancestor (MRCA) recoverablefor each orthogroup Dense representation was observed forthe MRCAs of Arachnida Euchelicerata and Arthropoda andthe root of the tree (supplementary fig S3 SupplementaryMaterial online) Considering the composition of three ma-trices of significance (200-slowest evolving orthologs 500-slowest evolving orthologs complete matrix) the proportionof orthologs wherein at least the MRCA of Arachnida wasrepresented exceeded 98 for all matrices Only 72 (198) oforthologs in the complete matrix were restricted to samplingof derived arachnids groups and these were evenly distrib-uted with respect to ordination of evolutionary rate Theseresults indicate that missing data affected the sampling ofbasal nodes in a nonsignificant and random way

Bayesian Inference Analyses

Runs of PhyloBayes implementing infinite mixture models(GTR + CAT) were implemented for the 500 slowest-evolving

ortholog data set and for MARE1 did not achieve conver-gence (03ltmaximum discrepancylt 1) after 7 months ofcomputation on 500 parallelized processors We examinedposterior distributions under the four independent runs forboth data sets and observed that all chains had recovered thesame consensus topology for each data set Under either dataset internodes of infinitesimal length separated major cladeswithin Euchelicerata the result is indistinguishable from abasal polytomy for both data sets (fig 7) Bayesian inferenceanalysis of the 500 slowest-evolving ortholog data set recov-ered the clade (Pseudoscorpiones + Scorpiones) but notarachnid monophyly (fig 7 compare with fig 6) The twoBayesian inference topologies were almost identical due toextensive overlap of orthologs between the two data sets ofthe 516 genes in MARE1 301 are shared with the 500 slowest-evolving ortholog set

Discussion

Conflict Resolution

The phylogenetic data set amassed in this study is the largestdeployed to test basal relationships of Arachnida to dateNevertheless the tree topology recovered by ML analysis ofthe 3644 supermatrix (fig 2) recapitulates a recurring

FIG 5 Bootstrap resampling frequency for phylogenetic hypotheses inferred from sequential concatenation of orthologs in order of percent pairwiseidentity (most to least conserved)

2970

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 9: Phylogenomic Interrogation of Arachnida Reveals Systemic

phenomenon in molecular phylogenetic studies ofChelicerata The putative nonmonophyly of Arachnida sup-ported by this analysis is a counterintuitive result overturninga key relationship supported by morphological cladistic stud-ies (eg Wheeler and Hayashi 1998 Giribet et al 2002 Shultz2007) Specifically the nested placement of Xiphosura hasbeen recovered in other phylogenomic assessments albeitwith incomplete sampling of arachnid orders (Roeding et al2009 Meusemann et al 2010) Shultz (2007) observed thatarachnids are putatively united by a series of synapomorphiessuch as aerial respiration an anteriorlyanteroventrally di-rected mouth and the loss of some characters observed inmarine chelicerates (eg walking leg gnathobases cardiaclobe) Counterarguments to morphological support for arach-nid monophyly which are grounded in convergence of char-acters driven by independent episodes of terrestrialization aredifficult to falsify due to the fragmentary nature of the fossilrecord and limitations in parameterizing the historical prob-ability of terrestrialization events (Shultz 2007 von Reumontet al 2012)

However we regarded with suspicion the placementof three orders (Pseudoscorpiones Acariformes and

Parasitiformes) with known higher rates of molecular evolu-tion as a grade at the base of Euchelicerata with maximumbootstrap support whether based on transcriptomic data(Roeding et al 2009) nuclear ribosomal genes (Pepato et al2010) or mitochondrial genes (Arabi et al 2012) High or evenoptimal nodal support in genomic-scale analyses can belieinaccuracy of the tree topology interpartition conflicts andother biases (Felsenstein 1978 Rokas et al 2003 Salichos andRokas 2013) and may be symptomatic of an LBA phenome-non (reviewed by Bergsten 2005) Furthermore the inclusionof smaller (454 and Sanger-sequenced EST) libraries in ourdata set engenders the possibility that the nonmonophyly ofArachnida is an artifact stemming from missing data

The effect of missing data in phylogenetic analysis is con-troversial Simulations have shown that highly incompleteterminals can be reliably placed in phylogenies and oftenconfer advantages over excluding such terminals but mayengender LBA artifacts in some cases (Wiens 2006) Otheranalyses of empirical data sets have implicated missing dataas misleading specifically with respect to inflating nodal sup-port values for problematic nodes but data sets selected tobalance representation of sequence data have not been

FIG 6 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes Right Tree topology inferred from ML analysis of the 500slowest-evolving genes Orders of interest are indicated with shading Numbers on nodes indicate bootstrap resampling frequencies Bars on rightrepresent number of genes available for each terminal Below Global views of the sequence alignments

2971

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 10: Phylogenomic Interrogation of Arachnida Reveals Systemic

shown to resolve ambiguous nodes either (DellrsquoAmpio et al2014 Fernandez et al 2014)

To address this possibility we combined concatenationapproaches with ML inference of gene-tree topologies Inthe 3644-ortholog data set all arachnid orders were repre-sented by at least one Illumina data set or complete genomeand we thus calculated large numbers of potentially informa-tive gene trees (ie trees containing at least one member ofeach descendant branch and two distinct outgroups) for allnodes corresponding to interordinal relationships in therange of 701ndash3116 for nodes corresponding to basal arachnidrelationships (fig 8) However the proportion of those po-tentially informative gene trees that were congruent with agiven node varied markedly Generally internodes corre-sponding to the monophyly of derived clades (eg orderssampled with multiple terminals Tetrapulmonata) werecharacterized by a higher proportion of congruent genetrees in contrast to basal nodes within Euchelicerata Thesedata suggest that the gene tree incongruence observed is notentirely attributable to missing data

We conducted additional analyses to investigate whethernonmonophyly of Arachnida was attributable to missingdata Neither manual exclusion of 454 and Sanger-sequencedlibraries nor matrix reduction with MARE yielded arachnidmonophyly for their respective 30-terminal data sets (fig 3)

although the measures of matrix completeness for these re-duced data sets are comparable to phylogenomic counter-parts among pancrustaceans and myriapods (egMeusemann et al 2010 von Reumont et al 2012Fernandez et al 2014) Matrix reduction with MARE whileretaining all 47 terminals yielded nonmonophyly of Arachnidaas well with nonsignificant nodal support at the base ofEuchelicerata (supplementary fig S1 SupplementaryMaterial online)

Sequential concatenation of six matrices with varying levelof gene occupancy did not suggest a uniform effect of missingdata either for several relationships of interest (fig 4) Thesmallest matrix with gene occupancy of over 35 taxa per gene(23 orthologs 7510 sequence occupancy) recoveredsome relationships with slightly lower node support thatwere robustly supported elsewhere (eg ChelicerataTetrapulmonata Arachnopulmonata) consistent with thetheoretical expectation of increased nodal support with ac-cruing data for robust nodes Support for Arachnida was con-sistently low excepting its recovery with moderate support inthe AL30 matrix (gene occupancy of at least 30 taxa 98orthologs 6820 sequence occupancy supplementary figS2 Supplementary Material online) This recovery of a mono-phyletic Arachnida through exclusive use of molecular dataalbeit with limited support is a rare result (Regier et al 2010)

FIG 7 Consensus of PhyloBayes tree topologies across four independent runs Left Tree topology inferred using the 500 slowest-evolving genes RightTree topology inferred using the MARE1 matrix Dotted line represented abbreviated branch length for Glycyphagus domesticus (lengths of 1545 and1076 for left and right topologies respectively)

2972

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 11: Phylogenomic Interrogation of Arachnida Reveals Systemic

Nevertheless as this threshold of gene occupancy (represen-tation for at least 30 taxa) is at neither extreme of the param-eterrsquos spectrum of values we considered this resultinsufficient evidence for a systemic and pervasive effect ofmissing data on tree topology inference We therefore inves-tigated another aspect of our data set that might reveal phy-logenetic support for Arachnida namely rate of molecularevolution

Serial concatenation by evolutionary rate was conductedfor 15 matrices wherein orthologs were concatenated in orderof percent pairwise identity beginning with the most con-served In contrast to concatenation by gene occupancy weobserved distinct patterns in trajectories of nodal supportupon addition of genes enabling discernment of robustlysupported nodes that were insensitive to evolutionary rate(eg Chelicerata Euchelicerata Tetrapulmonata) and nodesthat were supported by a subset of genes with slower rates(eg Arachnida Pseudoscorpiones + Scorpiones) (fig 5)Intriguingly the matrix of the 200- and 500-slowest evolvingorthologs recovered interrelationships between arachnidclades characterized by very short internodes in spite of rel-atively large numbers of potentially informative genes at

those internodes (supplementary figs S4 and S5Supplementary Material online) corroborating previous in-ferences of ancient and rapid radiation inferred from the fossilrecord and molecular phylogenetic efforts (Giribet et al 2002Dunlop 2010 Regier et al 2010)

We also considered the possibility that evolutionary rateand degree of missing data were conflated We dissected therelationship between these two variables and found that thematrices concatenated by evolutionary rate had approxi-mately the same amount of gene occupancy (3623ndash4392) with slightly higher occupancy in the slowest-evolving data sets (supplementary fig S6 SupplementaryMaterial online) In contrast matrices concatenated by geneoccupancy exhibit a clear negative relationship with evolu-tionary rate with denser matrices including a larger propor-tion of slowly evolving genes (supplementary fig S6Supplementary Material online)mdashgenes that are more con-served across taxa The latter correlation suggests that evolu-tionary rate is a stronger explanatory variable than amount ofmissing data not only for phylogenetic signal but also for theamount of missing data itself In spite of this trend concate-nation in order of percent pairwise identity did not affect

FIG 8 Relationships of Arachnida inferred from ML analysis of 3644 orthologs (as in fig 2) Numbers on nodes indicate number of gene trees congruentwith nodes numbers in parentheses indicate total number of potentially informative genes (for nodes of interest) Area shaded in gray indicatesTetrapulmonata with branch lengths extended for clarity

2973

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 12: Phylogenomic Interrogation of Arachnida Reveals Systemic

representation of basal nodes (supplementary fig S3Supplementary Material online) with the ingroup (MRCAof Arachnida) represented in over 98 of orthologs The neg-ative relationship between evolutionary rate and gene occu-pancy could reflect biological pattern (eg orthologs withconserved function and sequence may be expressed inmany taxa) the algorithmic operation of OMA (Altenhoffet al 2013) or a combination of the two

The number of different ML topologies obtained with var-ious matrices is strongly suggestive of systemic incongruenceof phylogenetic signal attributable in part to LBA artifacts inall topologies we obtained from the incremental concatena-tion series with nonmonophyly of Arachnida arachnidparaphyly was caused by the suspect placement ofPseudoscorpiones Acariformes andor Parasitiformes outsideof the clade uniting the remaining arachnids and Xiphosura(figs 2 3 6 and 7 and supplementary fig S1 SupplementaryMaterial online) Consistent with this prediction as well twoseparate Bayesian inference analyses (the 500-slowest evolv-ing orthologs and MARE1) yielded a polytomy or infinitesimalbranch lengths at the base of Arachnida (fig 7) with longbranches subtending the problematic orders The LG4X + Fand CAT + GTR models utilized in ML and Bayesian inferenceanalyses respectively were chosen for their superior ability toaccommodate site-heterogeneous patterns of molecular evo-lution particularly in the event of LBA artifacts (Lartillot andPhilippe 2004 2008 Le et al 2012) But upon examination ofbranch lengths across postburn-in samples from thePhyloBayes runs (based on the 500 slowest-evolving orthologdata set) we continued to observe significantly longer branchlengths under a Bayesian relative rates test (Wilcox et al 2004)for all three problematic orders (Acariformes Parasitiformesand Pseudoscorpiones) with respect to the MRCA ofEuchelicerata in comparison to the remaining eucheliceratetaxa (Pltlt 00001 for all comparisons fig 9) Together withthe ML topologies these analyses reinforce the inference ofshort internodes in the euchelicerate tree of life precipitatedby rapid divergence of basal lineages and marked heterotachyin problematic orders

However concatenation methods have been critiqued fortheir tendency to mask phylogenetic conflict when stronggene tree incongruence is incident (Jeffroy et al 2006Kubatko and Degnan 2007 Nosenko et al 2013 Salichosand Rokas 2013) and some authors have advocated againstthe use of bootstrap resampling in phylogenomic analyses(Salichos and Rokas 2013 DellrsquoAmpio et al 2014) Severalnewly developed metrics for quantifying gene tree incongru-ence are inapplicable to partial trees (eg internode certaintySalichos and Rokas 2013) Therefore we visualized the dom-inant bipartitions among the ML gene tree topologies byconstructing supernetworks using the SuperQ method(Greurounewald et al 2013) which decomposes all gene treesinto quartets to build supernetworks where edge lengths cor-respond to quartet frequencies (see Fernandez et al 2014)We inferred supernetworks for four data sets 1) 3644 ortho-logs 2) the 200 slowest-evolving orthologs 3) the 500 slowest-evolving orthologs and 4) the 1237 orthologs of the MARE2data set which retain only the 30 data-rich taxa Invariably we

observed an unresolved star topology with numerous reticu-lations corresponding to basal divergences of Arachnida (fig10) The retention of gene tree incongruence irrespective ofvarious approaches to matrix optimization (eg matrix reduc-tion gene occupancy evolutionary rate) indicates that con-flicts at the base of Arachnida are not solely attributable tostochastic sampling error but are systemic The greateramount of reticulation in supernetworks of slowly evolvinggenes corroborates previous reports of greater gene tree in-congruence in slowly evolving partitions (Salichos and Rokas2013 but see Betancur-R et al 2014)

The sum of these analyses suggests that phylogenetic res-olution of Arachnida is beleaguered by incongruent signal indifferent genomic regions which in turn is a function of evo-lutionary rate and exacerbated by LBA artifacts in rapidlyevolving lineages The combination of long-branch taxa inci-dent heterotachy and discordant results stemming from con-sideration of evolutionary rate closely reflects historicaldisputes over the root of Metazoa and tradeoffs betweenamount and parameterization of sequence data (Dunnet al 2008 Philippe et al 2009 2011 Schierwater et al 2009Nosenko et al 2013) Beyond demonstrating concealed phy-logenetic support for arachnid monophyly these results areilluminative in understanding why a morphologically well-circumscribed group has historically proven difficult to re-cover in molecular phylogenetic analyses We note that con-catenation by order of evolutionary rate proved an effectiveapproach of identifying this phylogenetic incongruence Serialconcatenation strategies that rely upon random sampling oforthologs (eg RADICAL Narechania et al 2012 Simon et al2012) were trialed in our data set but did not identify incon-gruence when it was highly localized For example unless arandom concatenation series captures a significant propor-tion of that subset supporting Pseudoscorpiones +Arachnopulmonata (200 out of 3644 orthologs) this incon-gruence will not be identified Apropos of the 98 orthologs inthe AL30 matrix which supported the monophyly ofArachnida 47 are among the 500 slowest evolving orthologsfurther indicating that rate of evolution has greater explana-tory power than missing data toward accounting for arachnidnonmonophyly

Evaluating Historical Hypotheses Based onMorphology Pseudoscorpions and Solifugae

We observed a nodal support profile similar to that ofArachnida for the clades (Pseudoscorpiones + Acari) and(Ricinulei + Solifugae) (fig 5) neither historically supportedby analyses of morphological characters (see Shultz 2007 forthe most recent analysis and summary of hypotheses)Monophyly of Acari (Acariformes + Parasitiformes) is conten-tious as molecular sequence data do not uniformly supportthis clade (Giribet et al 2002 Pepato et al 2010 Regier et al2010 but see Meusemann et al 2010) and relevant morpho-logical characters may have undergone extensive conver-gence For example a six-legged postembryonic stageoccurs not only in Acariformes and Parasitiformes but alsoin the distantly related Ricinulei The embryology and genome

2974

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 13: Phylogenomic Interrogation of Arachnida Reveals Systemic

of T urticae have also demonstrated the loss of opisthosomalsegmentation and such conserved genes as the Hox transcrip-tion factor abdominal-A found in all other surveyed arach-nids (Hughes and Kaufman 2002 Grbic et al 2011 Sharmaet al 2012b)

A greater measure of confidence was once placed in thesister relationship of pseudoscorpions and solifuges(Weygoldt and Paulus 1979 Shultz 1990 2007 but seeDunlop et al 2012) with these forming part of a clade withscorpions and harvestmen (fig 1) (Weygoldt and Paulus 1978Shultz 1990 2007 Giribet et al 2002) The sister relationshipof Pseudoscorpiones and Solifugae is consistent with anumber of morphological characters such as two-segmentedchelicerae the shape of the epistome (a structure coveringthe mouth) and similarity in walking leg segments (Shultz2007) However recent advances in comparative develop-mental genetics have elucidated a simple mechanism forthe transition from the three- to the two-segmented chelic-eral state Loss of the expression domain of the transcriptionfactor dachshund which in three-segmented chelicerae isuniquely expressed in the proximal segment (Prpic andDamen 2004 Sharma et al 2012a 2013 Barnett andThomas 2013) Similarly the highly variable gnathobasicmouthparts of spiders (the pedipalpal ldquomaxillardquo) mites (therutellum) and harvestmen (the stomotheca) all result fromthe same embryonic tissues whose growth is disrupted by

abrogation of the appendage-patterning transcription factorDistal-less (Schoppmeier and Damen 2001 Khila and Grbic2007 Sharma et al 2013) These data underscore the evolu-tionary lability of cheliceral segment number and enditicmouthparts and disfavor their utility as phylogenetic charac-ter systems

Few morphological characters could potentially unitePseudoscorpiones and Acari Nevertheless the relationshipof Pseudoscorpiones and Acariformes is consistent with thepresence of prosomal silk glands in these orders (producedfrom cheliceral glands in the former and salivary glands insome lineages of the latter in contrast spider silk is producedfrom dedicated opisthosomal organs) A reciprocal BLAST(Basic Local Alignment Search Tool) search against spiderand Tetranychus silk protein sequences revealed no hits inthe transcriptome of the pseudoscorpion Synsphyronus api-melus and thus we are presently unable to evaluate the ho-mology of mite and pseudoscorpion silks based on theirprotein structure

Interestingly in contrast to Arachnida nodal support forthe clade (Ricinulei + Solifugae) is more recalcitrant to con-catenation of fast-evolving genes Only after concatenation of3000 orthologs does nodal support for this relationshipbecome fixed at zero (fig 5) Neither this relationship northe placement of Opiliones as sister group of(Ricinulei + Solifugae) has been supported previously in the

FIG 9 Bayesian relative rates test indicating patristic distances from MRCA of Euchelicerata to constituent terminals Samples are drawn from 4000postburn-in tree topologies (1000 randomly sampled trees per chain) from analyses of the 500 slowest-evolving ortholog data set in Phylobayes

2975

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 14: Phylogenomic Interrogation of Arachnida Reveals Systemic

literature (very limited support was obtained forRicinulei + Solifugae by Regier et al 2010) and both warrantfurther anatomical and embryological investigation

The Placement of Palpigradi

The 47 total species in the supermatrix represented all buttwo orders of Chelicerata Schizomida and PalpigradiSchizomida is a lineage consistently recovered as sistergroup of Thelyphonida and the pair of orders comprisesthe clade Uropygi (Shultz 1990 2007 Giribet et al 2002Pepato et al 2010 Regier et al 2010) Several charactersunite Schizomida and Thelyphonida including a uniquemating behavior (Shultz 1990) and the mutual monophylyof the two orders has never been tested (ie one may con-stitute a nested lineage of the other)

In contrast the placement of Palpigradi an order of mi-nuscule (typicallylt 1 mm in length) chelicerates (see Giribetet al 2014) has long been in dispute Various analyses ofdifferent data classes have suggested that palpigrades aresister to Acariformes (Van der Hammen 1989 Regier et al2010) Acari (Pepato et al 2010) Tetrapulmonata (Shultz1990 Wheeler and Hayashi 1998) Solifugae (Giribet et al2002) or all remaining arachnids (Shultz 2007) At the heartof the contention is the interpretation of palpigrade

opisthosomal ldquosacsrdquo three pairs of putatively respiratorystructures that appear in opisthosomal sternites 4ndash6 Thesestructures were historically inferred to be either the precur-sors or the remnants of book lungs but a respiratory functionof these structures has never been conclusively demonstratedAs we were unable to include genome-scale data forPalpigradi we separately assessed the placement of thisorder using the 56 Sanger-sequenced genes for P wheelerifrom Regier et al (2010) These were analyzed as part of thematrix with the 500 slowest-evolving orthologs

This analysis recovered P wheeleri as sister group toParasitiformes with little support (BS = 50) and modestreduction of nodal support elsewhere in the tree relation-ships among and within other orders were unchanged(supplementary fig S7 Supplementary Material online)Moreover given the placement of Palpigradi within a groupof lineages with suspect placement as well as its infinitesimalrepresentation in the matrix (three orthologs out of 500) weconsider the placement of this order insufficiently addressedin this study

Outgroup Sampling and LBA Artifacts

One of the commonly invoked solutions for redressing topo-logical instability is depth of taxonomic sampling and

FIG 10 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets Phylogenetic conflict is representedby reticulations Edge lengths correspond to quartet frequencies Shading indicates base of arachnid radiation

2976

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 15: Phylogenomic Interrogation of Arachnida Reveals Systemic

particularly of closely related outgroup taxa This is a practicethat a subset of the authors has historically advocated inarthropod molecular phylogenies (Wheeler et al 1993Wheeler and Hayashi 1998 Giribet et al 2001) Howeverfor the majority of the analyses conducted in this study sam-pling of Mandibulata was restricted to three myriapod spe-cies out of a total of ten outgroups (three onychophoransthree myriapods two pycnogonids and two xiphosurans)This deliberate selection of mandibulate exemplars wasdriven by the topologically stable position of Myriapoda inthe arthropod tree of life (Regier et al 2010 Rehm et al 2014)and their relatively short-branch lengths in comparison to allof Pancrustacea as inferred by recent phylogenomic analyses(Regier et al 2010) An argument favoring selection of out-group taxa that have short patristic distances from theirMRCA with the ingroup taxa (ie low substitution rates)has been previously articulated by several workers as a solu-tion for overcoming LBA artifacts (Bergsten 2005 Rota-Stabelli and Telford 2008) which constitute a documentedaffliction of arachnid phylogenies (Pepato et al 2010 Arabiet al 2012) Although the restriction of mandibulate exem-plars to myriapods means that nodal support for Myriapodamay be overestimated in our analyses (due to the conflationof mandibulate synapomorphies with support for myriapodmonophyly) our focus was on arachnid internal phylogenyand the inclusion of Pycnogonida and Xiphosura as the mostclosely related outgroup taxa to Arachnida was the greaterimperative for this study Comparable considerations in out-group sampling among arthropods are exemplified by thephylogenomic analyses of Regier et al (2010) who sampledjust five outgroups for their analysis but specifically chose theclosest relatives of Arthropoda (three Onychophora and twoTardigrada) due to their focus on arthropod internal relation-ships Brewer and Bond (2013) used a tick a water flea (pan-crustacean) and a centipede to root their internal millipedephylogeny In an analysis of Opiliones Hedin et al (2012) usedone outgroup (in one analysis two) a tick or a chimericscorpion terminal the latter constructed from libraries ofthree different families of Scorpiones These last two casesrepresent a much sparser outgroup sampling than the oneconducted here

Nevertheless to demonstrate that sampling of outgroupswithout regard for branch length distributions can in factexacerbate topological conflict when LBA artifacts alreadyplague the ingroup we conducted a separate family of anal-yses wherein we added four pancrustacean taxa retaining thecriterion of comparable data (ie Illumina libraries or com-plete well-annotated genomes) the dipteran Drosophila mel-anogaster (complete genome) the coleopteran Triboliumcastaneum (complete genome) the hemipteran Oncopeltusfasciatus (Illumina transcriptome Ewen-Campen et al 2011)and the amphipod Parhyale hawaiensis (Zeng et al 2011)Due to incident concerns about missing data in this dataset (see above) 454 libraries for pancrustaceans were notanalyzed A fifth published pancrustacean data set(Daphnia pulex complete genome) was added in a prelimi-nary analysis but proved greatly problematic due to a branchlength significantly longer than all other pancrustacean taxa

analyzed and ensuing LBA artifacts (see also Fernandez et al2014) Results from the data set including Daphnia (a 52-taxon data set) are therefore not presented here

As shown in figure 11 ML topologies of identical datapartitions augmented with pancrustacean outgroups demon-strate significantly longer patristic distances for Pancrustaceain comparison to Myriapoda (ie with respect to the MRCAof arthropods see also Regier et al 2010) corroborating thepreferential selection of myriapods as outgroups in the mainanalyses of this study The branch lengths of Pancrustacea as awhole are comparable to those of Acariformes and the in-clusion of Pancrustacea renders Chelicerata nonmonophy-letic in some analyses due to the phylogenetic proximity ofPancrustacea and Pycnogonida Analysis of an augmentedMARE2 matrix (which lacks pycnogonids) did not yield man-dibulate monophyly either with euchelicerates nested withina paraphyletic Mandibulata (not shown) As demonstrated byvisualizations of gene tree conflict (fig 12) these placementsof Pancrustacea are directly attributable to LBA with a largenumber of genes recovering a close relationship of Parhyaleand Acariformes particularly in slower-evolving gene parti-tions that are more susceptible to topological incongruence(Salichos and Rokas 2013) Although larger data sets do mit-igate this problem (3644-ortholog and MARE2 matrices fig12) the lack of resolution at the base of Arachnida is entirelyunaffected by the addition of pancrustacean outgroups re-gardless of outgroup data set size (fig 12 compare with fig10) and Arachnida was not recovered as monophyletic by theaugmented data sets

These analyses reinforce the importance of outgroupchoice when addressing lineages with demonstrable long-branch ingroup taxa and justify the limitation of mandibulatesampling to myriapods in our central analyses The additionof outgroup data sets with nearly complete gene sampling(eg Drosophila Tribolium) cannot surmount LBA artifactswhen a large swath of basally branching pancrustacean taxa isnot represented Admittedly the basal placement ofPancrustacea may be greatly improved by extensively sam-pling basally branching lineages throughout Mandibulata butthis is demonstrably the case even using a minuscule fractionof the genes deployed herein (eg eight-gene analysis ofGiribet et al 2001 the 62-gene analysis of Regier et al2010) For the present analysis pancrustacean genomic re-sources comparable to those of model organisms like the fourspecies we analyzed (in addition to the chelicerate data wegenerated) remain limited and particularly for such key line-ages as copepods ostracods remipedes cephalocarids andcollembolans

Terrestrialization and Morphological Convergence

Although a number of aforementioned problems plague in-ference of chelicerate relationships perhaps the most haunt-ing concern is that the pursuit and identification ofphylogenetic signal supporting the monophyly of Arachnidais itself an idiosyncratic objective A feature common to earlyhypotheses of arthropod phylogeny largely driven by mor-phological data was the monophyly of Atelocerata a clade

2977

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 16: Phylogenomic Interrogation of Arachnida Reveals Systemic

comprising myriapods and hexapods (eg Snodgrass 1938Schram and Emerson 1991 Wheeler et al 1993 Fortey et al1997 Emerson and Schram 1998 Kraus 1998 Wheeler 1998)Numerous aspects of anatomy united the two terrestrialmandibulate clades such as the absence of tritocerebral ap-pendages the uniramy of present appendages and characterspertaining to tracheal respiration This hypothesis was pro-gressively overturned in favor of Pancrustacea (a clade unitinghexapods with a paraphyletic assemblage collectively termedldquocrustaceansrdquo) by accruing molecular data (eg Turbevilleet al 1991 Friedrich and Tautz 1995 Giribet et al 19962001 Giribet and Ribera 1998 Zrzavy et al 1998Koenemann et al 2010 Meusemann et al 2010 Campbellet al 2011 Rota-Stabelli et al 2011) The diphyly ofAtelocerata rendered a large number of its putative synapo-morphies homoplastic the mechanism of which is inferred tobe convergence due to terrestrial habitat (Legg et al 2013)

If terrestrialization has driven morphological convergencein the sister group of Chelicerata then any epistemologicaljustification for minimizing the number of inferred

terrestrialization events in the chelicerate branch of the treeis greatly diminished Integral to the reconstruction of terres-trialization is the phylogenetic placement of scorpions whichhave an impressive Paleozoic fossil record demonstratingmarine (or at least aquatic) origins (Jeram 1998 Dunlopet al 2007 Dunlop 2010) Some workers therefore consideredscorpions sister to the remaining arachnids (Weygoldt andPaulus 1979) or separate from arachnids and sister toXiphosura (Van der Hammen 1989 ref Dunlop andWebster 1999 for a review of these hypotheses) In contrastothers have inferred a single terrestrialization event in theancestor of Arachnida based on comparative morphologyand putative homology of scorpion and tetrapulmonate re-spiratory systems (Firstman 1973 Scholtz and Kamenz 2006Kamenz et al 2008) and circulatory systems (Wirkner et al2013) implicit in this inference is the assumption that theMRCA of scorpions and tetrapulmonates was the ancestralarachnid (ie the topology of Shultz 1990 2007) Given thestrongly supported placement of Scorpiones sister toTetrapulmonatamdasha result recovered elsewhere with

FIG 11 Left Tree topology inferred from ML analysis of the 200 slowest-evolving genes with four pancrustacean outgroups added Right Tree topologyinferred from ML analysis of the 500 slowest-evolving genes with four pancrustacean outgroups added Orders of interest are indicated with shadingNumbers on nodes indicate bootstrap resampling frequencies Bars on right represent number of genes available for each terminal Below Global viewsof the sequence alignments

2978

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 17: Phylogenomic Interrogation of Arachnida Reveals Systemic

moderate support (Regier et al 2010)mdashor in some analysesthe sister relationship of (Scorpiones + Pseudoscorpiones) toTetrapulmonata (fig 6) the tree topology at hand obviatesinference of a terrestrial arachnid ancestor The phylogeneticproximity of Scorpiones to Tetrapulmonata is certainly con-sistent with the homology of their respiratory and circulatorysystems (ie implies a single origin of the arachnid book lung)but it implies that a single terrestrialization event is not aviable reconstruction of chelicerate evolutionary historyeven if Arachnida is recovered as monophyletic Cheliceratatherefore could encompass an evolutionary history with mul-tiple terrestrialization events in a manner comparable toMandibulata (von Reumont et al 2012)

Due to the incidence of heterotachy for some chelicerateorders and concealed phylogenetic signal for several plausiblephylogenetic hypotheses we tentatively favor elements of thetree topology recovered by the slower-evolving orthologs thatare consistent with morphological analyses such as arachnidmonophyly the placement of Pseudoscorpiones within orsister to Arachnopulmonata and the relationship of(Opiliones + (Ricinulei + Solifugae)) We add the caveat thatthese relationships should be rigorously tested using broader

sampling of constituent terminals (particularly of basallybranching exemplars of the problematic orders) We alsonote that derived relationships particularly within the well-sampled orders Opiliones and Araneae unambiguously sup-port traditional systematics within these groups (Shultz 1998Coddington et al 2004 Giribet et al 2010 Sharma and Giribet2011 Hedin et al 2012)

ConclusionWe assessed the phylogeny of Arachnida using the largestphylogenomic assessment to date and conducted 40 separateanalyses varying different parameters of matrix assembly andalgorithmic approach We identified PseudoscorpionesAcariformes and Parasitiformes as problematic long-branchtaxa even upon inclusion of full genomes and use of sophis-ticated substitution models We consistently observed shortinternodes in basal relationships attributable to ancient rapidradiation Serial concatenation in order of percent pairwiseidentity demonstrates systemic conflicts in phylogeneticsignal Topological conflict at the base of Arachnida is re-tained among gene trees across data sets regardless of evo-lutionary rate or minimization of missing data Slow-evolving

FIG 12 Supernetwork representation of quartets derived from individual ML gene trees for four different data sets with four pancrustacean outgroupsadded (indicated by boxes) Phylogenetic conflict is represented by reticulations Edge lengths correspond to quartet frequencies Shading indicates baseof arachnid radiation

2979

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 18: Phylogenomic Interrogation of Arachnida Reveals Systemic

orthologs recovered the monophyly of Arachnida the clade(Pseudoscorpiones + Arachnopulmonata) and the relation-ship of (Opiliones + (Ricinulei + Solifugae)) Addition of out-groups without regard for evolutionary rate is shown toexacerbate LBA artifacts A number of relationships was un-ambiguously supported in all analyses Future investigationsof chelicerate phylogeny should emphasize taxonomic sam-pling efforts for breaking long branches by identifying basallybranching and putatively slowly evolving exemplars of theorders Pseudoscorpiones Acariformes and ParasitiformesImproving matrix representation of Palpigradi is mandatoryfor assessing its placement within Chelicerata

Materials and Methods

Species Sampling and Molecular Techniques

Taxa were sampled toward maximizing phylogenetic repre-sentation within Chelicerata Given the availability of wholegenomes for both Parasitiformes and Acariformes we focusedon generating Illumina paired-end read data for other cheli-cerate orders New transcriptomic data were thus generatedfor 17 euchelicerate taxa Collecting locality information andstatistics on sequencing yields are provided in table 1 andsupplementary table S1 Supplementary Material online Alltissues were collected fresh and prepared immediately forRNA extraction Total RNA was extracted using TRIReagent (Invitrogen) mRNA samples were prepared withthe Dynabeads mRNA Purification Kit (Invitrogen) followingthe manufacturerrsquos protocol Quality of mRNA was measuredwith picoRNA assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) quantity was measured with a RNA assay inQubit fluorometer (Life Technologies)

Almost all cDNA libraries were constructed with theTruSeq for RNA Sample Preparation kit (Illumina) followingthe manufacturerrsquos instructions three spider libraries(Frontinella communis Leucauge venusta and Neoscona ara-besca) were prepared using the PrepX mRNA kit for Apollo324 (Integen) Samples were barcoded with TruSeq adaptorsfacilitating multiplexed sequencing runs with up to four spe-cimens per lane sequenced on the Illumina Hi-Seq 2500 plat-form Concentration of cDNA was measured using dsDNAHigh Sensitivity assays in an Agilent 2100 Bioanalyzer (AgilentTechnologies) as well as a Qubit fluorometer (Invitrogen)Samples were run using a Illumina Genetic Analyzer II orIllumina HiSeq 2500 platform with paired-end reads of100ndash150 bp at the FAS Center for Systems Biology atHarvard University excepting the Mastigoproctus giganteuslibrary which was sequenced using single-end reads of 100 bp

Sequenced results were quality filtered accordingly to athreshold average quality Phred score of 30 and adaptorstrimmed Retained reads shorter than 25 bp were discardedRibosomal RNA (rRNA) was filtered out using Bowtie v100(Langmead et al 2009) by constructing a set of metazoanrRNA sequences from GenBank with the command ldquobowtie-buildrdquo Each sample was sequentially aligned to the indexallowing up to two mismatches through Bowtie v100 re-taining all unaligned reads in FASTQ format (command lineldquo-a ndashbestrdquo) Unaligned results were paired using read header

information and exported as separate files for left and rightmate pairs This process was repeated for each sample andreads corresponding to rRNA sequences were removed

De novo assemblies were conducted using Trinity(Grabherr et al 2011 Haas et al 2013) using paired readfiles with the exception of hardware specifications (num-bers of dedicated CPUs) For the single-end read library ofM giganteus a path reinforcement distance of 25 wasenforced for paired-end libraries a value of 75 was enforcedRaw reads have been deposited in the NCBI (National Centerfor Biotechnology Information) Sequence Read Archive data-base All assemblies alignments matrices and tree topologiesgenerated in this study are deposited in the Dryad DigitalRepository

We added 30 other taxa to the data set of 18 Illumina-sequenced chelicerates (table 1) Peptide sequences fromwhole genomes of I scapularis and T urticae were accessedfrom the public databases Vectorbase and Ensembl respec-tively A pair of outgroup species Alipes grandidieri(Myriapoda) and Peripatopsis capensis (Onychophora) weresequenced by a subset of the authors using the same methodsand platforms for a separate study (Fernandez et al 2014) Asubset of smaller assembled 454 libraries and Sanger-se-quenced ESTs were accessed from GenBank (table 1)Finally we obtained reads for eight harvestman libraries pub-lished by Hedin et al (2012) and sanitized and assembledthese de novo in Trinity as described above

Orthology Assignment

Redundancy reduction was done with CD-HIT (Fu et al 2012)for all Trinity assemblies (95 similarity) Resulting culled as-semblies were processed in TransDecoder (Haas et al 2013) inorder to identify candidate ORFs within the transcriptsPredicted peptides were then processed with a further filterto select only one peptide per putative unigene by choosingthe longest ORF per Trinity subcomponent with a pythonscript thus removing the variation in the coding regions ofTrinity assemblies due to alternative splicing closely relatedparalogs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as fasta files We assigned pre-dicted ORFs into orthologous groups across all samples usingOMA stand-alone v099t (Altenhoff et al 2011 2013) Allinput files were single line multi-fasta files and the parame-tersdrw file specified retained all default settings We paralle-lized all-by-all local alignments across 200 CPUs In a separatefamily of analyses additional outgroups were added to theoutput from the principal OMA run using best reciprocalBLAST hits implemented in a customized script

Phylogenomic Analyses

We constructed an amino acid supermatrix by concatenatingthe set of OMA groups containing 13 or more taxa (geneoccupancy of 25) which were extracted using a Unix scriptThese 3644 orthogroups were aligned individually usingMUSCLE v36 (Edgar 2004) Ambiguously aligned positionswere culled using GBlocks v091b (Castresana 2000) as well asregions where more than 50 of the columns constituted

2980

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 19: Phylogenomic Interrogation of Arachnida Reveals Systemic

missing data (using the command ndashb5 = h) Trimmedorthogroups were concatenated using Phyutility 26 (Smithand Dunn 2008) for this and other phylogenomic matrices(see below)

Due to the degree of missing data for the palpigradeP wheeleri represented by 56 Sanger-sequenced genes weprincipally analyzed our data set without this speciesadding the palpigrade terminal in a final analysis to inferthe placement of this enigmatic order Attempts to obtainnew transcriptomes of Eukoenenia spelaea (from Slovakia)and Eukoenenia sp (from Mexico) were unsuccessful due tothe diminutive size and scarcity of these animals

To assess the impact of gene occupancy on nodal supportand tree topology we constructed five other matrices at dif-ferent thresholds of gene occupancy An additional 15 matri-ces were constructed based on evolutionary rate for whichpercent pairwise identity was employed as a proxy Thismethod was chosen to approximate evolutionary rate be-cause it is agnostic to tree topology Percent pairwise identitywas calculated for each orthogroup alignment in Geneiousv616 and is computed by taking all possible pairs of bases atthe same column and scoring a hit (one) when a pair isidentical this value is divided by the total number of pairsfor each column and all column values are then averagedover the length of the alignment Cells with indels (ldquo-rdquo) andmissing taxa do not contribute to pairwise calculations Thepercent pairwise identity was calculated for each of the 3644orthogroup alignments subsequent to treatment withGBlocks v 091b This ensured that regions with large indelsdid not affect the calculation of percent pairwise identity

As an algorithmic alternative to optimization of matrixconstruction we used the software MARE (MAtrixREduction httpmarezfmkde last accessed November 112013) which estimates informativeness of every orthogroupbased on weighted geometry quartet mapping (Nieselt-Struwe and von Haeseler 2001) We analyzed the 3644orthogroups with gene occupancy over 25 using MAREand thereby constructed two matrices First we implementedMARE enforcing retention of all libraries Second we re-peated algorithmic matrix reduction but enforced retentionof only those 30 libraries constituting Illumina libraries orwhole genomes (ie keeping only terminals with significantlymore data than 454 or Sanger-sequenced EST counterparts)As a separate test of influence of data quantity we con-structed a manual reduction of the 3644-ortholog superma-trix retaining only the 30 libraries constituting Illuminalibraries or whole genomes

ML analyses were conducted using RAxML v775 (Bergeret al 2011) and Bayesian inference using PhyloBayes MPI 14e(Lartillot et al 2013) For RAxML v775 we implemented aunique LG4X + F model for each gene (Le et al 2012)Bootstrap resampling frequencies were estimated with 500replicates using a rapid bootstrapping algorithm (Stamatakiset al 2008) Analyses with PhyloBayes MPI 14e were limitedto smaller data sets as implementation of PhyloBayes for largematrices requires intractable amounts of time for conver-gence We implemented PhyloBayes MPI 14e using thesite-heterogeneous CAT + GTR model of evolution (Lartillot

and Philippe 2004) Four independent chains were run for6965ndash10145 cycles and the initial 5000 cycles were discardedas burn-in with convergence assessed using the maximumbipartition discrepancies across chains

Supplementary MaterialSupplementary figures S1ndashS7 and table S1 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark S Harvey Lionel Monod Peter J Schwendinger andPiotr Naskrecki kindly provided samples of Synsphyronus api-melus Liphistius malayanus and Ricinoides atewa AntjeFischer and Woods Hole Marine Biological Laboratory pro-vided specimens of Limulus polyphemus Sonia CS AndradeJerome Murienne and Ana Riesgo prepared some of theIllumina libraries Thorsten Burmester provided the 454 tran-scriptomic assembly of Pandinus imperator Marshal Hedincontributed raw read data for the eight Opiliones libraries wereassembled Adrian Altenhoff James Cuff Geoffrey F DillyPaul Edmon Mike Ethier and Rosa Fernandez facilitated com-puting operations Discussions with Casey Dunn and AntonisRokas refined some of the ideas presented This research wassupported by internal funds from the Museum ofComparative Zoology to GG by National ScienceFoundation grants DEB-1144492 and DEB-114417 to GHand GG and by the National Science FoundationPostdoctoral Research Fellowship in Biology under GrantNo DBI-1202751 to PPS The authors are indebted toEditor-in-Chief Sudhir Kumar Associate Editor NicolasVidal and three anonymous reviewers for reviewing thismanuscript

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Arabi J Judson ML Deharveng L Lourenco WR Cruaud C Hassanin A2012 Nucleotide composition of CO1 sequences in Chelicerata(Arthropoda) detecting new mitogenomic rearrangements J MolEvol 7481ndash95

Barnett AA Thomas RH 2013 The expression of limb gap genes in themite Archegozetes longisetosus reveals differential patterning mech-anisms in chelicerates Evol Dev 15280ndash292

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Bergsten J 2005 A review of long-branch attraction Cladistics 21163ndash193

Betancur-RR Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Brewer MS Bond JE 2013 Ordinal-level phylogenomics of the arthropodclass Diplopoda (millipedes) based on an analysis of 221 nuclearprotein-coding loci generated using next-generation sequence anal-yses PLoS One 8e79935

Briggs DEG Siveter DJ Sieveter DJ Sutton MD Garwood RJ Legg D2012 Silurian horseshoe crab illuminates the evolution of arthropodlimbs Proc Natl Acad Sci U S A 10915702ndash15705

2981

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 20: Phylogenomic Interrogation of Arachnida Reveals Systemic

Campbell LI Rota-Stabelli O Edgecombe GD Marchioro T Longhorn SJTelford MJ Philippe H Rebecchi L Peterson KJ Pisani D 2011MicroRNAs and phylogenomics resolve the relationships ofTardigrada and suggest that velvet worms are the sister group ofArthropoda Proc Natl Acad Sci U S A 10815920ndash15924

Cao Z Yu Y Wu Y Hao P Di Z He Y Chen Z Yang W Shen Z He Xet al 2013 The genome of Mesobuthus martensii reveals a uniqueadaptation model of arthropods Nat Commun 42602

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Clouse RM Sharma PP Giribet G Wheeler WC 2013 Elongation factor-1 a putative single-copy nuclear gene has divergent sets of para-logs in an arachnid Mol Phylogenet Evol 68471ndash481

Coddington JA Giribet G Harvey MS Prendini L Walter DE 2004Arachnida In Cracraft J Donoghue MJ editors Assembling thetree of life New York Oxford University Press p 296ndash318

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dunlop JA 2010 Geological history and phylogeny of ChelicerataArthropod Struct Dev 39124ndash142

Dunlop JA Kamenz C Scholtz G 2007 Reinterpreting the morphologyof the Jurassic scorpion Liassoscorpionides Arthropod Struct Dev 36245ndash252

Dunlop JA Kreurouger J Alberti G 2012 The sejugal furrow in camel spidersand acariform mites Arachnol Mitt 438ndash15

Dunlop JA Webster M 1999 Fossil evidence terrestrialization andarachnid phylogeny J Arachnol 2786ndash93

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Emerson MJ Schram FR 1998 Theories patterns and reality game planfor arthropod phylogeny In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 67ndash86

Ewen-Campen B Shaner N Panfilio K Suzuki Y Roth S Extavour CG2011 The maternal and early embryonic transcriptome of the milk-weed bug Oncopeltus fasciatus BMC Genomics 1261

Felsenstein J 1978 Cases in which parsimony or compatibility methodswill be positively misleading Syst Biol 27401ndash410

Fernandez R Laumer CE Vahtera V Libro S Kaluziak ST Sharma PPPerez-Porro AR Edgecombe GD Giribet G 2014 Evaluating topo-logical conflict in centipede phylogeny using transcriptomic datasets Mol Biol Evol 311500ndash1513

Firstman B 1973 The relationship of the chelicerate arterial system tothe evolution of the endosternite J Arachnol 11ndash54

Fortey RA Briggs DEG Wills MA 1997 The Cambrian evolutionarylsquoexplosionrsquo recalibrated BioEssays 19429ndash434

Friedrich M Tautz D 1995 Ribosomal DNA phylogeny of the majorextant arthropod classes and the evolution of myriapods Nature376165ndash167

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Bagu~na J Riutort M Ribera C 1996 First molec-ular evidence for the existence of a Tardigrada + Arthropoda cladeMol Biol Evol 1376ndash84

Giribet G Edgecombe GD Wheeler WC 2001 Arthropod phylog-eny based on eight molecular loci and morphology Nature 413157ndash161

Giribet G Edgecombe GD Wheeler WC Babbitt C 2002 Phylogeny andsystematic position of Opiliones a combined analysis of cheliceraterelationships using morphological and molecular data Cladistics 185ndash70

Giribet G McIntyre E Christian E Espinasa L Ferreira RL Francke OFHarvey MS Isaia M Kovac L McCutchen L et al 2014 The firstphylogenetic analysis of Palpigradi (Arachnida)mdashthe most enig-matic arthropod order Invertebr Syst 28 doi101071IS13057

Giribet G Ribera C 1998 The position of arthropods in the animalkingdom a search of a reliable outgroup for internal arthropodphylogeny Mol Phylogenet Evol 9481ndash488

Giribet G Vogt L Perez Gonzalez A Sharma P Kury AB 2010 A multi-locus approach to harvestmen (Arachnida Opiliones) phylogenywith emphasis on biogeography and the systematics of LaniatoresCladistics 26408ndash437

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Greurounewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo transcriptsequence reconstruction from RNA-seq using the Trinity platformfor reference generation and analysis Nat Protoc 81494ndash1512

Hedin M Derkarabetian S McCormack M Richart C Shultz JW 2010The phylogenetic utility of the nuclear protein-coding gene EF-1for resolving recent divergences in Opiliones emphasizing intronevolution J Arachnol 389ndash20

Hedin M Starett J Akhter S Scheuroonhofer AL Shultz JW 2012Phylogenomic resolution of Paleozoic divergences in harvestmen(Arachnida Opiliones) via analysis of next-generation transcriptomedata PLoS One 7e42888

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Bagu~na J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc Lond B Biol Sci 2764261ndash4270

Huelsenbeck JP 1997 Is the Felsenstein zone a fly trap Syst Biol 4669ndash74

Hughes CL Kaufman TC 2002 Hox genes and the evolution of thearthropod body plan Evol Dev 4459ndash499

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Jeram AJ 1998 Phylogeny classifications and evolution of Silurian andDevonian scorpions In Selden PA editor In Proceedings of the 17thEuropean Colloquium of Arachnology Edinburgh BritishArachnological Society Burnham Beeches p 17ndash31

Kamenz C Dunlop JA Scholtz G Kerp H Hass H 2008 Microanatomyof Early Devonian book lungs Biol Lett 4212ndash215

Khila A Grbic M 2007 Gene silencing in the spider mite Tetranychusurticae dsRNA and siRNA parental silencing of the Distal-less geneDev Genes Evol 217241ndash251

Koenemann S Jenner RA Hoenemann M Stemme T von ReumontBM 2010 Arthropod phylogeny revisited with a focus on crusta-cean relationships Arthropod Struct Dev 3988ndash110

Kraus O 1998 Phylogenetic relationships between higher taxa of tra-cheate arthropods In Fortey RA Thomas RH editors Arthropodrelationships London Chapman amp Hall p 295ndash303

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Lamsdell JC 2013 Revised systematics of Palaeozoic lsquohorseshoe crabsrsquoand the myth of monophyletic Xiphosura Zool J Linn Soc 1671ndash27

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

2982

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 21: Phylogenomic Interrogation of Arachnida Reveals Systemic

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Legg DA Sutton MD Edgecombe GD 2013 Arthropod fossil data in-crease congruence of morphological and molecular phylogenies NatCommun 42485

Mallatt JM Garey JR Shultz JW 2004 Ecdysozoan phylogeny andBayesian inference first use of nearly complete 28S and 18S rRNAgene sequences to classify the arthropods and their kin MolPhylogenet Evol 31178ndash191

Masta SE Longhorn SJ Boore JL 2009 Arachnid relationships based onmitochondrial genomes asymmetric nucleotide and amino acidbias affects phylogenetic analyses Mol Phylogenet Evol 50117ndash128

Masta SE McCall A Longhorn SJ 2010 Rare genomic changes andmitochondrial sequences provide independent support for congru-ent relationships among the sea spiders (Arthropoda Pycnogonida)Mol Phylogenet Evol 5759ndash70

Meusemann K von Reumont BM Simon S Roeding F Strauss S Keurouck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Narechania A Baker RH Sit R Kolokotronis S-O DeSalle R Planet PJ2012 Random addition concatenation analysis a novel approach tothe exploration of phylogenomic signal reveals strong agreementbetween core and shell genomic partitions in the cyanobacteriaGenome Biol Evol 430ndash43

Nieselt-Struwe K von Haeseler A 2001 Quartet-mapping a generaliza-tion of the likelihood-mapping procedure Mol Biol Evol 181204ndash1219

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Meurouller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Obst M Faurby S Bussarawit S Funch P 2012 Molecular phylog-eny of extant horseshoe crabs (Xiphosura Limulidae) indicatesPaleogene diversification of Asian species Mol Phylogenet Evol6221ndash26

Pepato AR da Rocha CEF Dunlop JA 2010 Phylogenetic position of theacariform mites sensitivity to homology assessment under totalevidence BMC Evol Biol 10235

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWeuroorheide G Baurain D 2011 Resolving difficult phylogenetic ques-tions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Derelle R Lopez P Pick K Borchiellini C Boury-Esnault NVacelet J Renard E Houliston E Queinnec E et al 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712

Philippe H Zhao Y Brinkmann H Rodrigue N Delsuc F 2005Heterotachy and long-branch attraction in phylogenetics BMCEvol Biol 550

Prpic N-M Damen WGM 2004 Expression patterns of leg genes in themouthparts of the spider Cupiennius salei (Chelicerata Arachnida)Dev Genes Evol 214296ndash302

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Rehm P Meusemann K Borner J Misof B Burmester T 2014Phylogenetic position of Myriapoda revealed by 454 transcriptomicsequencing Mol Phylogenet Evol 7725ndash33

Riesgo A Andrade SCS Sharma PP Novo M Perez-Porro AR Vahtera VGonzalez VL Kawauchi GY Giribet G 2012 Comparative descrip-tion of ten transcriptomes of newly sequenced invertebrates and

efficiency estimation of genomic sampling in non-model taxa FrontZool 933

Roeding F Borner J Kube M Klages S Reinhardt R Burmester T 2009 A454 sequencing approach for large scale phylogenomic analysis ofthe common emperor scorpion (Pandinus imperator) MolPhylogenet Evol 53826ndash834

Rokas A Carroll SB 2006 Bushes in the tree of life PLoS Biol 4e352Rokas A King N Finnerty J Carroll SB 2003 Conflicting phylogenetic

signals at the base of the metazoan tree Evol Dev 5346ndash359Rota-Stabelli O Campbell L Brinkmann H Edgecombe GD Longhorn SJ

Peterson KJ Pisani D Philippe H Telford MJ 2011 A congruentsolution to arthropod phylogeny phylogenomics microRNAs andmorphology support monophyletic Mandibulata Proc R Soc Lond BBiol Sci 278298ndash306

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Rota-Stabelli O Telford MJ 2008 A multi criterion approach for theselection of optimal outgroups in phylogeny recovering some sup-port for Mandibulata over Myriochelata using mitogenomics MolPhylogenet Evol 48103ndash111

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Schierwater B Eitel M Jakob W Osigus HJ Hadrys H Dellaporta SLKolokotronis SO Desalle R 2009 Concatenated analysis sheds lighton early metazoan evolution and fuels a modern ldquourmetazoonrdquohypothesis PLoS Biol 7e1000020

Scholtz G Kamenz C 2006 The book lungs of Scorpiones andTetrapulmonata (Chelicerata Arachnida) evidence for homologyand a single terrestrialization event of a common arachnid ancestorZoology 1092ndash13

Schoppmeier M Damen WGM 2001 Double-stranded RNA interfer-ence in the spider Cupiennius salei the role of Distal-less is evolu-tionarily conserved in arthropod appendage formation Dev GenesEvol 21176ndash82

Schram FR Emerson MJ 1991 Arthropod pattern theory a new ap-proach to arthropod phylogeny Mem Queensl Mus 311ndash18

Sharma PP Giribet G 2011 The evolutionary and biogeographic historyof the armoured harvestmenmdashLaniatores phylogeny based on tenmolecular markers with the description of two new families ofOpiliones (Arachnida) Invertebr Syst 25106ndash142

Sharma PP Schwager EE Extavour CG Giribet G 2012a Evolution of thechelicera a dachshund domain is retained in the deutocerebral ap-pendage of Opiliones (Arthropoda Chelicerata) Evol Dev 14522ndash533

Sharma PP Schwager EE Extavour CG Giribet G 2012b Hoxgene expression in the harvestman Phalangium opilio reveals diver-gent patterning of the chelicerate opisthosoma Evol Dev 14450ndash463

Sharma PP Schwager EE Extavour CG Wheeler WC 2014 Hox geneduplications correlate with posterior heteronomy in scorpions ProcR Soc Lond B Biol Sci 28120140661

Sharma PP Schwager EE Giribet G Jockusch EL Extavour CG 2013Distal-less and dachshund pattern both plesiomorphic andapomorphic structures in chelicerates RNA interferencein the harvestman Phalangium opilio (Opiliones) Evol Dev 15228ndash242

Shultz JW 1990 Evolutionary morphology and phylogeny of ArachnidaCladistics 61ndash38

Shultz JW 1998 Phylogeny of Opiliones (Arachnida) an assessment ofthe ldquoCyphopalpatoresrdquo concept J Arachnol 26257ndash272

Shultz JW 2007 A phylogenetic analysis of the arachnid orders based onmorphological characters Zool J Linn Soc 150221ndash265

Simon S Narechania A DeSalle R Hadrys H 2012 Insect phylogenomicsexploring the source of incongruence using new transcriptomicdata Genome Biol Evol 41295ndash1309

Smith SA Dunn CW 2008 Phyutility a phyloinformaticstool for trees alignments and molecular data Bioinformatics 24715ndash716

2983

Phylogenomic Interrogation of Arachnida doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 22: Phylogenomic Interrogation of Arachnida Reveals Systemic

Snodgrass RE 1938 Evolution of the Annelida Onychophora andArthropoda Smithson Misc Collect 971ndash159

Stamatakis A Hoover P Rougemont J 2008 A rapid bootstrap algo-rithm for the RAxML Web servers Syst Biol 57758ndash771

Turbeville JM Pfeifer DM Field KG Raff RA 1991 The phylogeneticstatus of arthropods as inferred from 18S rRNA sequences Mol BiolEvol 8669ndash686

Van der Hammen L 1989 An introduction to comparative arachnologyLeiden (United Kingdom) SPB Academic Publishing

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Weygoldt P Paulus HF 1979 Untersuchungen zur MorphologieTaxonomie und Phylogenie der Chelicerata Z Zool Syst Evol 1785ndash116 177ndash200

Wheeler WC 1998 Sampling groundplans total evidence and the sys-tematics of arthropods In Fortey RA Thomas RH editorsArthropod relationships London Chapman amp Hall p 87ndash96

Wheeler WC Cartwright P Hayashi CY 1993 Arthropod phylogeny acombined approach Cladistics 14173ndash192

Wheeler WC Hayashi CY 1998 The phylogeny of the extant chelicerateorders Cladistics 14173ndash192

Wiens JJ 2006 Missing data and the design of phylogenetic analysesJ Biomed Inform 3934ndash42

Wilcox TP Garcıa de Leon FJ Hendrickson DA Hillis DM 2004Convergence among cave catfishes long-branch attractionand a Bayesian relative rates test Mol Phylogenet Evol 311101ndash1103

Wirkner CS Teuroogel M Pass G 2013 The arthropod circulatory system InMinelli A Boxshall G Fusco G editors Arthropod biology and evo-lution molecules development and morphology Heidelberg(Germany) Springer Press p 343ndash391

Zeng V Villanueva KE Ewen-Campen B Alwes F Browne WE ExtavourCG 2011 De novo assembly and characterization of a maternal anddevelopmental transcriptome for the emerging model crustaceanParhyale hawaiensis BMC Genomics 12581

Zrzavy J Hypsa V Vlaskova M 1998 Arthropod phylogeny taxonomiccongruence total evidence and conditional combinationapproaches to morphological and molecular data sets In ForteyRA Thomas RH editors Arthropod relationships LondonChapman amp Hall p 97ndash107

2984

Sharma et al doi101093molbevmsu235 MBE at A

merican M

useum of N

atural History on February 17 2015

httpmbeoxfordjournalsorg

Dow

nloaded from