

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution

journal homepage: www.elsevier.com/locate/ympev

Phylogenomic analysis of Fundulidae (Teleostei: Cyprinodotiformes) using RNA-sequencing data

Rachel Rodgers a, Jennifer L. Roach b, Noah M. Reid b,c, Andrew Whitehead b, David D. Duvernell a,d,⁎

a Biological Sciences, Southern Illinois University Edwardsville, Edwardsville, IL 62026, USA
b Department of Environmental Toxicology, University of California, Davis, CA 95616, USA
c Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
d Biological Sciences, Missouri University of Science and Technology, Rolla, MO 65401, USA

ARTICLE INFO

Keywords: Fundulidae; Fundulus; Phylogenetics; Phylogenomics; RNA-seq

ABSTRACT

Fishes of the New World cyprinodontiform family Fundulidae display a wide range of tolerance to environmental conditions, making them a valuable model system for comparative, evolutionary, and environmental studies. Despite numerous attempts to resolve the phylogenetic relationships of family Fundulidae, the basal structure of the phylogeny remains unresolved. The lack of a robust and fully resolved phylogeny for family Fundulidae and its most speciose genus Fundulus is an impediment to future research. This study utilized novel RNA-sequencing data for phylogenetic inference among 16 members of Fundulidae to better refine the basal nodes of the family and confront long-standing questions regarding (1) the monophyletic status of genus Fundulus, and validity of the Lucania and recently synonymized Adinia genera; (2) the relationship of the west coast endemic Fundulus parvipinnis and F. lima to other Fundulus species; and (3) the validity of subgeneric classifications. In addition, previously published nuclear gene sequences for 32 Fundulidae species were re-analyzed in combination with novel RNA-sequencing data. Maximum likelihood and Bayesian analyses generated identical phylogenies with strong statistical support at nearly all nodes, demonstrating the utility of RNA-sequencing data in constructing robust phylogenies not achievable by previous methods. While many past hypothesized evolutionary relationships for Fundulidae were reinforced, several alternative relationships are hypothesized at basal nodes, resulting in a re-analysis of the deeper structure of family Fundulidae. These results reveal family Fundulidae as a paraphyletic grouping of members of genus Fundulus and Lucania and support the previous synonymy of genus Adinia with genus Fundulus.

https://doi.org/10.1016/j.ympev.2017.12.030

Received 25 July 2017; Received in revised form 27 November 2017; Accepted 27 December 2017

Available online 28 December 2017

1055-7903/ © 2017 Elsevier Inc. All rights reserved.

⁎ Corresponding author at: Biological Sciences, Missouri University of Science and Technology, Rolla, MO 65401, USA. E-mail address: [email protected] (D.D. Duvernell).

Molecular Phylogenetics and Evolution 121 (2018) 150–157

1. Introduction

Family Fundulidae (Cyprinodontiformes) comprises approximately 44 species of fish inhabiting fresh, brackish, and marine environments within North and Central American coastal and inland systems, Bermuda and Cuba (Parenti, 1981). Two species (Fundulus parvipinnis and F. lima) are endemic to the California coast with the remaining species found east of the Continental Divide (Ghedotti and Davis, 2017, 2013). Occupied habitats vary widely in environmental conditions such as temperature, pH, and salinity levels (Whitehead, 2010). The fact that fundulids are cosmopolitan, and present along ecological gradients of temperature, salinity, and oxygen, and in severely polluted environments, has made the family a valuable model system for investigating the functional basis, and evolution, of physiological resilience to temperature, salinity, hypoxia, and environmental pollution (Burnett et al., 2007; Reid et al., 2016; Whitehead, 2010).

The importance of Fundulidae as a comparative model system demands a robust and well-supported phylogeny. However, the taxonomic and systematic status of the family is complex and disputed (Cashner et al., 1992). Early influential phylogenetic analyses were performed through morphological comparisons (Parenti, 1981; Wiley, 1986). Later phylogenetic studies utilized a variety of molecular characters including the partial mitochondrial cytochrome b (cytb) (Bernardi, 1997; Bernardi and Powers, 1995) and nuclear gene sequences (Whitehead, 2010), and total-evidence analyses including nucleotide and non-nucleotide character data (Ghedotti and Davis, 2013). While substantial progress has been made in resolving many interspecies relationships, the deeper evolutionary structure of the family remains unresolved.

The most recent published Fundulidae phylogeny (Ghedotti and Davis, 2013) classifies fundulids into three genera: Fundulus, Lucania, and the monotypic Leptolucania. Approximately 39 of the 44 recognized species are considered to be members of genus Fundulus, which has been further divided into subgenera (e.g. most recently Ghedotti and Davis (2013): Fundulus, Zygonectes, Plancterus and Wileyichthys). However, phylogenetic analyses have been unable to confirm the monophyletic status of Fundulus and clearly delineate which species should be considered valid Fundulus species, or validate subgeneric membership. Past studies have suggested the potential inclusion of Lucania species within Fundulus (Whitehead, 2010); the previously recognized monotypic genus Adinia was officially synonymized with genus Fundulus in 2013 (Ghedotti and Davis, 2013). Questions also remain regarding the evolutionary relationship of the west coast species F. parvipinnis and F. lima to all other Fundulus species.

Incongruences between Fundulidae phylogenies based on different data sets demonstrate the need for additional investigation. In recent years, phylogenomic studies which utilize transcriptomic sequences (i.e. RNA-sequencing data) have validated the advantages of bringing large molecular data sets to bear in resolving complex evolutionary relationships (Harrison and Kidner, 2011; Hittinger et al., 2010). In this study, we utilized newly available RNA-sequencing (RNA-seq) data for 16 Fundulidae species from two genera (Fundulus and Lucania), including F. xenicus (formerly Adinia xenica). The relationships of these sixteen species reported in two of the most recent studies (Ghedotti and Davis, 2013; Whitehead, 2010) (Fig. 1) offer several alternative hypotheses. We conducted maximum likelihood and Bayesian phylogenomic inference to generate a well-resolved and robustly supported phylogeny. We compared these results with the most current published Fundulidae phylogenies (Fig. 1). In addition, we analyzed a second data set which combined RNA-seq data with the nuclear gene sequences for recombination activating gene 1 (RAG1) and glycosyltransferase (gylt) (originally presented by Whitehead (2010) and re-analyzed by Ghedotti and Davis (2013)) for 16 additional Fundulidae species. We used the results of these analyses to better understand the basal relationships of the Fundulidae phylogeny and to determine: (1) whether genus Fundulus is monophyletic or whether Lucania should be synonymized with genus Fundulus as suggested in previous studies (Bernardi, 1997; Ghedotti and Davis, 2013; Whitehead, 2010); (2) how the west-coast species Fundulus parvipinnis and F. lima are related to other members of genus Fundulus; and (3) the validity of the most recently proposed subgeneric classification (Ghedotti and Davis, 2013).

2. Materials and methods

2.1. Data processing and transcriptome assembly

Fish were collected from 16 species using nets, seines or baited traps (Table 1), and transported live to UC Davis. After 1 month of acclimation to laboratory aquarium conditions, as part of a separate study of gene expression responses to variation in salinity, fish were sacrificed and gill tissues were dissected and stored in RNA-later. Tissues were homogenized and RNA was extracted using Trizol reagent and further purified using Qiagen RNeasy spin columns. RNA quality was verified on an Agilent Bioanalyzer. Indexed RNA-seq libraries were prepared using NEB Next Ultra RNA library prep kits for Illumina according to the manufacturer’s protocol. Indexed samples were pooled and sequenced (Illumina PE-100) on an Illumina HiSeq 2500 at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley.

Fig. 1. Alternative topologies proposed by (A) Ghedotti and Davis’s (2013) total-evidence analysis, (B) Whitehead’s (2010) combined nuclear and mitochondrial sequence analysis, and (C) Whitehead’s (2010) nuclear sequence analysis. Topologies were re-drawn to include only those species investigated in the current study. Subgeneric and generic (Lucania) designations are indicated to the right of each topology.

Table 1. Sources and locations of all Fundulidae specimens included in the ingroup.

Species | Source | Location | Total reads
F. olivaceus | David Duvernell | Gasconade River, MO | 101,434,157
F. chrysotus | Jake Schaefer | Cossatot River, AR | 129,963,482
L. parva | Rebecca Fuller | Indian River, FL | 127,960,850
L. goodei | Rebecca Fuller | Delk's Bluff, Oklawaha River, FL | 110,021,826
F. sciadicus | David Duvernell | Gasconade River, MO | 60,660,884
F. nottii | Jake Schaefer | Pascagoula River, MS | 23,348,494
F. catenatus | David Duvernell | Gasconade River, MO | 165,105,977
F. rathbuni | Seth Kullman (Crystal Lee Pow) | Bolin Creek, NC | 175,121,439
F. notatus | David Duvernell | Little Wabash River, IL | 175,537,383
F. heteroclitus | Whitehead lab | Chesapeake Bay, MD, Point Lookout | 160,537,906
F. zebrinus | Reid Morehouse | Red River near Altus, OK | 49,374,145
F. xenicus | Jake Schaefer | Graveline Bay, MS | 106,282,282
F. diaphanus | Whitehead lab | Potomac River, MD, Piscataway Park | 100,265,388
F. similis | Whitehead lab | Galveston, TX | 104,114,099
F. grandis | Whitehead lab | Galveston, TX | 234,670,168
F. parvipinnis | Kelly Weinersmith (Alejandra Jaramillo) | Carpinteria Salt Marsh Reserve, CA | 92,590,817

Paired-end RNA-seq data with Phred33 quality encoding were quality trimmed and filtered with Trimmomatic v0.33 (Bolger et al., 2014) to remove adaptor sequences, low-quality bases (Phred quality score < 5) and low-quality reads (average Phred quality score < 5). Gentle quality trimming was elected to optimize downstream de novo transcript assembly following recommendations from Macmanes (2014). Reads were removed from the data set if fewer than 25 base pairs remained after all trimming steps were completed. After trimming, the total number of reads was compared among individuals within each species, and the sample containing the highest number of reads for each species was selected for further analysis.
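The read-level filters described above can be illustrated with a minimal Python sketch. This is not the Trimmomatic implementation and was not part of the study's pipeline; the function names (phred33_scores, keep_read) and default arguments are hypothetical and simply mirror the stated thresholds (average Phred score of at least 5, at least 25 bases remaining).

```python
def phred33_scores(quality_string):
    """Decode a Phred33-encoded FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def keep_read(sequence, quality_string, min_mean_quality=5, min_length=25):
    """Return True if an already-trimmed read passes both filters.

    Hypothetical helper: a read is dropped if fewer than `min_length`
    bases remain or if its average Phred score is below `min_mean_quality`.
    """
    if len(sequence) < min_length:
        return False
    scores = phred33_scores(quality_string)
    return sum(scores) / len(scores) >= min_mean_quality
```

In Phred33 encoding the character "I" corresponds to a score of 40 and "!" to a score of 0, so a 30-base read with quality string "I" * 30 would be retained, whereas a 24-base read would be removed regardless of quality.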

De novo assembly of reads into transcripts was performed with Trinity release 2014-07-17 (Grabherr et al., 2011) and candidate coding regions were identified from the assembled transcriptomes with TransDecoder v3.0.0 (Haas and Papanicolaou, 2012). TransDecoder generated both nucleotide and amino acid sequence files of candidate coding regions in FASTA format for each species. Four fish species for which annotated transcript files were available (three in suborder Cyprinodontoidei along with Fundulidae; one in order Beloniformes) were selected as the outgroup species for use in further analyses. The annotated amino acid files were downloaded from the Ensembl genome browser release 85 (Zerbino et al., 2015) for Xiphophorus maculatus and Oryzias latipes, and from the National Center for Biotechnology Information (NCBI) Genome database (https://www.ncbi.nlm.nih.gov/genome) for Poecilia reticulata and Cyprinodon variegatus.

2.2. Orthologous gene identification

OrthoFinder v0.6.1 (Emms and Kelly, 2015) was used to identify sets of orthologous genes (“orthogroups”) from among the 16 Fundulidae species and the four outgroup species using the amino acid sequence files. An orthogroup was selected for analysis if it contained exactly one gene sequence from at least one outgroup and nine ingroup species. Orthogroups with multiple sequences identified per species were not selected to avoid potential paralogous genes. For ingroup species, the nucleotide sequences for the genes identified in each orthogroup were retrieved from the assembled transcriptome file by header name using a custom Python script. Because the outgroup species’ amino acid sequence headers did not have an identical header in the respective annotated nucleotide sequence files, each outgroup amino acid file was first converted into a BLAST+ database (BLAST+ v2.2.29), and a tBLASTn search was performed on each identified amino acid gene sequence against the generated nucleotide database to retrieve the nucleotide sequence of the identified gene. The retrieved nucleotide sequences for each orthogroup were written to a new FASTA file for further processing.
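The orthogroup selection rule can be sketched in Python as follows. This is an illustration only, not the custom script used in the study: the input structure (a dictionary mapping an orthogroup identifier to a list of (species, gene_id) pairs) and the function name select_orthogroups are assumptions, and the rule is interpreted here as requiring at least nine ingroup species and at least one outgroup species, each represented by a single sequence.

```python
from collections import Counter


def select_orthogroups(orthogroups, ingroup, outgroup,
                       min_ingroup=9, min_outgroup=1):
    """Return ids of orthogroups that are single-copy and well sampled.

    orthogroups: dict of orthogroup id -> list of (species, gene_id) tuples
                 (hypothetical structure, not OrthoFinder's output format).
    ingroup, outgroup: sets of species names.
    """
    selected = []
    for og_id, members in orthogroups.items():
        counts = Counter(species for species, _ in members)
        # Skip orthogroups with more than one sequence for any species,
        # since these may contain paralogs.
        if any(n > 1 for n in counts.values()):
            continue
        n_in = sum(1 for sp in counts if sp in ingroup)
        n_out = sum(1 for sp in counts if sp in outgroup)
        if n_in >= min_ingroup and n_out >= min_outgroup:
            selected.append(og_id)
    return selected
```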

2.3. Multiple sequence alignment and supermatrix construction

Multiple sequence alignment was performed on each orthogroup sequence file using MAFFT v7.222 (Katoh et al., 2002) with the “--auto” parameter to automatically select the best alignment method. All other parameters were left at default values. Missing genes were coded as missing data. Aligned orthogroups were manually edited in MEGA7 (Kumar et al., 2016) using amino acid translation as a guide to inspect for extraneous data, alignment mistakes (such as offset start or stop codons), and poor alignments. Sequence data extending beyond the start and stop codons of aligned blocks of data were removed. Poor alignments that could not be manually corrected were excluded from further analysis.

Aligned and edited orthogroup sequences were concatenated end-to-end per species to generate a supermatrix in FASTA format. The starting and ending position of each gene within the supermatrix was recorded for use in partitioned phylogenetic analysis. The supermatrix was converted into NEXUS format using an online format conversion tool (http://sequenceconversion.bugaco.com/converter/biology/sequences/fasta_to_nexus.php) and used for all phylogenomic analyses.
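The concatenation step can be sketched in Python as shown below. This is a minimal illustration under stated assumptions rather than the procedure actually used: the function name build_supermatrix, the input structure, and the use of "?" to code missing genes are all hypothetical choices.

```python
def build_supermatrix(alignments, species):
    """Concatenate per-gene alignments and record gene coordinates.

    alignments: list of (gene_name, {species: aligned_sequence}) pairs
                (hypothetical structure).
    species: ordered list of all species to include.
    Returns (matrix, partitions), where matrix maps species to the
    concatenated sequence and partitions is a list of
    (gene_name, start, end) with 1-based inclusive coordinates.
    """
    parts = {sp: [] for sp in species}
    partitions = []
    position = 0
    for gene, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for sp in species:
            # A species lacking this gene is coded as missing data.
            parts[sp].append(aln.get(sp, "?" * length))
        partitions.append((gene, position + 1, position + length))
        position += length
    matrix = {sp: "".join(chunks) for sp, chunks in parts.items()}
    return matrix, partitions
```

The recorded (gene_name, start, end) tuples correspond to the per-gene boundaries needed for a partitioned analysis; writing them in the partition-file syntax of a particular program is left out of this sketch.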

2.4. Phylogenomic analyses

The supermatrix, partition file and phylogenomic analysis parameters were deposited to Mendeley Data. Six independent maximum likelihood (ML) searches partitioned by gene were performed in RAxML v8.2.0 (Stamatakis, 2014) under the General Time Reversible model with gamma rate heterogeneity and parameter optimization for each partition conducted at runtime. Resulting ML scores of the six trees were compared for consistency to ensure that a sufficient number of searches had been performed. One thousand rapid bootstrap replicates were performed on the best-scoring ML topology.

Bayesian inference (BI) was performed on partitioned (by gene) and unpartitioned alignments using the MPI version of MrBayes v3.2.6 (Ronquist et al., 2011) using default priors and parameters. Model jumping was used to sample among all parameters of the General Time Reversible model. Markov chain Monte Carlo (MCMC) was run for one million generations using eight chains across two independent runs with a 25% burn-in. MrBayes output was inspected for convergence, and adequate sampling and mixing, following the guidelines provided by the MrBayes manual (Ronquist et al., 2011). MCMC results were summarized in Tracer v1.6 (Rambaut and Drummond, 2013) and trace plots were inspected to confirm that the stationary distribution had likely been found and adequately sampled.

2.5. Alternative topology hypothesis testing

Alternative topological hypothesis testing was performed with the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa, 1999) in RAxML v8.2.9. The topologies recovered by Whitehead’s nuclear gene sequence analysis and combined nuclear and mitochondrial gene sequence analysis (Whitehead, 2010) and Ghedotti and Davis’s total-evidence study (Ghedotti and Davis, 2013) were compared to the overall best-scoring topology found for the data in RAxML. In addition, each individual difference in proposed relationships between the present study topology and the Ghedotti and Davis total-evidence and Whitehead combined nuclear and mitochondrial gene analysis topologies (the “alternative” topologies) was isolated and tested with the SH test.

The alternative topologies were re-created in Newick format to include only the 16 Fundulidae species and the four outgroup species analyzed in this study. New alternative trees were created in Newick format to investigate each individual difference between the present study topology and the alternatives. For each difference, the corresponding node in the original tree was moved to reflect its position in the alternative(s) while all other relationships remained the same. Each alternative topology served as a constraint tree for which six independent ML analyses were performed in RAxML. Searches used the same data, partitions, and parameters used to find the ML tree in this study. The constraint tree with the highest ML score was selected for alternative hypothesis testing.

We also calculated the site-wise (SLS) and gene-wise (GLS) log-likelihood scores following Shen et al. (2017) using RAxML. The GLS for each gene is the sum of SLS scores for all sites within each gene. The differences in GLS scores (ΔGLS) between the ML and alternative topologies were used to evaluate variation in phylogenetic information among genes as well as incongruences among genes in support of the ML tree versus the six alternative constraint trees subjected to the SH test. Outliers were identified as genes for which ΔGLS fell more than two standard deviations from the mean.
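The ΔGLS calculation can be sketched in Python as follows, assuming that site-wise log-likelihoods for each topology have already been obtained (for example, exported from RAxML) as plain lists of numbers. The function names are hypothetical; ΔGLS is computed here as the ML score minus the alternative score, so that positive values favor the ML topology, and the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def gene_wise_scores(site_scores, partitions):
    """Sum site-wise log-likelihoods (SLS) within each gene to obtain GLS.

    partitions: list of (gene_name, start, end), 1-based inclusive.
    """
    return {gene: sum(site_scores[start - 1:end])
            for gene, start, end in partitions}


def delta_gls(sls_ml, sls_alt, partitions):
    """Return per-gene ΔGLS = GLS(ML topology) - GLS(alternative topology)."""
    gls_ml = gene_wise_scores(sls_ml, partitions)
    gls_alt = gene_wise_scores(sls_alt, partitions)
    return {gene: gls_ml[gene] - gls_alt[gene] for gene in gls_ml}


def outlier_genes(deltas, n_sd=2.0):
    """Return genes whose ΔGLS lies more than n_sd standard deviations
    from the mean ΔGLS (sample standard deviation assumed)."""
    values = list(deltas.values())
    mu, sd = mean(values), stdev(values)
    return sorted(g for g, d in deltas.items() if abs(d - mu) > n_sd * sd)
```

The proportion of genes favoring the ML topology would then be the fraction of genes with a positive ΔGLS.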


2.6. Analysis of additional Fundulidae species using RAG1 and gylt

RAG1 and gylt gene sequences for 32 Fundulidae species were downloaded from NCBI GenBank by accession number (Ghedotti and Davis, 2013). RAG1 and gylt gene sequences for outgroup species were retrieved by keyword search in the Ensembl genome browser for O. latipes and X. maculatus, and through online BLASTn searches for P. reticulata and C. variegatus. The RAG1 and gylt sequences for the four outgroup species and 16 Fundulidae for which RNA-seq data were available were searched against the supermatrix using local BLASTn to ensure the genes were not already included in the original supermatrix.

RAG1 and gylt sequences were multiply aligned using MAFFT with the “--auto” align parameter. Any missing species were coded as missing data. The aligned sequences were appended onto the original supermatrix, and species not included in the original RNA-seq supermatrix were coded as missing data for the length of the original supermatrix. The resulting supermatrix was partitioned by gene for ML searches, and remained unpartitioned for BI. Parameters remained the same as the analyses performed on the original supermatrix, with the exception that rapid bootstrapping in RAxML was performed via the “–autoMRE” parameter instead of using a pre-defined number of bootstrap replicates. The modified supermatrix and its associated partition file and phylogenomic analysis parameters were deposited to Mendeley Data.

3. Results

3.1. Phylogenomic analyses

OrthoFinder identified 53,811 putative groups of orthologous genes, 152 of which were utilized for supermatrix construction and phylogenetic analyses (listed in supplementary Table 1). The supermatrix consisted of 89,519 aligned base pairs and 12,469 variable sites among ingroup species (28,641 variable sites with outgroup species included).

Analysis of RNA-seq data

ML (partitioned) and BI (partitioned and unpartitioned) searches converged on identical and fully resolved topologies with minor differences in branch lengths (Fig. 2). We had difficulty achieving satisfactory sampling in the BI partitioned search, and so only the unpartitioned BI analysis results are reported here. Bootstrap and posterior probability support was strong at nearly every node.

Fundulus parvipinnis was recovered as the most distantly related member of Fundulidae, with the Lucania species sister to a grouping of all remaining Fundulus species. With the exception of F. parvipinnis and F. similis, all Fundulus species fell within two major radiations corresponding largely to the Zygonectes – Plancterus and Fundulus clades as recognized by both Whitehead (2010) and Ghedotti and Davis (2013). The two clades are connected by the shortest branch in the tree (branch length of 0.0005), and the ML bootstrap support was weak for placement of F. similis as sister to the Zygonectes – Plancterus clade.

3.2. Alternative topology hypothesis testing

The alternative topologies of Ghedotti and Davis (2013) total-evidence analysis, and Whitehead (2010) nuclear genes and combined mitochondrial and nuclear gene sequence analyses, as well as the topologies of six additional constraint trees were tested against the current topology. The present study and alternative topologies varied with respect to the placement of F. similis, F. chrysotus, F. parvipinnis, and genus Lucania. Ghedotti and Davis’ and Whitehead’s combined and nuclear only topologies (Fig. 1) were significantly worse explanations for the data (D(LH) = −2298, D(LH) = −2154 and D(LH) = −1890, respectively; p < .01) (Table 2).

Five of the six additional constraint trees produced significantly worse topologies. Each alternative tested is presented in Fig. 3. In general, altering the placement of F. similis (Fig. 3C) as either sister to the Zygonectes – Plancterus subclade, or the Fundulus subclade did not result in significantly worse explanations of the data. Likewise, an analysis of variation in phylogenetic information among individual genes revealed that ΔGLS was positive (favoring the ML topology) for the large majority of genes in comparisons involving five of the six constraint trees (Fig. 4). The proportion of genes favoring the ML topology ranged from 63 to 86% among these comparisons with only two significant negative outlier genes favoring the alternative placement of F. chrysotus (Figs. 3 and 4A). In contrast, when the placement of F. similis was shifted between clades (Fig. 3C), the number of genes favoring the ML and alternative topologies was nearly even (47% favoring ML topology), with four significant outliers favoring the ML topology and six outliers favoring the alternative (Fig. 4C).

3.3. Analysis of combined RNA-seq, RAG1, and gylt

ML and BI searches on the RNA-seq supermatrix plus RAG1 and gylt sequences converged on identical topologies (Fig. 5) with strong statistical support at most nodes. The relationships recovered for the 16 species included in the RNA-seq data set remained consistent. The composition of the genus Fundulus subgenera is in agreement with that of Ghedotti and Davis (2013) with the exception of the F. similis – F. majalis clade being recovered as sister to the Zygonectes – Plancterus clade (with weak support). The F. nottii species group shows several differences in the relationships when compared to both Whitehead (2010) and Ghedotti and Davis (2013). This may have resulted from a shift in placement of F. chrysotus as sister to the F. nottii group, as RNA-seq data were only available for F. nottii, and so could not have directly informed relationships within the F. nottii species group.

4. Discussion

The phylogenomic analyses of newly available, large transcriptomic data produced fully resolved and robustly supported phylogenetic trees. The relationships among the 16 species included in the RNA-seq data set are consistent between analyses of RNA-seq data alone and RNA-seq data plus RAG1 and gylt sequences. All analyses generated strong statistical support for many weakly supported nodes in the Ghedotti and Davis phylogeny (2013) though not all relationships were supported. Previously well-supported sister relationships among species were affirmed, including the L. goodei – L. parva, F. heteroclitus – F. grandis, F. rathbuni – F. diaphanus, and F. olivaceus – F. notatus clades. Several deeper nodes, as well as the weakly supported sister relationships in the Ghedotti and Davis phylogeny (2013), did not find support when analyzing RNA-seq data.

The most recently published Fundulidae phylogeny (Ghedotti and Davis, 2017, 2013) recognizes Lucania and Fundulus as valid genera. Our phylogenomic analysis instead recovers Fundulidae as a group consisting of a paraphyletic genus Fundulus. F. parvipinnis and F. lima form the most distantly related clade with the Lucania species sister to all other Fundulus species. The placement of F. parvipinnis and Lucania in this study is strongly supported over the alternatives (Fig. 3D–F, and Table 2), and supported by the majority of individual genes as well (Fig. 4D–F). The recovery of F. xenicus (formerly A. xenica) as an ingroup within genus Fundulus has been previously hypothesized (Bernardi, 1997; Ghedotti and Davis, 2013; Whitehead, 2010) and the current analysis confidently nests F. xenicus within subgenus Zygonectes. This result corroborates the synonymy of Adinia with genus Fundulus as determined by Ghedotti and Davis (2013). The inclusion of F. xenicus within Fundulus therefore does not pose a problem to the monophyly of genus Fundulus. Under the current classification scheme, however, the recovery of Lucania as an ingroup renders genus Fundulus paraphyletic. In order to achieve monophyletic status, genus Lucania would need to be synonymized with genus Fundulus, or subgenus Wileyichthys (containing F. parvipinnis and F. lima) elevated to generic status.


Fig. 2. The best phylogeny for 16 Fundulidae species recovered by ML and BI using a 152 gene supermatrix assembled from RNA-sequencing data. Numbers at nodes indicate ML bootstrap support, and BI posterior probabilities, respectively (* = 100% bootstrap and 1.0 probability). Subgeneric and generic (Lucania) designations are indicated to the right. The four outgroup species are not shown.

Table 2. Likelihood scores of all alternative topologies, difference in likelihood scores compared to the current topology, and results of the SH test as implemented in RAxML v8.2.9. Letters A through F refer to the constraint topologies presented in Fig. 3.

| Tree | Likelihood | D(LH) | Sig. Worse? (p < .05) | Sig. Worse? (p < .02) | Sig. Worse? (p < .01) |
|---|---|---|---|---|---|
| Current | −348900.823140 | N/A | N/A | N/A | N/A |
| Ghedotti and Davis | −351199.285830 | −2298.462690 | Yes | Yes | Yes |
| Whitehead (combined analysis) | −351054.6032 | −2153.780064 | Yes | Yes | Yes |
| Whitehead (nuclear analysis) | −350790.8354 | −1890.012294 | Yes | Yes | Yes |
| A | −350450.6097 | −1549.786567 | Yes | Yes | Yes |
| B | −349532.2474 | −631.424237 | Yes | Yes | Yes |
| C | −348911.8665 | −11.043372 | No | No | No |
| D | −349511.6366 | −610.81349 | Yes | Yes | Yes |
| E | −349635.3284 | −734.50523 | Yes | Yes | Yes |
| F | −349122.0841 | −221.260972 | Yes | Yes | Yes |
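The D(LH) column of Table 2 is simply each alternative's log-likelihood minus that of the current topology. The short Python sketch below reproduces that arithmetic from the tabulated likelihoods. The variable and function names are hypothetical, and the sketch does not perform the SH test itself, which requires resampling of per-site likelihoods in RAxML.

```python
# Log-likelihoods copied from Table 2 (rounded as printed there).
CURRENT_LNL = -348900.823140

ALTERNATIVE_LNL = {
    "Ghedotti and Davis": -351199.285830,
    "Whitehead (combined analysis)": -351054.6032,
    "Whitehead (nuclear analysis)": -350790.8354,
    "A": -350450.6097,
    "B": -349532.2474,
    "C": -348911.8665,
    "D": -349511.6366,
    "E": -349635.3284,
    "F": -349122.0841,
}


def delta_lnl(alternative, best=CURRENT_LNL):
    """D(LH): alternative minus best; negative means the alternative fits worse."""
    return alternative - best


differences = {name: delta_lnl(lnl) for name, lnl in ALTERNATIVE_LNL.items()}

# List the alternatives from least to most costly in likelihood units.
for name, diff in sorted(differences.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {diff:12.3f}")
```

Topology C stands out as the only alternative within about 11 log-likelihood units of the current tree, consistent with it being the only one not rejected by the SH test.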

Fig. 3. Alternative topologies reflecting each difference in relationships between the current best topology and Ghedotti and Davis’ (2013) total evidence analysis and Whitehead’s (2010) combined nuclear and mitochondrial gene sequence analysis. (A) F. chrysotus sister to F. xenicus. (B) F. similis as a member of the most divergent lineage of genus Fundulus. (C) F. similis as the most divergent lineage in subgenus Fundulus. (D) F. parvipinnis forming a clade with Lucania and branching from subgenus Zygonectes. (E) F. parvipinnis forming a clade with F. zebrinus and branching from subgenus Zygonectes. (F) Lucania as a separate genus.


Subgenera Zygonectes and Fundulus are generally recovered with robust support, and interspecies relationships are fairly consistent with those identified by Whitehead (2010) and Ghedotti and Davis (2013). The inclusion of F. similis (and F. majalis) as sister to the Zygonectes – Plancterus clade has not been reported in any other published phylogeny, and is not fully supported by either likelihood or Bayesian analysis (bootstrap = 61%, posterior probability = 0.99). This placement tests the validity of subgenus Fundulus, which would be rendered paraphyletic (Figs. 2 and 5). The length of the branch uniting F. similis with the Zygonectes – Plancterus clade is exceedingly short (Fig. 2), and tests of alternative hypotheses for the placement of F. similis within the phylogeny did not reveal a single best placement. Assigning the F. similis – F. majalis species pair to a separate subgenus would resolve the problem. Wiley (1986) had placed this species pair within another subgenus, Fontinus. However, that classification was even more problematic as it also included F. seminolis and F. diaphanus along with several other species not included in this study.

Fig. 4. Distribution of phylogenetic signal for the ML topology versus six alternative topologies described in Fig. 3. For each alternative topology, ΔGLS values were calculated as the difference in gene-wise log-likelihood scores between the ML topology and the alternative topology. For each gene, positive values indicate greater support for the ML topology while negative scores favor the alternative topology. Values that exceed two standard deviations of the mean are highlighted in black. Gene order follows Supplementary Table 1.
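The gene-wise signal summarized in Fig. 4 (ΔGLS) can be illustrated with a small Python sketch. It assumes per-gene log-likelihoods are already available for the ML and one alternative topology, and it reads "exceed two standard deviations of the mean" as an absolute deviation from the mean greater than twice the sample standard deviation. The function names and the eight toy values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def delta_gls(ml_lnl, alt_lnl):
    """Per-gene difference: ML-topology lnL minus alternative-topology lnL.

    Positive values favor the ML topology; negative values favor the alternative.
    """
    return [ml - alt for ml, alt in zip(ml_lnl, alt_lnl)]


def flag_outliers(values, n_sd=2.0):
    """Indices of genes lying more than n_sd sample SDs from the mean."""
    mu = mean(values)
    sd = stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > n_sd * sd]


# Toy per-gene log-likelihoods for eight hypothetical genes (NOT study data).
ml_scores = [-1000.0, -2000.0, -1500.0, -1200.0, -1800.0, -900.0, -1100.0, -1300.0]
alt_scores = [-1001.0, -2002.0, -1499.0, -1200.5, -1801.5, -940.0, -1100.0, -1301.0]

dgls = delta_gls(ml_scores, alt_scores)
outliers = flag_outliers(dgls)

favor_ml = sum(1 for d in dgls if d > 0)
favor_alt = sum(1 for d in dgls if d < 0)
print("ΔGLS per gene:", dgls)
print("genes favoring ML / alternative:", favor_ml, "/", favor_alt)
print("outlier gene indices:", outliers)
```

In this toy case a single gene contributes most of the summed difference. That is the pattern Fig. 4 is designed to expose, since a few strongly weighted genes can drive a contentious node (Shen et al., 2017).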

5. Conclusions

The family Fundulidae has become a model group for a wide variety of comparative studies, despite the lack of a fully resolved phylogeny. Our phylogenomic analysis of RNA-seq data has generated a robust phylogeny of the fundulids, which will increase the group's overall utility. Our analysis also presents a new hypothesis of the deeper evolutionary relationships of family Fundulidae. Our findings do not confirm the monophyletic status of the family’s largest genus, Fundulus, or all of its recognized subgenera. Therefore, further taxonomic revision is warranted.

Fig. 5. The best topology for 32 Fundulidae species recovered by ML and BI using a 152-gene supermatrix assembled from RNA-sequencing data for 16 Fundulidae combined with RAG1 and gylt gene sequences for 32 Fundulidae species. Species for which RNA-seq data were available are indicated with open diamonds. Numbers at nodes indicate ML bootstrap support and BI posterior probabilities, respectively (* = 100% bootstrap and 1.0 probability). Subgeneric and generic (Lucania) designations are indicated to the right. The four outgroup species are not shown.

Acknowledgements

We thank Victor Jongeneel and Chris Fields for technical assistance, and Chet Langin and Edward Ackad for computing resources. Reid Brennan, Rebecca Fuller, Alejandra Jaramillo, Seth Kullman, Crystal Lee, Reid Morehouse, and Jake Schaefer assisted with fish collections. Reid Brennan helped with animal care and dissection of tissues. Carlos Prada assisted with RNA-seq library preparation. “This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 Instrumentation Grants S10RR029668 and S10RR027303.” This work used the Extreme Science and Engineering Discovery Environment (XSEDE, National Science Foundation grant number ACI-1053575) and the CC*DNI Networking Infrastructure (National Science Foundation grant number ACI-1541435). Research was supported by funding from the National Science Foundation (DEB-1265282 and ROA supplement DEB-1314561 to AW, and DEB-1556778 to DDD). This effort was also supported by Southern Illinois University Edwardsville (SIUE), which designated research funds to purchase hardware and provide technical support.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.ympev.2017.12.030.

References

Bernardi, G., 1997. Molecular Phylogeny of the Fundulidae (Teleostei, Cyprinodontiformes) Based on the Cytochrome b Gene. In: Kocher, T.D., Stepien, C.A. (Eds.), Molecular Systematics of Fishes. Academic Press, San Diego, pp. 189–197.

Bernardi, G., Powers, D., 1995. Phylogenetic relationships among nine species from the genus Fundulus (Cyprinodontiformes, Fundulidae) inferred from sequences of the cytochrome b gene. Copeia 1995, 469–473.

Bolger, A.M., Lohse, M., Usadel, B., 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. http://dx.doi.org/10.1093/bioinformatics/btu170.

Burnett, K.G., Bain, L.J., Baldwin, W.S., Callard, G.V., Cohen, S., Di Giulio, R.T., Evans, D.H., Gómez-Chiarri, M., Hahn, M.E., Hoover, C.A., Karchner, S.I., Katoh, F., MacLatchy, D.L., Marshall, W.S., Meyer, J.N., Nacci, D.E., Oleksiak, M.F., Rees, B.B., Singer, T.D., Stegeman, J.J., Towle, D.W., Van Veld, P.A., Vogelbein, W.K., Whitehead, A., Winn, R.N., Crawford, D.L., 2007. Fundulus as the premier teleost model in environmental biology: opportunities for new insights using genomics. Comp. Biochem. Physiol. – Part D Genom. Proteom. 2, 257–286. http://dx.doi.org/10.1016/j.cbd.2007.09.001.

Cashner, R.C., Rogers, J.S., Grady, J.M., 1992. Phylogenetic Studies of the Genus Fundulus. In: Mayden, R.L. (Ed.), Systematics, Historical Ecology, and North American Freshwater Fishes. Stanford University Press, Stanford, pp. 421–437.

Emms, D.M., Kelly, S., 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157. http://dx.doi.org/10.1186/s13059-015-0721-2.

Ghedotti, M.J., Davis, M.P., 2017. The taxonomic placement of three fossil Fundulus species and the timing of divergence within the North American topminnows (Teleostei: Fundulidae). Zootaxa 4250, 577–586.

Ghedotti, M.J., Davis, M.P., 2013. Phylogeny, classification, and evolution of salinity tolerance of the North American topminnows and killifishes, family Fundulidae (Teleostei: Cyprinodontiformes). Fieldiana Life Earth Sci. 7, 1–65. http://dx.doi.org/10.3158/2158-5520-12.7.1.

Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652. http://dx.doi.org/10.1038/nbt.1883.

Haas, B., Papanicolaou, A., 2012. TransDecoder (Find Coding Regions within Transcripts) [WWW Document]. URL https://transdecoder.github.io/.

Harrison, N., Kidner, C.A., 2011. Next-generation sequencing and systematics: what can a billion base pairs of DNA sequence data do for you? Taxon 60, 1552–1566.

Hittinger, C.T., Johnston, M., Tossberg, J.T., Rokas, A., 2010. Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life. Proc. Natl. Acad. Sci. USA 107, 1476–1481. http://dx.doi.org/10.1073/pnas.0910449107.

Katoh, K., Misawa, K., Kuma, K., Miyata, T., 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066. http://dx.doi.org/10.1093/nar/gkf436.

Kumar, S., Stecher, G., Tamura, K., 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, msw054. http://dx.doi.org/10.1093/molbev/msw054.

Macmanes, M.D., 2014. On the optimal trimming of high-throughput mRNA sequence data. Front. Genet. 5, 1–7. http://dx.doi.org/10.3389/fgene.2014.00013.

Parenti, L.R., 1981. A phylogenetic and biogeographic analysis of cyprinodontiform fishes (Teleostei, Atherinomorpha). Bull. Am. Museum Nat. Hist. 168, 335–357. http://dx.doi.org/10.1080/00222930210135631.

Rambaut, A., Drummond, A.J., 2013. Tracer v1.6. Available from http://tree.bio.ed.ac.uk/software/tracer/.

Reid, N.M., Proestou, D.A., Clark, B.W., Warren, W.C., Colbourne, J.K., Shaw, J.R., Karchner, S.I., Hahn, M.E., Nacci, D., Oleksiak, M.F., Crawford, D.L., Whitehead, A., 2016. The genomic landscape of rapid repeated evolutionary adaptation to toxic pollution in wild fish. Science 354, 1305–1308. http://dx.doi.org/10.1126/science.aah4993.

Ronquist, F., Huelsenbeck, J., Teslenko, M., 2011. MrBayes Version 3.2 Manual: Tutorials and Model Summaries 1–103.

Shen, X.-X., Hittinger, C.T., Rokas, A., 2017. Contentious relationships in phylogenomic studies can be driven by a handful of genes. Nat. Ecol. Evol. 1, 126. http://dx.doi.org/10.1038/s41559-017-0126.

Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16, 1114–1116. http://dx.doi.org/10.1093/oxfordjournals.molbev.a026201.

Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313. http://dx.doi.org/10.1093/bioinformatics/btu033.

Whitehead, A., 2010. The evolutionary radiation of diverse osmotolerant physiologies in killifish (Fundulus sp.). Evolution 64, 2070–2085. http://dx.doi.org/10.1111/j.1558-5646.2010.00957.x.

Wiley, E.O., 1986. A study of the evolutionary relationships of Fundulus topminnows (Teleostei: Fundulidae). Integr. Comp. Biol. 26, 121–130. http://dx.doi.org/10.1093/icb/26.1.121.

Zerbino, D.R., Wilder, S.P., Johnson, N., Juettemann, T., Flicek, P.R., 2015. The Ensembl regulatory build. Genome Biol. 16, 56. http://dx.doi.org/10.1186/s13059-015-0621-5.
