Misled by the mitochondrial genome
A phylogenetic study in Topaza hummingbirds
Tobias Hofmann
Degree project for Master of Science (Two Years)
Biodiversity and Systematics
Degree course in Next-Generation Sequencing (60 hec)
Autumn 2014 and Spring 2015
Examiner: Bengt Oxelman
Supervisors: Urban Olsson, Alexandre Antonelli
Co-Supervisors: Bernard Pfeil, Mats Töpel
Department of Biological and Environmental Sciences
University of Gothenburg
Cover illustration by David Alker, published in Schmitz-Ornés & Haase (2009)
Abstract
Phylogenetic analyses on shallow evolutionary timescales pose several challenges, since many classically used genetic markers provide too few variable sites to reliably reconstruct evolutionary history. In animals, mitochondrial sequences are therefore preferably used for inference of recent evolutionary history, due to their higher variability in comparison to nuclear gene loci. However, a growing body of evidence points to specific perils of mitochondrial sequences that challenge their utility for phylogenetic analyses. Here we show a case of recent population divergences within the hummingbird genus Topaza, in which the mitochondrial tree deviates from the species tree, leading to different divergence patterns between nuclear and mitochondrial datasets. We apply state-of-the-art phylogenetic and population genetic methods in order to infer species trees, to define and delimit genetically distinct subspecies, and to compare the admixture pattern among them. The resulting population pattern indicates that the Amazon River acts as a strong dispersal barrier for Topaza hummingbirds. The underlying dataset consists of thousands of Ultraconserved Elements (UCEs) and additional nuclear genes, which provide an extensive nuclear dataset as counterpart to the complete mitochondrial genome, which was sequenced within this study. Our study provides an exemplary case of a powerful genetic approach aimed at recovering recent phylogenetic history at the subspecies level by taking a novel approach to extracting genome-wide SNPs (Single Nucleotide Polymorphisms) from UCE data. We further discuss important biological information that can be inferred from the observed discrepancy between the mitochondrial tree and the species tree.
Beyond that, this study provides a direct comparison of a variety of datasets and analytical methods and explores their performance on shallow evolutionary timescales, with the aim of helping future studies make an informed choice of the most suitable genetic dataset.
Keywords: Illumina sequencing, sequence capture, UCE, SNP, species tree, gene tree, mitochondrial genome
Content
Introduction
  The Mitochondrion
  UCEs as a novel source of genetic data
  SNPs
  Topaza
  Aims of this study
Methods
  Taxon sampling
  Next-Generation Sequencing
    DNA extraction and library preparation
    Probe design
    Sequence enrichment and sequencing
  Data processing
  Mitochondrial tree
  Species tree
    Nuclear dataset
    Mixed dataset
    DISSECT
    UCEs
    SNPs
  Population structure
Results
  Data exploration
  Mitochondrial tree
  Species tree
    Individuals analyzed separately
    Species delimitation analysis
    Individuals assigned to populations
  Population structure
Discussion
  Evaluation of phylogenetic relationships
  Mitochondrial tree - the odd one out
  Rivers as dispersal barriers
  Effect of mtDNA on species tree
  Evaluation of datasets
Conclusion
Acknowledgements
References
Supplemental Material
Introduction
The Mitochondrion
The mitochondrial genome has been a very popular source of genetic information for animals
since the beginnings of DNA-based phylogenetic analyses, because it is easy to access and exists in high
copy numbers in most tissue cells. Mitochondrial DNA (mtDNA) in animals is generally characterized
by high mutation rates in comparison to nuclear DNA [1]; these high mutation rates produce valuable
phylogenetic information, even over relatively shallow evolutionary timescales. In birds, the average length
of the mitochondrial genome lies at around 17,000 bp (value based on all currently available
mitochondrial genomes (n=403) for birds at NCBI) but varies quite substantially in length between
different bird clades, ranging up to a length of more than 22,000 bp in hornbills (Bucerotidae) [2].
The mitochondrial genome contains multiple protein-coding genes, which code for subunits of enzymes
involved in important cellular functions associated with cell metabolism, namely the generation
of adenosine triphosphate (ATP). These genes are the cytochrome oxidase units (cox 1-3), the NADH
dehydrogenase units (nad 1-6), the ATP synthase units (atp 6+8) and cytochrome b (an integral
membrane protein involved in the respiratory chain). Additionally, the mitochondrial genome
contains its own set of tRNA and rRNA (12S and 16S) coding sequences (see Figure 4).
Combinations of exclusively mitochondrial genes (most frequently cytb, nd2 and cox1) have been
commonly used in bird phylogenetics throughout the last decade to infer phylogenetic trees [3]–[5].
Due to their fast mutation rate (in animals), mitochondrial loci are often more informative, and
therefore are considered more suitable than nuclear loci to explore the more recent phylogenetic
history. However, there are certain caveats to consider when using mitochondrial sequences for
species tree inference, which we address and discuss within this study.
Mitochondrial tree discordance
In vertebrates the mitochondrion is inherited maternally as a functional unit of the egg-cell [6].
Cases of recombination within mitochondrial genomes have been detected in various phylogenetic
studies [2], [7]–[9], but appear to be the exception rather than the rule, even though
Sammler et al. [2] suggest a rather frequent mitochondrial recombination rate among hornbills
(Aves: Bucerotidae). It is therefore recommended to test for recombination when using
mitochondrial sequence data. If no recombination is detected, the mitochondrial genome is to be
treated as one, uniparentally inherited locus. The consequence is that all mitochondrial
genes/sequences from one individual are in complete linkage and share the same evolutionary
history. This is an important point to consider when using mitochondrial genes in phylogenetic
studies. Unlike nuclear loci, which in many cases are unlinked (unless they are located in relatively
close proximity on the same chromosome), different mitochondrial genes do not represent different
genealogies; therefore any tree based on only mitochondrial markers represents a single gene tree
and not the species tree. This is an important distinction because a variety of factors can lead
to discordance between gene trees and the species tree, as broadly discussed in a recent review by Degnan and
Rosenberg [10]. The most common mechanisms that can cause discordance between the
mitochondrial tree and the species tree in particular are shown in Figure 1.
Figure 1: Main mechanisms potentially causing discordance between the mitochondrial tree (red) and the species-
tree. Drawn in blue is an alternative genealogy that is not in discordance with the species tree. a) Incomplete lineage
sorting: Looking backward in time, different, unlinked gene loci are expected to coalesce at different times between taxa
A and B. The coalescence of single genealogies may date back further than the actual speciation event (deep
coalescence), particularly in populations with large effective population sizes (Ne). This can lead to some loci coalescing
with the outgroup taxon C before coalescing between the two sister taxa A+B (see left diagram), causing a discordance of
that particular gene tree with the species tree. This is referred to as incomplete lineage sorting. b) Introgression: In the
case that two species or populations are not completely reproductively isolated, occasional gene flow may occur. Such
gene flow, even if brought by only a small group or a single individual, can lead to fixation of the new genes in the gene
pool of the receiving population, which is referred to as introgression. When sampling a locus that has been subject to
such introgression, the resulting gene tree shows discordance with the species tree (see right diagram). In the example
here the mitochondrial lineage (red) introgresses from taxon C into taxon B.
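The dependence of incomplete lineage sorting on effective population size, described for panel a) of Figure 1, can be made concrete with the standard three-taxon coalescent result: for a neutral, diploid nuclear locus, the probability that a gene tree is discordant with the species tree ((A,B),C) is (2/3)·exp(-t/(2·Ne)), where t is the number of generations between the two speciation events and Ne is the effective size of the ancestral A+B population. The following Python sketch merely evaluates this expression; the function name and all numerical values are illustrative assumptions and are not taken from this study.

```python
import math


def discordance_probability(generations, effective_size):
    """Probability that a three-taxon gene tree disagrees with the species tree.

    generations    -- length of the internal species-tree branch, in generations
    effective_size -- diploid effective population size (Ne) of the ancestral
                      A+B population (illustrative input, not from the study)
    """
    if generations < 0 or effective_size <= 0:
        raise ValueError("generations must be >= 0 and effective_size > 0")
    # Internal branch length in coalescent units for a diploid nuclear locus.
    coalescent_units = generations / (2.0 * effective_size)
    # With probability exp(-T) the A and B lineages fail to coalesce on the
    # internal branch; two of the three equally likely topologies that can
    # then arise are discordant with the species tree.
    return (2.0 / 3.0) * math.exp(-coalescent_units)


if __name__ == "__main__":
    # Hypothetical branch length of 20,000 generations, three hypothetical Ne values.
    for ne in (1_000, 10_000, 100_000):
        p = discordance_probability(generations=20_000, effective_size=ne)
        print(f"Ne = {ne:>7}: P(discordance) = {p:.4f}")
```

For a fixed branch length, the probability of discordance rises toward its maximum of 2/3 as Ne increases, which is why deep coalescence is of particular concern in large populations and over short divergence times.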
Linkage and non-neutrality
There are various concerns about the neutrality of the mitochondrial locus, which is an important
criterion for all phylogenetic models. One main argument in this context is that mitochondria in birds
are strictly maternally inherited and are therefore in complete genetic linkage with the W
chromosome (the female sex chromosome in birds) [11], [12]. Therefore, indirect selection acting on
mitochondria through linkage with the W chromosome is proposed to be rather common [13] and is
thought to be of major concern when using mitochondrial loci for phylogenetic inference, as it makes
inferences based solely on mtDNA unreliable [14]. Additionally, several studies [14]–[16] have found
indications of direct selection on mitochondrial loci, which further reinforces the above-mentioned
concerns.
Mitochondrial bias on tree inference methods
The mitochondrial genome is haploid and in vertebrates exclusively inherited maternally [6]. For
these reasons, mitochondrial genes are generally modeled to have one quarter of the effective population size
of nuclear genes. This has been argued to make mitochondrial genes a more reliable source in terms
of species tree inference, since genes with lower ploidy are expected to more accurately follow the
species tree, due to their lower effective population size [17]. However, the same reasons are
currently causing scientific debate around the question of whether mitochondrial genes have a
disproportionate effect on species-tree inference with mixed (nuclear and mitochondrial), multi-
locus datasets. A recent study conducted by Jockusch et al. [18] on slender salamanders found a bias
particularly in Bayesian tree inference methods toward loci with higher variability. They found that
the addition of mitochondrial sequences to multi-locus nuclear datasets forced the species tree
inference to gravitate disproportionately toward the more informative mitochondrial gene tree when
analyzed in a multispecies coalescent framework. A similar bias of mitochondrial genes has been
shown in other studies [19]. Yet, there are also various studies that explicitly test for and find no
evidence of a disproportionate influence of mitochondrial genes on species tree inference [5], [20],
[21].
Setting the mitochondrial effective population size to one quarter that of nuclear genes has been
argued to not be accurate in all cases, since this is based on the assumption of only one
mitochondrial copy being transmitted per egg-cell and it further assumes equal sex ratios. If
these assumptions are not met, especially if the sex ratio differs significantly from being equal,
the effective population size of mitochondria may even exceed that of nuclear loci [6]. Therefore, the
opinion that the mitochondrial tree will follow the species tree more closely than
nuclear loci is not shared among all scientists [6], [18].
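The one-quarter expectation and its sensitivity to the sex ratio can be illustrated with the standard textbook approximations for an idealized population with Nf breeding females and Nm breeding males: Ne(nuclear, autosomal) = 4·Nm·Nf/(Nm+Nf) and Ne(mitochondrial) = Nf/2. The following Python sketch compares the two; it assumes strict maternal inheritance and no heteroplasmy, and the function names and population numbers are hypothetical and serve only as an illustration.

```python
def nuclear_effective_size(n_females, n_males):
    """Autosomal Ne for a population with n_females and n_males breeders."""
    return 4.0 * n_females * n_males / (n_females + n_males)


def mitochondrial_effective_size(n_females):
    """mtDNA Ne on the same scale: haploid and transmitted by females only."""
    return n_females / 2.0


def mito_to_nuclear_ratio(n_females, n_males):
    """Ratio of mitochondrial to nuclear effective population size."""
    return mitochondrial_effective_size(n_females) / nuclear_effective_size(
        n_females, n_males
    )


if __name__ == "__main__":
    # Hypothetical breeding populations of 100 individuals each.
    for n_females, n_males in ((50, 50), (80, 20), (90, 10)):
        ratio = mito_to_nuclear_ratio(n_females, n_males)
        print(f"females={n_females}, males={n_males}: Ne(mt)/Ne(nuc) = {ratio:.3f}")
```

Under these assumptions the ratio equals 0.25 only for an equal sex ratio. It simplifies to (Nm+Nf)/(8·Nm), so the mitochondrial effective population size exceeds the nuclear one as soon as breeding females outnumber breeding males by more than seven to one.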
UCEs as a novel source of genetic data
The aforementioned concerns about discordance of single genealogies and particular caveats
surrounding mitochondrial sequences point out the importance of sampling a sufficient number of
unlinked genetic markers in order to accurately estimate the species tree. One novel approach is the
generation of Ultraconserved Element (UCE) sequences, which are distributed across the complete
nuclear genome and provide a massive multilocus dataset of unlinked nuclear loci.
UCEs are not classified by their function, but solely by the fact that they are highly conserved
across a wide range of animal taxa. In fact, for many of these sequences the function is unknown but
many of them are thought to be involved in essential processes during ontogenesis [22] and are
involved in gene regulation [23]. Many such highly conserved sequences have been identified,
distributed across the whole genome [22], [24], [25]. The highly conserved nature of these regions
makes them adequate candidates for standard multilocus sequence capture kits, which do not have
to be specifically designed for the taxon group of interest, but are broadly applicable. UCE probe
sets for sequence capture are available for different, broadly defined
organism groups (e.g. amniotes, fish), which contain several thousand unlinked, highly
conserved loci (http://ultraconserved.org, last accessed April 20, 2015). The target loci of such UCE
kits have been selected to match a certain profile; they have to consist of a highly conserved core-
region of 100-200 bp length flanked in direct proximity by more variable regions. These flanking
regions have to be located closely enough to the core region to be captured on the same fragment as
the conserved target sequence during the sequence capture process [26]. Several recent studies
have shown the effectiveness of UCE datasets for estimating population divergence times [27] and
phylogenetic tree inference on shallow [28], as well as deep evolutionary times [29].
SNPs
Single Nucleotide Polymorphisms (SNPs) are another type of genetic data that is becoming more
popular in phylogenetic studies. The increasing feasibility of sequencing vast numbers of unlinked
genetic loci across the complete genome (such as UCE data) provides a good basis for the
generation of genome-wide SNP data. These data are usually generated by extracting single
polymorphic sites from unlinked genetic loci (one site per locus), then adding these extracted sites
into one joint alignment. Most commonly, only biallelic polymorphisms are extracted, meaning
those sites that show exactly two different nucleotides among the sampled sequences. This leads to a dataset with
maximum informativeness, containing only sites that are variable between the targeted taxa. The
recent development of Bayesian methods for inference of species trees from biallelic character sets
such as SNPs [30] has made the use of SNP data more accessible and attractive for phylogenetic
studies. Additionally, SNP data can be used for a range of methods developed in the field of
population genetics in order to examine ancient admixture patterns and genetic introgression [31],
[32], to name only a few applications. In recent avian phylogenetic studies, SNPs have proven to
be a very useful and powerful tool to examine fine-scale population patterns within bird
communities [28], [33], [34].
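The general principle of SNP extraction described above (one biallelic site per unlinked locus, joined into a single alignment) can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in this study: the locus names and sequences are invented, columns containing gaps or ambiguity codes are skipped, and the first qualifying site of each locus is taken, all of which are assumptions of the sketch.

```python
VALID_BASES = set("ACGT")


def biallelic_columns(alignment):
    """Indices of columns with exactly two nucleotides and no gaps or ambiguities."""
    columns = []
    for index in range(len(alignment[0])):
        column = [sequence[index].upper() for sequence in alignment]
        if all(base in VALID_BASES for base in column) and len(set(column)) == 2:
            columns.append(index)
    return columns


def extract_snp_alignment(loci):
    """Build one joint SNP alignment with one biallelic site per locus.

    loci -- dict mapping a locus name to a list of aligned sequences; every
            locus lists the samples in the same order (assumption of this sketch).
    Returns one SNP string per sample.
    """
    n_samples = len(next(iter(loci.values())))
    snp_rows = [""] * n_samples
    for name in sorted(loci):
        alignment = loci[name]
        sites = biallelic_columns(alignment)
        if not sites:
            continue  # invariant locus, or no clean biallelic site
        site = sites[0]  # assumption: take the first qualifying site
        for row, sequence in enumerate(alignment):
            snp_rows[row] += sequence[site].upper()
    return snp_rows


if __name__ == "__main__":
    # Hypothetical toy loci for three samples.
    toy_loci = {
        "locus-1": ["ACGTA", "ACGTA", "ACTTA"],  # column 2 is biallelic (G/T)
        "locus-2": ["GGGG", "GGGG", "GGGG"],     # invariant, skipped
        "locus-3": ["TAC-", "TCCA", "TGCA"],     # triallelic or gapped, skipped
        "locus-4": ["CCAT", "CCAT", "TCAT"],     # column 0 is biallelic (C/T)
    }
    for row in extract_snp_alignment(toy_loci):
        print(row)
```

Taking only one site per locus keeps the extracted sites unlinked, which is the property that species tree methods for biallelic characters rely on. A random rather than the first qualifying site could equally be chosen.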
Topaza
The genus Topaza contains some of the most spectacular and largest hummingbirds worldwide,
measuring up to 23 cm (for adult males, including tail feathers) and weighing up to 12 g [35], [36].
Characteristic of males within this genus are the two long, purplish black tail feathers, which can
reach lengths of up to 12 cm. Adult males and females of this genus are strongly dimorphic. The
plumage of adult male Topaza hummingbirds shows a characteristic metallic shine, particularly on
the yellow-green throat (gorget) and the green undertail coverts. Chest and breast feathers are
colored reddish-grey, whereas the whole bird seen from a distance appears red, gradually
decreasing in intensity from the head toward the tail and turning yellow-green at the rump and tail
coverts. The adult female birds appear in a more uniform green, with a metallically shining, orange
throat. These birds are usually found in the canopy along forest edges and clearings, and are often
seen close to river banks [37].
The number of species distinguished within this genus has been the subject of taxonomic
discussion; based on uncertain morphological evidence, some scientists refer to Topaza as a
monotypic genus [37], [38], whereas others distinguish two separate species [39], [40]. The current
consensus among ornithologists commonly distinguishes two separate species within this genus - the
Crimson Topaz (Topaza pella) [36] and the Fiery Topaz (Topaza pyra) [35] - which both occur in
northern South America (Figure 2). The two species do not occur sympatrically except for a very small
range along the Rio Negro. There are multiple conflicting hypotheses concerning subspecies
assignments within these species, which have been discussed in the scientific literature over the last
decades, based on morphological characters [38], [40]. To date, no genetic study known to the
authors has explored the genetic structure within Topaza species or addressed these
subspecies assignments at the molecular level.
Aims of this study
Here we use a variety of multilocus datasets to explore the genetic substructure within the
currently recognized Topaza species (T. pella and T. pyra). We discuss the genetic validity of previous
subspecies assignments and the effect of the Amazon River as a possible dispersal barrier for Topaza
hummingbirds. Further, we investigate an apparent discrepancy of the mitochondrial tree with the
species tree, and explore how this discrepancy influences the inference of species trees when
mitochondrial sequences are added to a multilocus nuclear dataset. Finally, we aim to evaluate the
utility of different datasets for the inference of genetic structure below the species level. We are
specifically addressing the following questions:
1. What are the phylogenetic relationships among the sampled individuals,
and which samples belong to the same population?
2. Do we see a genetic separation of populations concordant with the course of the Amazon
River?
3. Do these inferred populations match previously described morphological subspecies?
4. How does the mitochondrial tree match the inferred species tree?
5. How does the addition of the complete mitochondrial genome to a multilocus nuclear
dataset influence species tree inference?
6. How suitable are the applied datasets and methods to explore genetic substructure below
the species level?
In order to address these questions, we generate novel data from thousands of UCE loci and
further sequence a set of previously characterized nuclear gene loci [41]–[44], using a sequence
capture approach and Illumina® Next-Generation sequencing. In order to further improve the
informativeness of this dataset for examining shallow evolutionary timescales, we extract SNP data
from the large number of UCE-loci. Additionally, the sequence capture approach provides sufficient
coverage of mitochondrial sequences which we use in this study to assemble the entire
mitochondrial genome for each sample. We use a variety of phylogenetic methods [30], [45]–[48] in
order to explore the extensive genetic information lying within these mitochondrial and nuclear
sequence data. Each dataset is used separately for species tree inference, and consistent
phylogenetic patterns are evaluated and further explored with Bayesian species delimitation
methods [48]. Furthermore, we use population genetic methods [31] to explore the admixture
patterns among the sampled individuals and relate these results to the inferred phylogenetic
relationships. We find a well-supported discrepancy of the mitochondrial tree with the species tree,
which is consistently present in all inferred species trees. We test how this discrepancy influences
species delimitation and species tree inference when the mitochondrial sequence is added to a
multilocus nuclear dataset.
Methods
Taxon sampling
The individuals for this study were sampled with the goal of covering the maximum extent of the
Topaza distribution. Additionally, in order to test for a possible dispersal barrier effect of the Amazon
River, we sampled individuals from both sides (north and south) of the river. This resulted in a total
of 4 samples for T. pyra (2 north, 2 south) and 5 samples for T. pella (2 north, 3 south, see Figure 2).
The distribution range of Topaza species was modeled with the R package speciesgeocodeR [49],
based on occurrence data available in the eBird database [50]. Further, we included one sample of
the phylogenetically closest sister genus Florisuga (F. fusca) as an outgroup taxon. All samples were
obtained from museum specimens, with skins being available for further reference (Table 1).
Table 1: Voucher information for sampled taxa. The column ‘Taxon’ refers to the current species assignments.
ID  Taxon            Voucher code  Institution
1   T. pyra          INPAA1106     Instituto Nacional de Pesquisas da Amazônia
2   T. pyra          MPEG62475     Museum Paraense Emílio Goeldi
3   T. pyra          MPEG62474     Museum Paraense Emílio Goeldi
4   T. pyra          MPEG52721     Museum Paraense Emílio Goeldi
5   T. pella         USNM586322    National Museum of Natural History, Smithsonian Institution, Washington DC, USA
6   T. pella         INPAA3319     Instituto Nacional de Pesquisas da Amazônia
7   T. pella         MPEG61688     Museum Paraense Emílio Goeldi
8   T. pella         MPEG65603     Museum Paraense Emílio Goeldi
9   T. pella         INPAA6233     Instituto Nacional de Pesquisas da Amazônia
10  Florisuga fusca  MPEG70697     Museum Paraense Emílio Goeldi
Figure 2: Distribution map of T. pyra (green) and T. pella (red) and collection location (numbered symbols) of the
Topaza samples. Distribution ranges of Topaza species were modeled with the R-package speciesgeocodeR, based on
available eBird occurrence data. All occurrence points used for the distribution range modeling are plotted as
transparent symbols. Shown in light blue are the courses of the main rivers in the Amazon basin, namely the Amazon
River (horizontal axis), joined by the Rio Negro from the north and the Rio Madeira from the south.
Next-Generation Sequencing
DNA extraction and library preparation
DNA of all samples was extracted from muscle tissue using the Qiagen DNeasy Blood and Tissue
Kit according to the manufacturer’s instructions (Qiagen GmbH, Hilden, Germany). Before library
preparation, all samples were sonicated with a Covaris S220 sonication device in order to break the
genomic DNA into shorter fragments. The settings were chosen to break the DNA into fragments of
approx. 800 bp. This fragment length is the maximum recommended for sequence capture and was
chosen in order to capture as much of the variable flanking regions of the UCE loci as possible.
Paired-end, size-selected DNA libraries were prepared for sequencing on an Illumina® platform using the
magnetic bead-based NEXTflex™ Rapid DNA-Seq Kit (Catalog #: 5144-02, Bioo Scientific Corporation,
Austin, TX, USA) following the enclosed manual (version 14.02), which comprises the following steps:
End-Repair and Adenylation: In this step, sticky ends are removed from the double-stranded DNA
fragments and an adenine is added to the end of each strand, which is necessary for the adapter
ligation in the next step.
Adapter Ligation: We used barcode adapters from the NEXTflex™ DNA Barcodes 48 kit (Catalog #:
514104), which were ligated to the double-stranded DNA fragments.
Magnetic Bead Size Selection: We selected fragments of 650-800 bp length using magnetic beads
(Agencourt AMPure XP), including several washing steps to purify the DNA in the final sample
solution.
PCR Amplification & Purification: The final size-selected, cleaned DNA samples were amplified by
PCR (15 cycles) using the NEXTflex primer set provided in the NEXTflex™ DNA Barcode kit. The PCR
product was purified with the QIAquick® PCR Purification kit (QIAGEN group), following the
manufacturer's manual, but using only 30 µL of elution buffer (instead of the recommended 50 µL) for
the final elution, in order to retrieve a higher concentration of the final DNA library.
Probe design
Ultraconserved elements:
The sequence capture probe library consisted of a set of 2,560 probes targeting 2,386
Ultraconserved Elements (Tetrapods-UCE-2.5Kv1), as first described by Faircloth et al. [26]. We used
probes of 120 bp length; sequences were downloaded from http://ultraconserved.org (last accessed
April 20, 2015). The majority of UCE loci are targeted with only one single probe per locus. Given the
size-selected, approximately 800 bp long fragments and the probe sequences of 120 bp length, one
can expect to receive up to 680 bp of flanking region on each side of the target sequence. The UCE
probe set used for this project is designed for tetrapods, and can therefore be applied to a broad
range of animals, including amphibians, reptiles, birds and mammals. The selected loci are
distributed across the complete genome and are genetically unlinked.
Nuclear genetic markers:
We further designed probes for capturing 10 nuclear genetic markers, commonly used in bird
phylogenetics, namely the genes coding for:
1. Beta fibrinogen, exons 7+8 (Bfib)
2. Eukaryotic translation elongation factor 2, exons 5-9 (EEF2)
3. Early growth response 1, exon 2 (EGR1)
4. Fibrinogen beta chain, exon 5 (FGB)
5. Myoglobin, exon 2 (MB)
6. Ornithine decarboxylase, exons 6-8 (ODC)
7. Recombination activating protein 1 (RAG1)
8. Transforming growth factor beta 2, exons 5+6 (TGFB2)
9. Zinc finger protein, exon 2 (ZENK2)
10. Zinc finger protein, 3' UTR (ZENK3)
For creating target-specific probes (length 120 bp) covering these loci, we used a 30 bp tiling
design (new probe starting every 30 bp of the target sequence), resulting in 4-fold probe coverage of
each locus. Probes were designed based on available reference sequences for these loci of closely
related taxa, obtained from NCBI GenBank (see Table 2 for accession numbers and locus information).
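The relation between probe length, tiling step and probe coverage stated above (120 bp probes starting every 30 bp, giving 120/30 = 4-fold coverage) can be checked with a short Python sketch. The function names and the target length of 600 bp are hypothetical, and the sketch assumes that only probes lying entirely within the target are placed. How the ends of the targets were handled in the actual probe design is not specified here.

```python
def probe_starts(target_length, probe_length=120, step=30):
    """0-based start positions of tiled probes that fit entirely in the target."""
    if target_length < probe_length:
        return []
    return list(range(0, target_length - probe_length + 1, step))


def coverage_per_base(target_length, probe_length=120, step=30):
    """Number of probes overlapping each base of the target."""
    depth = [0] * target_length
    for start in probe_starts(target_length, probe_length, step):
        for position in range(start, start + probe_length):
            depth[position] += 1
    return depth


if __name__ == "__main__":
    target_length = 600  # hypothetical target length in bp
    starts = probe_starts(target_length)
    depth = coverage_per_base(target_length)
    print(f"number of probes: {len(starts)}")
    print(f"coverage at first base: {depth[0]}")
    print(f"coverage in the interior (position 300): {depth[300]}")
    print(f"maximum coverage: {max(depth)}")
```

Under these assumptions, interior bases are covered by four probes, whereas bases near either end of the target are covered by fewer, since fewer probes can overlap them.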
Mitochondrion:
Due to the high copy number of mitochondrial genomes in muscle cells in particular, a very large
number of fragments covering the mitochondrial genome is found in the final target selected
sequence mix, even if no probes targeting the mitochondrial genome are used for sequence
capture. In this study we had no probes targeting mitochondrial sequences, yet we were able to
assemble the complete mitochondrial genome for all samples (see Table 2 for information about
mitochondrial coverage).
Sequence enrichment and sequencing
The sequence enrichment was performed using a sequence capture MYbaits kit according to the
enclosed user manual (V. 1.3.8). The target-specific probes were mixed with the hybridization buffers
and the DNA library and incubated for hybridization for 38 hours at 65°C. During this hybridization
period, the biotinylated baits bind to their specific target regions. In the next step, magnetic
streptavidin beads are applied, which have a high affinity for biotin. The biotinylated probes that
have hybridized to the target DNA regions bind to the streptavidin beads. These beads are then
fixed in place with a magnet stand, and the supernatant, containing the non-target DNA, is discarded. After
several washing steps, the target DNA was eluted from the beads and transferred into a fresh
tube. We then desalted the sequence capture product with a QIAquick® PCR purification column
(QIAGEN group) following the manufacturer's manual, but using only 20 µL of elution buffer for the final
elution in order to retrieve a more concentrated DNA solution. After this we ran another PCR (14
cycles) to amplify the final product for all samples. After this final PCR, all samples were pooled into
an equimolar mix with a total DNA content of 689 ng of double-stranded, barcoded DNA and an
average fragment length of 632 bp.
The final product was sent for sequencing to the Sahlgrenska Genomics Core Facility in
Gothenburg, Sweden. Sequencing was performed with a one-lane, 250 bp paired-end Illumina®
MiSeq run (Illumina Inc., San Diego, CA, USA).
Data processing
UCE assembly:
We used the PHYLUCE software package (https://github.com/faircloth-lab/phyluce, last accessed
April 20, 2015) for reviewing and assembling the sequenced UCE-loci. All programs and scripts
mentioned in the following are integrated in the PHYLUCE package. A more detailed documentation of
the complete workflow described here can be found at
https://github.com/tobiashofmann88/UCE-data-management/wiki.
We used “illumiprocessor” to trim adapter contamination from all reads and to remove reads with low
quality scores or ambiguous bases. The trimmed reads were then assembled into contigs using
“assemblo_abyss.py”. Contigs are clusters of reads that cover the same region (see Figure 3).
The consensus sequences of all assembled contigs are written to one fasta-file, resulting in a file
with >100,000 separate contig consensus sequences (in the following simply referred to as contigs),
each carrying an individual ID. All of these contigs were mapped against the UCE
sequences from the probe order file with “match_contigs_to_probes.py” in order to find those
sequences which represent UCE-loci that were selected and amplified during the sequence capture
process. This program writes the results of the mapping process into an SQL database; more
specifically, it records which UCE loci could be found in which sample, together with
the corresponding contig IDs. Given this information, we extracted all those UCE-loci from the
contig-fasta-file that were present in all sampled taxa, using “get_fastas_from_match_counts.py”. The
extracted sequences were aligned among all samples for each locus using MAFFT as implemented in
the PHYLUCE software package (“seqcap_align_2.py”).
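The selection of loci that were recovered in every sample amounts to a set intersection, which can be sketched as follows. This is only an illustration of the idea: the dictionary layout, sample names and locus identifiers are hypothetical and do not reflect the database schema or the scripts of PHYLUCE.

```python
def loci_in_all_samples(matches):
    """Return the sorted loci found in every sample.

    `matches` maps a sample name to the set of UCE loci recovered for it.
    """
    locus_sets = list(matches.values())
    if not locus_sets:
        return []
    shared = set.intersection(*locus_sets)
    return sorted(shared)


# Toy input (hypothetical sample names and locus IDs).
matches = {
    "sample1": {"uce-10", "uce-25", "uce-31"},
    "sample2": {"uce-10", "uce-31"},
    "sample3": {"uce-10", "uce-25", "uce-31", "uce-77"},
}
complete_loci = loci_in_all_samples(matches)
print(complete_loci)  # ['uce-10', 'uce-31']
```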
Figure 3: Assembly of reads into contigs. Reads can be assembled into contigs either by mapping them against a
reference sequence (gene of interest), as in this example, or by assembling them relative to each other without the
use of a reference sequence. Algorithms of the latter kind find overlapping regions of single reads and use these
matching reads to build a growing consensus sequence until read coverage falls to a minimum threshold at
either end of the contig. Contigs can consist of only a handful of reads or can span large genomic
regions (e.g. the complete mitochondrial genome), comprising hundreds of thousands of reads. The vertical extent of a contig is referred
to as read-depth, which is a measure of how reliably a given region is covered.
Mapping and phasing of nuclear genes
Sequences of nuclear genes were assembled using the CLC Workbench software
(CLC-AssemblyCell, version 4.3.0, CLC Bio-Qiagen, Aarhus, Denmark). The adapter- and quality-trimmed
reads from the illumiprocessor processing (see ‘UCE assembly’) were mapped against the reference
sequence for each gene (same sequences used for probe design, Table 2), using “clc_mapper”. After
converting the resulting cas-assembly-files into bam-format (with the program “clc_cas_to_sam”),
we used samtools, version 0.1.19 [51] to sort the bam-files (“sort”) and create bam-index-files
(“index”) in order to view the assemblies in Tablet [52]. Assemblies were inspected by eye for
contamination with low quality reads and duplicate reads. The CLC-AssemblyCell software package
contains options for quality trimming (“clc_quality_trim”) and removal of duplicates
(“clc_remove_duplicates”), which can be applied to improve assemblies that show such
contamination.
The final bam-assemblies were phased with samtools (“phase”) in order to sort the reads from
the assembly into two separate alleles, if present. The consensus sequence of the resulting phased
assemblies was created with a combination of the samtools “mpileup” command, bcftools and
vcfutils.pl, as suggested in the samtools manual (http://samtools.sourceforge.net/samtools.shtml
last accessed May 17, 2015). The final consensus sequences were checked for the absence of
ambiguous sites and were further controlled for correct phasing by examining the equivalent bam-
assembly-files to each sequence. The mentioned commands are part of the samtools software
package, which is freely available at https://github.com/samtools (last accessed May 17, 2015). An
automated workflow of the above-described steps of assembling and phasing gene loci with
illumiprocessor-trimmed reads is available at
https://github.com/tobiashofmann88/Processing-Illumina-reads.
Alignments for each locus were created using the MAFFT multiple alignment builder plugin
(version 1.3) in Geneious, version 6.1.8 [53], using the default settings. If two alleles were present at
a given locus, both were included in the alignment. Final alignments were checked by eye for
alignment errors and were exported in NEXUS format.
Mitochondrial genome assembly
For the assembly of the mitochondrial genome we used the same trimmed-read files as in the
previously described assemblies. First we ran “clc_assembler” (part of CLC-AssemblyCell) in order to
assemble the reads into contigs (see Figure 3). The program writes the consensus sequence of each
assembled contig into one joint fasta-file. We then mapped all reads from the
trimmed-read files against these assembled contig consensus sequences (“clc_mapper”) in order to obtain
information about the read coverage of each contig (“clc_sequence_info”). In the next step, we
created a blast-database from the contig-fasta-files, using the command “makeblastdb” from the
blast+ software package [54]. We downloaded the taxonomically closest available mitochondrial
genome sequence (Trochilidae: Amazilia versicolor, see Table 2) from NCBI and blasted this sequence
against the previously created contig database. Blasting was done using the command “tblastx”,
which translates the nucleotide sequences into amino acid sequences before matching them to the
database, which makes the blast search less conservative and results in more matches. All hits from
the contig-blast-database were written to an xml-file, which was reviewed using ngKlast, version
4.5 [55]. The longest match was inspected to check how much of the reference
mitochondrial genome it covered. In all cases, the longest matching contig covered the complete
mitochondrial genome and was therefore extracted from the contig-fasta-file. We provide an
automated workflow of the above-described steps for assembling the mitochondrial genome at
https://github.com/tobiashofmann88/assembling-complete-mt-genome.
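The final selection step, picking the longest matching contig and checking how much of the reference it spans, can be sketched as below. This is a simplified illustration and not the linked workflow: real tblastx output reports many separate high-scoring segments per contig, whereas here each contig is reduced to a single hypothetical hit record. The contig IDs, lengths and coordinates are invented; only the reference length (16,861 bp) is taken from Table 2.

```python
REFERENCE_LENGTH_BP = 16_861  # A. versicolor reference length listed in Table 2


def longest_hit(hits):
    """Return the hit whose contig is longest."""
    return max(hits, key=lambda hit: hit["contig_length"])


def reference_fraction_covered(hit, reference_length=REFERENCE_LENGTH_BP):
    """Fraction of the reference spanned by the hit (1-based, inclusive coordinates)."""
    span = hit["query_end"] - hit["query_start"] + 1
    return span / reference_length


# Hypothetical hit records; all values below are invented for illustration.
hits = [
    {"contig": "contig_17", "contig_length": 1_204, "query_start": 3_001, "query_end": 4_180},
    {"contig": "contig_3", "contig_length": 16_790, "query_start": 1, "query_end": 16_861},
    {"contig": "contig_88", "contig_length": 652, "query_start": 9_500, "query_end": 10_120},
]
best = longest_hit(hits)
print(best["contig"], reference_fraction_covered(best))  # contig_3 1.0
```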
The extracted longest contigs, representing the mitochondrial genome, were aligned with the
reference mitochondrion of A. versicolor for all samples, using the MAFFT online alignment software
(version 7, http://mafft.cbrc.jp/alignment/server/ last accessed May 17, 2015). All sequences were
oriented in the same direction and edited to start at the same position (according to the reference
sequence). The separated sequences were then annotated using DOGMA [56]. DOGMA blasts
(“tblastx”) the input nucleotide sequence in all six reading frames (both reading directions and each
of the three possible codon positions) against an amino acid sequence database of each
mitochondrial sequence element (mRNA, rRNA and tRNA coding sequences). The database is located
on the DOGMA server and contains a multitude of mitochondrial sequences across all animal groups.
As a result, the user receives a list of the identified coding regions and the respective names and
positions (in bp) of these regions on the input sequence. We plotted the resulting annotations with
GenomeVx [57] to create a circular map of the mitochondrial genome (e.g. Figure 4).
Mitochondrial genome sequences of all taxa were realigned, and annotations were checked and,
if necessary, synchronized across the alignment, using the BioEdit alignment editor [58]. The
sequence alignment for each annotated coding element was extracted separately in fasta-format
using Geneious, version 6.1.8 [53]. The amino acid sequences for the extracted sequence alignments
of the 13 mRNA coding genes were examined in BioEdit for alignment errors leading to reading frame
shifts.
SNP-datasets
Each UCE-locus-alignment that could be assembled for all taxa was scanned for sites that were
biallelic within the Topaza samples and did not contain missing data. Among these
polymorphic sites, one single nucleotide polymorphism (SNP) was randomly chosen per locus and
coded in binary format (0 or 1) in a joint alignment file. This resulted in a set of 570 SNPs.
Additional SNP datasets were extracted, specifically aiming for variation within the currently
recognized species, containing only sites that were found to be biallelic within T. pella (621
SNPs) and T. pyra (524 SNPs), respectively. All the above steps were performed using a customized
script, provided by Yann Bertrand (Department of Biological and Environmental Sciences,
Gothenburg University, Sweden).
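The customized script itself is not reproduced here. The following is a minimal sketch of the described logic under stated assumptions: an alignment is represented as a list of equal-length strings, missing data are checked only in the ingroup rows, and the allele carried by the first ingroup sequence is arbitrarily coded as 0. All function names and the toy alignment are hypothetical.

```python
import random

VALID_BASES = set("ACGT")


def biallelic_sites(alignment, ingroup):
    """Return column indices that are biallelic within the ingroup rows
    and free of gaps, N or ambiguity codes in those rows."""
    sites = []
    for col in range(len(alignment[0])):
        bases = [alignment[row][col].upper() for row in ingroup]
        if any(base not in VALID_BASES for base in bases):
            continue  # missing or ambiguous data at this site
        if len(set(bases)) == 2:
            sites.append(col)
    return sites


def pick_binary_snp(alignment, ingroup, rng):
    """Pick one biallelic site at random and code it as 0/1.

    Returns None if the locus has no usable site.
    """
    sites = biallelic_sites(alignment, ingroup)
    if not sites:
        return None
    col = rng.choice(sites)
    reference_allele = alignment[ingroup[0]][col].upper()
    return [0 if alignment[row][col].upper() == reference_allele else 1
            for row in ingroup]


# Toy locus: rows 0-2 are ingroup samples, row 3 is an outgroup.
locus = ["ACGTAC", "ACGTAT", "ACTTAT", "AC-TAC"]
ingroup = [0, 1, 2]
rng = random.Random(1)
candidate_sites = biallelic_sites(locus, ingroup)
snp = pick_binary_snp(locus, ingroup, rng)
print(candidate_sites, snp)
```

Drawing a single site per locus keeps the resulting markers on different loci, which matches the assumption of unlinked sites made by the downstream analyses.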
Mitochondrial tree
Former studies have found considerable differences in substitution rates between the different
regions of the mitochondrial genome [18], [59], [60]. In order to apply the most suitable
parameters in terms of both substitution rate and substitution model, we divided the data into 15
partitions: a separate partition for each protein-coding locus (13), one partition (1)
including all 22 concatenated tRNA-coding sequences, and another partition (1) containing both rRNA
coding sequences (12S and 16S ribosomal subunit). Substitution models and clock models were
unlinked for all 15 partitions. The most suitable substitution model for each partition according to
the Bayesian Information Criterion (BIC) was determined with jModelTest [61]. We excluded the
control region (misc_feature) of the mitochondrial genome, which is located between the coding
regions for ND6 and the 12S ribosomal subunit (see Figure 4), since this region contains highly
variable stretches that could not be aligned reliably. Since the mitochondrion
is inherited as a single unit, all partitions are expected to follow the same gene tree, given that no
recombination has taken place. We therefore conducted a recombination test using RDP, version
3.44 [62] on the alignment containing the complete mitochondrial genome sequences. The three
methods RDP [63], MaxChi [64] and Chimaera [65] were applied with a p-value threshold of 0.1 in order
to screen the alignment for possible recombinant elements. We found no indications for
recombination events across the alignment, and therefore linked the trees of all partitions.
In order to retrieve a dated phylogeny of Topaza, we used substitution rate priors for the
mitochondrial genes ND2 and ND4 and for the tRNA partition, estimated for honeycreepers by Lerner et al.
[66]. These rate priors were defined as normal distributions scaled in substitutions/site/Ma, with equal
rates for ND2 and ND4 (mean = 0.0219, SD = 0.0015) and a slower rate for the tRNA partition (mean
= 0.005, SD = 0.00207). We further implemented clade-age priors for the split between Topaza and
its sister genus Florisuga (mean = 18.84, SD = 1.6 Ma) and the split between T. pyra and T. pella
(mean = 3.01, SD = 0.4 Ma), which were estimated by McGuire et al. [67] as part of the species tree of
the complete hummingbird family (Trochilidae) based on the above-mentioned substitution rate
priors and on island-age, as well as fossil calibrations for outgroups of the hummingbird family. We
tested different combinations of the aforementioned dating priors in order to check how these priors
influence each other.
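To get a feeling for how tightly these normal priors constrain the rates and node ages, their central 95 % intervals can be computed directly. The sketch below uses only the means and standard deviations given above; the dictionary labels and the function name are hypothetical, and the calculation is a plain property of the normal distribution rather than anything specific to BEAST.

```python
from statistics import NormalDist

# (mean, SD) of the normal priors stated in the text.
priors = {
    "ND2/ND4 rate (substitutions/site/Ma)": (0.0219, 0.0015),
    "tRNA rate (substitutions/site/Ma)": (0.005, 0.00207),
    "Topaza-Florisuga split (Ma)": (18.84, 1.6),
    "T. pyra-T. pella split (Ma)": (3.01, 0.4),
}


def central_interval(mean, sd, mass=0.95):
    """Return the central interval of a normal distribution holding `mass`."""
    dist = NormalDist(mean, sd)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


for label, (mean, sd) in priors.items():
    lower, upper = central_interval(mean, sd)
    print(f"{label}: {lower:.5g} to {upper:.5g}")
```

The tRNA prior is comparatively wide relative to its mean, with a lower bound close to zero, whereas the ND2/ND4 prior is narrow.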
We used BEAUti, version 1.8.0 [45] to set up an xml-file with the above-described priors
concerning partitioning and dating. We assigned a log-normal relaxed clock to each partition and
chose a Yule process speciation tree prior [68]. The MCMC chain was set for 100 million generations,
with trees and logging information written every 10,000 generations, using BEAST, version 1.8 [45].
After initial issues with convergence of the MCMC chain (see Results), we set the base frequencies
for all partitions to ‘empirical’ and restricted the uncorrelated lognormal relaxed clock (ucld) mean
values from the very broad default to a more realistic range (mRNA coding loci: uniform, initial=0.02,
upper=0.2, lower=0.002; tRNAs and rRNAs: initial=0.005, upper=0.05, lower=0.0005). After checking
MCMC runs for proper convergence with Tracer, version 1.6 [69], we summarized the posterior tree
distribution into the maximum clade credibility tree using TreeAnnotator, version 1.8 [45], discarding
the first 1,000 trees (10%) as burn-in.
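The relationship between chain length, logging interval and burn-in can be checked with a few lines of arithmetic. This is a sketch only; the function name is hypothetical, and the initial state of the chain (generation 0) is ignored for simplicity.

```python
def posterior_sample_counts(chain_length, log_every, burnin_percent):
    """Return (logged, discarded, retained) sample counts for an MCMC run.

    The initial state (generation 0) is not counted.
    """
    logged = chain_length // log_every
    discarded = logged * burnin_percent // 100
    return logged, discarded, logged - discarded


# 100 million generations, logged every 10,000 generations, 10 % burn-in.
print(posterior_sample_counts(100_000_000, 10_000, 10))  # (10000, 1000, 9000)
```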
Species tree
Nuclear dataset
We estimated the species tree by analyzing the 10 nuclear gene loci in *BEAST [46]. Substitution
models, clock models and trees were unlinked for all loci. In order to avoid over-parameterization of
the xml-file, we kept each gene sequence as one partition, without sub-partitioning it by codon
position. Separate alleles and homozygous sequences within the alignments belonging to the same
sample were given the same trait value, thereby assigning each individual a separate taxon.
Substitution models for each gene were determined with jModeltest according to the Bayesian
Information Criterion (BIC). Initial issues with the convergence of the MCMC led to the exclusion of
EGR1 and ZENK2 from further analyses (see Results for reasoning). We applied the same clade-age
priors as for the mitochondrial gene tree (see above) and set the substitution rates of Bfib and ODC
according to Lerner et al. [66], which were defined as normal distributions scaled in
substitutions/site/Ma. The substitution rate for Bfib was set to mean = 0.0019, SD = 0.0003 and that for
ODC to mean = 0.0015, SD = 0.000237. A lognormal relaxed clock was applied to each locus and a Birth-Death
prior [68] to the species tree. Base frequencies were set to ‘empirical’ and the ucld mean was set to
a more restrictive, yet realistic range for each locus (initial=0.002, upper=0.004, lower=0.00002). The
MCMC was set for 100 million generations, and states and trees were logged every 10,000
generations. After checking the MCMC for convergence, the maximum clade credibility tree was
inferred from 9,000 trees of the posterior tree distribution (burn-in 1,000).
Mixed dataset
Another xml file was set up containing the eight nuclear loci with the exact settings as above,
combined with all mitochondrial loci (mixed dataset). Mitochondrial sequences were loaded into
BEAUti and the same settings were applied as described for the tree inference in BEAST for the
mitochondrion (15 partitions, unlinked substitution models and clock models, linked trees). The
ploidy type was set to ‘mitochondrial’ and the specific substitution rates for ND2, ND4 and the tRNAs
were applied in addition to the nuclear substitution rates for Bfib and ODC and the above-described
clade priors. The MCMC was run with the same settings as in the previous runs and analyzed in the
same manner.
DISSECT
In order to run species delimitation analyses in DISSECT [48], the *BEAST xml-files from the two
analyses described above (nuclear dataset and mixed dataset) were translated into DISSECT xml files.
To this end, the Birth-Death species tree prior was replaced with the Birth-Death-Collapse model, as
described in the DISSECT user manual (last updated February 17, 2014). Parameter values for ε
(collapsing height) and w (collapsing weight) were left at default. All other settings were left identical
to the previous *BEAST runs. The xml-file was executed using the DISSECT-modified BEAST 1.8.0
version (“beast-dissect.sh”). The resulting log-file was checked for convergence, and the maximum
clade credibility tree was calculated from 9,000 trees of the posterior distribution, discarding the first
1,000 trees as burn-in.
We used “SpeciesDelimitationAnalyser”, which is a DISSECT tool that collapses nodes of small
height and exports a data table, listing the clusters that were found. After examining the log-files and
checking for convergence and effective sample size (ESS) values greater than 200, the burn-in was set
to 10 %. The values for collapse height and the similarity threshold for joining two clusters (simcutoff)
were left at default. Similarity matrices were visualized using the R code provided in the DISSECT user
manual.
After examination of the DISSECT results, we grouped the samples according to the clusters found,
which possibly represent distinct populations with only limited gene flow between them.
We set up another xml file for the nuclear as well as the mixed dataset, assigning a joint trait value
to all samples sharing the same cluster, in order to infer the species tree with *BEAST and examine
the effect of adding the mitochondrial genome on the species tree inference. The trait value
assignments for the samples were as follows: samples 1-4 to “T. pyra”, samples 5 and 6 to “T. pella
north”, sample 7 to “T. pella intermediate”, samples 8 and 9 to “T. pella south”. Other settings and
priors were identical to those described before for the nuclear and mixed datasets.
UCEs
Separate gene trees were created for each UCE-alignment with PhyML [70], using the parallelized
implementation in CloudForest (https://github.com/ngcrawford/CloudForest, last accessed May 19,
2015). The resulting unrooted trees were written in Newick-format into one cumulative tree file. In
order to obtain a measure of node support in the final species tree, we generated 1,000
non-parametric bootstrap replicates of the UCE dataset by resampling nucleotides within the
UCE-alignments, as well as resampling UCE-loci within the data set [71], using CloudForest. All trees were
rooted using the “RerootTree” function on the STRAW server [72] by setting sample 10 (Florisuga
fusca) as outgroup. We inferred the species tree with MP-EST [47], which estimates the most likely
species tree given a set of gene tree topologies. For the bootstrap dataset, we ran MP-EST separately
for each bootstrap replicate tree-set. The resulting set of 1,000 bootstrap species trees was
summarized into one maximum clade credibility tree with TreeAnnotator, version 1.8 [45]. The
resulting node values represent bootstrap support of the respective clade.
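The two-stage resampling (loci first, then sites within each drawn locus) can be sketched as follows. This illustrates the general idea only and is not CloudForest's implementation; the function name, the data layout (each alignment as a list of equal-length strings) and the toy loci are hypothetical.

```python
import random


def bootstrap_replicate(loci, rng):
    """Build one two-stage bootstrap replicate.

    Loci are resampled with replacement first; within each drawn locus,
    alignment columns are then resampled with replacement.
    """
    replicate = []
    for _ in range(len(loci)):
        alignment = rng.choice(loci)  # stage 1: resample loci
        n_sites = len(alignment[0])
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]  # stage 2: resample sites
        replicate.append(["".join(seq[c] for c in cols) for seq in alignment])
    return replicate


# Toy dataset: two loci with two sequences each (invented sequences).
loci = [["ACGT", "ACGA"], ["TTGCA", "TTGCC"]]
rng = random.Random(0)
replicate = bootstrap_replicate(loci, rng)
print(replicate)
```

Each replicate has the same number of loci as the original dataset, and every resampled alignment keeps its original number of sequences and sites, so gene trees can be estimated from it in the same way as from the original data.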
Since many of the UCE loci showed little to no variation among the Topaza samples, we extracted
a subset containing only the most informative loci. Only those loci were selected which contained
more than 20 polymorphic sites across the alignment. We created 1000 bootstrap replicates of this
reduced dataset in the same manner as before for the complete dataset, and analyzed the rooted
gene trees in MP-EST. Two separate MP-EST analyses were conducted, one with every sample being
assigned a separate label in the species tree, and another one with the cluster assignments resulting
from the DISSECT analysis.
SNPs
The binary SNP alignment, consisting of 570 unlinked polymorphic sites, was formatted for
analysis in SNAPP [30]. SNAPP is an MCMC-based species-tree and species-demographics inference
program that uses unlinked biallelic markers (such as SNPs) as input. We used BEAUti 2, version 2.2.1
[73] to set up the xml file for species tree inference. BEAUti 2 contains the option to download
additional packages in order to set up a customized xml file for different implementations in BEAST 2
[73]. Coalescent rate and mutation rates (forward mutation rate “U” (0 to 1) and backward mutation
rate “V” (1 to 0)) were set to be estimated by SNAPP based on the input data. The Yule species-tree
prior parameter λ, which sets the rate at which species diverge, was left at default (0.00765). The
MCMC was set to 10,000,000 generations and trees and other parameters were logged every 1,000
generations. Two separate SNAPP analyses were launched, one in which each sample was assigned
its own clade, and another one with the clade assignments resulting from the DISSECT species
delimitation analysis.
Population structure
In order to explore the genetic structuring within the species boundaries of the two currently
recognized Topaza species (T. pyra and T. pella), we conducted population structure analyses based
on the SNP datasets that were extracted separately for each species (621 SNPs for T. pella and 524
SNPs for T. pyra). We used the program STRUCTURE, version 2.3.4 which was first described by
Pritchard et al. [31]. STRUCTURE is based on a Bayesian MCMC algorithm which explores genetic
clusters (populations) within a given dataset and assigns individuals to these inferred populations.
The number of clusters (k) to be explored is set by the user, and STRUCTURE assigns individuals in
random combinations to these clusters in order to find the best fit of the variation pattern. We
explored k values from 1 to 3 within both Topaza species. The ploidy level of the data was set to 1,
since we were using an effectively haploid SNP dataset, which was extracted from the consensus
sequences of assembled contigs, not containing biallelic information within a sample. Lambda (λ), a
quantitative measure of independence between markers, was chosen to be inferred by STRUCTURE
based on the data. There are two separate ancestry models available in STRUCTURE, the ‘no
admixture’ and the ‘admixture’ model. ‘No admixture’ would imply the assumption that the
ancestors of the inferred populations themselves belonged to completely discrete populations. We
therefore chose the ‘admixture’ model, since we have strong reason to assume admixture in the
ancestral populations of now putatively separate populations. This assumption is based on the
species tree inference results, which show shallow evolutionary times of all splits between samples
assigned to the same species, indicating relatively recent admixture within the species boundaries.
The first 10,000 generations of the MCMC were discarded as burn-in, and the chain was set to run for
an additional 100,000 generations after burn-in. The distribution of posterior likelihood estimates,
and the estimation of the data-probability under the chosen k value were checked for convergence.
A separate STRUCTURE analysis was run for each of the two Topaza species.
Results
Data exploration
Mitochondrion
Although no probes targeting mitochondrial sequences were used during sequence capture,
we obtained very deep read coverage for the mitochondrial genome. In
fact, in many cases the average coverage per base pair was much higher for the mitochondrion than
for the nuclear loci that we selected during sequence capture (Table 2). Between the
different samples, 1-12 % of sequenced reads were of mitochondrial origin (Table S3).
The complete mitochondrial genome could be assembled for all 10 samples in this study. We
found no gene duplications or tandem repeats of mitochondrial regions, which have been reported to
occur in the mitochondrial genome of other bird taxa [2]. The assembled genomes varied in
length, ranging from 16,762 to 16,862 bp (Table S3). This length variation
is mainly attributable to the highly variable end of the control region (misc. feature), which
presents challenges for assembly due to many tandem repeats of microsatellite elements and
consequently causes difficulties in aligning these variable regions, even among closely
related taxa. The control region and the intergenic spacers were discarded from subsequent analyses,
leaving a total alignment of 15,428 bp for phylogenetic analyses, which was free of missing
data. Figure 4 shows the position of the identified regions on the mitochondrial genome, using
sample 2 (T. pyra2) as an example. For more information on sequence length and exact positions of all identified
coding regions, see Table S5.
Nuclear loci
All ten nuclear genes that were targeted in the sequence capture enrichment could be recovered
in their entirety for all samples with extensive read coverage (Table 2), adding up to a total of 10,201
bp of nuclear DNA sequence for each sample. In general, the recovered nuclear loci showed little
variation within the genus Topaza (see Table S4), because the deepest splits of lineages within this
genus are relatively recent (< 3 Ma) according to prior information [67]. We decided
to exclude from further phylogenetic analyses all loci that showed fewer than 1 % variable sites within the
alignment, which led to the exclusion of EGR1 (0 %) and ZENK2 (0.2 %). This left 8,404 bp of nuclear
sequence information for further analyses.
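The two totals can be reproduced from the locus lengths listed in Table 2. The sketch below assumes that the recovered sequence of each locus has the same length as the reference sequence given in the table, which is consistent with the totals reported here; the variable names are hypothetical.

```python
# Reference sequence lengths in bp, as listed in Table 2.
locus_length_bp = {
    "Bfib": 1076, "EEF2": 1619, "EGR1": 609, "FGB": 660, "MB": 718,
    "ODC": 618, "RAG1": 2639, "TGFB2": 571, "ZENK2": 1188, "ZENK3": 503,
}
excluded = {"EGR1", "ZENK2"}  # loci with fewer than 1 % variable sites

total_bp = sum(locus_length_bp.values())
retained_bp = sum(length for locus, length in locus_length_bp.items()
                  if locus not in excluded)
print(total_bp, retained_bp)  # 10201 8404
```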
Figure 4: Circular map of the mitochondrial genome of T. pyra2. The inner ring shows the scale in kb (kilo base pairs).
The section between position 15,558 and the end (position 16,762), here marked as a black box, is commonly referred to
as miscellaneous feature. This region contains sequences which function as control region for replication and
transcription of the circular mitochondrial genome. Protein-coding genes are marked as colored blocks, color-coded to
indicate gene families. Marked in dark brown are rRNA coding sequences (rrnS = small ribosomal subunit (12S), rrnL =
large ribosomal subunit (16S)) and in yellow the tRNA coding sequences.
Ultraconserved Elements (UCEs)
We assembled a set of 824 UCE-loci that were present in all 10 samples. The length of the
assembled UCE alignments ranged from 223 bp to 1130 bp (mean = 870 bp, stdev = 150 bp, see
Figure 5). As expected, the central regions of the UCE alignments showed little to no variation among
the different samples (Figure 6). These regions represent the highly conserved core regions of the
UCE loci that were targeted by the sequence capture probes. The further the distance from the
conserved core region, the more variation could be found within the alignments (Figure 6). A subset
of 73 UCE-loci, each containing more than 20 variable sites among Topaza samples, was extracted
and used for further analyses; it comprised 68,997 bp of sequence alignment in total.
Table 2: Locus information and read coverage for 10 nuclear loci and the mitochondrial genome. The first block
contains information about the reference sequences used for probe design (nuclear loci only) and for the assembly of
reads (nuclear loci and mitochondrion). The information displayed for each sample is the total number of reads (# reads)
and the average coverage per base pair (Ø coverage/bp) for each locus, extracted from the bam-assembly-files, viewed
with Tablet [52].
Locus          Reference sequence                                   T. pyra1                T. pyra2
               organism                   acc# NCBI    length (bp)  # reads  Ø coverage/bp  # reads  Ø coverage/bp
Bfib           Topaza pella               GU167142.1   1,076        785      167            4,880    1,045
EEF2           Phaethornis griseogularis  EU738666.1   1,619        774      101            5,017    667
EGR1           Phaethornis griseogularis  EU738996.1   609          450      144            2,736    938
FGB            Phaethornis griseogularis  EU739148.1   660          241      79             1,874    634
MB             Phaethornis griseogularis  EU740011.1   718          380      119            2,143    661
ODC            Topaza pella               GU167086.1   618          412      144            2,541    903
RAG1           Phaethornis bourcieri      JN558646.1   2,639        1,557    134            10,164   901
TGFB2          Phaethornis griseogularis  EU737426.1   571          207      71             1,545    556
ZENK2          Eutoxeres aquila           AF492503.1   1,188        829      145            5,366    1,004
ZENK3          Eutoxeres aquila           AF492533.1   503          330      138            2,136    881
Mitochondrion  Amazilia versicolor        NC_024156.1  16,861       63,816   823            154,537  2,044

Locus          T. pyra3                T. pyra4                T. pella5               T. pella6
               # reads  Ø coverage/bp  # reads  Ø coverage/bp  # reads  Ø coverage/bp  # reads  Ø coverage/bp
Bfib           778      164            1,279    273            874      184            1,914    401
EEF2           713      90             1,693    224            885      112            2,339    293
EGR1           383      122            892      297            634      200            1,359    433
FGB            336      110            566      187            312      102            656      208
MB             320      95             627      192            386      116            1,041    309
ODC            406      140            697      245            545      186            1,164    402
RAG1           1,587    135            3,168    280            1,750    147            4,714    395
TGFB2          185      63             383      136            263      94             639      221
ZENK2          795      137            1,650    298            1,110    192            2,538    435
ZENK3          332      130            747      308            354      145            1,000    408
Mitochondrion  38,572   490            16,146   211            72,207   899            164,804  1,964

Locus          T. pella7               T. pella8               T. pella9               Florisuga10
               # reads  Ø coverage/bp  # reads  Ø coverage/bp  # reads  Ø coverage/bp  # reads  Ø coverage/bp
Bfib           1,042    222            1,069    226            625      133            521      108
EEF2           1,571    212            1,482    192            344      44             642      86
EGR1           814      269            813      270            167      57             423      130
FGB            413      139            361      122            219      70             324      108
MB             526      158            538      163            178      54             357      108
ODC            654      232            611      215            296      102            379      132
RAG1           2,582    227            2,377    207            1,217    106            1,667    145
TGFB2          330      121            347      122            180      64             135      49
ZENK2          1,558    286            1,583    280            359      64             819      141
ZENK3          639      260            636      259            166      69             357      144
Mitochondrion  59,947   762            125,537  1,481          5,979    72             9,241    116
Figure 5: The length distribution of assembled UCE loci alignments. In total 824 UCE alignments were assembled for
all samples. Plotted in this graph is the number of alignments that fell into the respective length interval (interval size 23
bp), ranging from 223 bp (min) to 1130 bp (max). The mean length of all UCE alignments lies at 870 bp (stdev = 150 bp).
Figure 6: Plot of variable sites within UCE-alignments. This plot shows the frequency of variable sites for each
position (relative to the total number of sequences that contain that position) across all UCE-alignments, plotted in
relation to the distance from the center of the conserved region (x=0). Plotted in this manner, the
highly conserved core region becomes apparent, flanked by considerably more variable
regions.
Mitochondrial tree
The log-files of all Bayesian analyses were viewed in Tracer, version 1.6 [69] and examined for
convergence and effective sample sizes (ESS values) greater than 200. For the initial runs the MCMC
did not converge properly, because the sampling of various parameters stopped after several million
generations, which caused a sudden leap in the inferred posterior likelihood (Figure S15). This issue
seemed to occur when the xml file was over-parameterized and the parameters were allowed to
fluctuate within overly wide ranges. We therefore applied more restrictive prior settings for the ucld mean
values of all partitions (see Methods), and changed the base frequencies for all partitions from ‘estimated’
to ‘empirical’, which solved the issue.
Concerning the dating priors, we found that MCMC runs converged well when all age priors
(substitution rates and clade age priors) were applied. When substitution rates were applied without
setting the clade age priors, the MCMC stopped sampling various parameters after approximately 4
million generations, indicating that these dating priors alone are not restrictive enough. The same
issue could be observed after 8-10 million generations when only one of the two clade-age priors was
applied without additional substitution rate information. When examining the data preceding the
problematic point in the MCMC, the estimated ages of unrestricted clades were concordant to
analyses in which all age priors were applied. This led us to the decision to apply all age priors
(substitution rates and clade-age priors as described in Methods) for further analyses.
A mitochondrial maximum clade credibility tree was generated from 9,000 trees of the posterior
distribution, with a burn-in of 10% (Figure 7). The split between Topaza and its sister genus Florisuga
(not shown in Figure 7) was inferred at 16.74 Ma (stdev = 1.38 Ma). The deepest split of
mitochondrial lineages within Topaza, the split between T. pyra and T. pella, is estimated to have
occurred 2.36 Ma ago (stdev = 0.21 Ma). Further, the mitochondrial tree suggests a relatively deep
split within T. pyra at 0.68 Ma ago (stdev = 0.09 Ma), leading to two separate mitochondrial lineages
that separate samples collected north of the Amazon River from those collected south of it. Topaza pella shows
a similar pattern, even though the split of mitochondrial lineages appears to have occurred more
recently at 0.39 Ma ago (stdev = 0.05 Ma), and T. pella7, which was sampled at the southern bank of
the Amazon, appears in one clade with T. pella5 and T. pella6, both of which were sampled north of
the Amazon River. The mitochondrial tree in Figure 7 is completely resolved with 100 % support for
each node (Bayesian posterior probability).
Figure 7: Time-calibrated phylogeny of Topaza based on the complete mitochondrial genome (BEAST). Taxa are
colored according to minimum clades that were found to be monophyletic throughout all tree inferences conducted in
this study and which are further confirmed through species delimitation analysis. Shown is the maximum clade credibility
tree, generated with 9,000 trees (1,000 burn-in) of the posterior tree distribution. Node support values represent
Bayesian posterior probabilities and the time scale is in millions of years.
Species tree
Individuals analyzed separately
*BEAST - 8 Nuclear Genes
Similar to the convergence issues described above for the mitochondrial dataset, initial *BEAST
MCMC runs for the nuclear dataset stopped sampling certain parameters after several million
generations. These parameters mainly concerned the loci EGR1 and ZENK2, which were the
least informative loci, showing less than 0.5% variable sites within Topaza samples across the
complete alignment (Table S4). After removing these two loci from the *BEAST analysis, the MCMC
showed good convergence. The resulting maximum clade credibility tree is shown in Figure 8a. The
split between Topaza and Florisuga (not shown in Figure 8a) is estimated to have occurred 18.23 Ma
ago (stdev = 1.47 Ma). The divergence between T. pyra and T. pella was estimated at 2.03 Ma ago
(stdev = 0.33 Ma). Concerning the phylogenetic structure within the two recognized species, this
multilocus nuclear dataset shows a different pattern than the mitochondrion (Figure 7). The topology
within T. pyra is distinctively different, and does not show a deep split between two separate
lineages grouping northern and southern samples separately. Within the T. pella complex, the
sample T. pella7 is placed with the two southern samples (T. pella8 and 9), and a split between
northern and southern samples (in relation to the Amazon River) is inferred to have occurred 0.65
Ma ago (stdev = 0.26 Ma). All support values for nodes within the recognized species are rather low.
This could either be because the sequence alignments are too uninformative to resolve
phylogenetic patterns on such shallow time scales, or because no phylogenetic structure
is present within the recognized species. The latter would violate the *BEAST
assumption of no admixture between the separate tips, leading to low support values for the
respective nodes. We discuss these two possibilities in the following.
MP-EST - 824 Ultraconserved Elements
Among the 824 assembled UCE alignments, a majority had too few
variable sites to build informative gene trees. As a result, the vast majority
of gene trees inferred polytomies between all samples. In the absence of informative sites, occasional
mutations, which then represent a large fraction of the total variability of a UCE locus,
were weighted disproportionately in the gene tree inference, which therefore reflects
random stochastic processes rather than the evolutionary pattern. The only phylogenetic pattern that was consistently seen
among the gene tree topologies was the split between T. pyra and T. pella. The species tree which
was inferred based on the set of 824 gene trees (plus bootstrap replicates) with MP-EST (Figure S13),
weighs every gene tree equally and only evaluates the topology of the input gene trees, not
considering branch lengths of the input trees. As a result of the inconsistent gene tree topologies, the
species tree in Figure S13 shows extremely short internodal branch-lengths, as the few informative
loci that show shallower phylogenetic substructure are diluted among the many uninformative loci.
When only the 73 most informative UCE loci (>20 variable sites within Topaza) were selected, the
resulting MP-EST species tree (Figure 8b) shows an improved inference of the internodal structure.
The topology within T. pella is identical to the one inferred by *BEAST based on the nuclear gene loci
(Figure 8a). The inferred substructure within T. pyra has very low bootstrap support values and does
not show congruence with the split between northern and southern samples as inferred by the
mitochondrial tree (Figure 7).
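The locus filter described above (keeping only loci with more than 20 variable sites) can be sketched as follows. This is a minimal illustration under stated assumptions and not the script used in this study: a site is counted as variable when at least two different unambiguous nucleotides occur in the column, gaps and ambiguity codes are ignored, and the function names and toy loci are hypothetical.

```python
def count_variable_sites(alignment):
    """Count alignment columns that contain more than one unambiguous nucleotide."""
    count = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            count += 1
    return count

def select_informative_loci(loci, min_variable_sites=20):
    """Return the names of loci with MORE than min_variable_sites variable sites."""
    return [
        name
        for name, alignment in loci.items()
        if count_variable_sites(alignment) > min_variable_sites
    ]

# Toy loci (hypothetical names and sequences); a threshold of 1 is used only
# because the toy alignments are four columns long.
toy_loci = {
    "uce-1": ["ACGT", "ACGT"],
    "uce-2": ["ACGT", "TGGT"],
    "uce-3": ["AC-T", "ANGT"],
}
print(select_informative_loci(toy_loci, min_variable_sites=1))
```

In practice the alignments would be restricted to the Topaza samples before counting, since the threshold refers to variable sites within Topaza.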
SNAPP - 570 SNPs
SNAPP estimated both possible types of mutations within the binary SNP alignment (u: 0 -> 1 and
v: 1 -> 0) to occur equally often, as the confidence intervals around both rates overlap (u: mean
= 0.92, stdev = 0.0849; v: mean = 1.119, stdev = 0.1252). The inferred species tree is depicted in Figure
8c and shows the same internal topology within T. pella as the other nuclear species trees (Figure 8a
and b). T. pella5 and 6, which were both sampled north of the Amazon River, form a well-supported
clade (98% posterior probability), and so do T. pella8 and 9 (99% posterior probability), both sampled
from south of the Amazon. Sample T. pella7 forms a monophyletic group with the southern clade
(79% posterior probability). The substructure within T. pyra is not very well supported (posterior
probabilities of 26% for both internal nodes). Figure 9 shows a DensiTree plot of the posterior species
tree distribution of the SNAPP analysis (discarding the first 1,000 trees as burn-in). Here, the lack of
substructure within T. pyra becomes apparent, as no predominant pattern can be seen among the
plotted trees within this clade. In contrast, the plotted trees show a clear separation of two
lineages within T. pella, separating the northern (T. pella5 and 6) from the southern samples (T.
pella7, 8 and 9). Yet, in the case of T. pella7, a smaller fraction of trees groups this sample with the
northern clade, in concordance with the inferred mitochondrial tree (Figure 7).
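The overlap of the intervals around the u and v rates reported above can be checked with simple arithmetic. The sketch below assumes approximate intervals of mean ± 2 standard deviations; how the intervals were computed in the original analysis is not stated here, so this width is an assumption, as are the function names.

```python
def approx_interval(mean, stdev, width=2.0):
    """Approximate interval as mean +/- width * stdev (assumed, not the thesis method)."""
    return (mean - width * stdev, mean + width * stdev)

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Values as reported for the SNAPP analysis.
u_interval = approx_interval(0.92, 0.0849)    # roughly (0.750, 1.090)
v_interval = approx_interval(1.119, 0.1252)   # roughly (0.869, 1.369)

print(u_interval, v_interval, intervals_overlap(u_interval, v_interval))
```

Under this assumption the two intervals share the range from about 0.87 to 1.09, which is consistent with treating both mutation types as equally frequent.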
*BEAST - 8 Nuclear Genes and Mitochondrial genome
The species tree inferred from the 8 nuclear loci, with the complete mitochondrial
genome added as a 9th partition, is shown in Figure 8d. Unlike in all other inferred species trees based
solely on nuclear data (Figure 8a, b and c), northern and southern samples within T. pyra form
separate monophyletic groups, yet these are not very well supported (see node-support values). The
inferred split between these two lineages is dated very recently (mean = 0.05 Ma, stdev = 0.08 Ma).
The substructure within T. pella, on the other hand, is well supported, placing T. pella5 and 6 in a
monophyletic group with 78% posterior probability, and 8 and 9 in a separate clade with 95% node
support. The divergence between these two rather well-supported groups is estimated to have
occurred 0.23 Ma ago (stdev = 0.1 Ma), which is considerably more recent than estimated based on the
nuclear gene loci alone (Figure 8a), and on the more recent end of the confidence interval for the
dating in the mitochondrial tree (Figure 7). The sample T. pella7 is positioned more closely to the
northern samples (5 and 6), forming a monophyletic clade with these samples that is not very
strongly supported (69% posterior probability). This positioning of T. pella7 is consistent with the
mitochondrial tree (Figure 7) but is not supported by the other species trees inferred from multi-
locus nuclear data (Figure 8a, b and c).
Figure 8: Species trees inferred for Topaza based on multilocus datasets, treating each individual as a separate
population (no species/population assignments). Taxa are colored according to consistently monophyletic clades that
were found across all tree inferences and are consistent with the species delimitation analysis. a) Time-calibrated *BEAST
species tree inference based on 8 nuclear genes (8,404 bp). Shown is the maximum clade credibility (mcc) tree based on
9,000 trees of the posterior distribution (burn-in 1,000), with node labels representing the Bayesian posterior probability
(Bpp) of the respective node. Time is scaled in million years. b) MP-EST generated species tree based on the gene trees of
the 73 most informative UCE-loci, scaled by coalescent units. Node labels represent percentages of bootstrap support
(1,000 bootstrap replicas) of respective nodes. c) SNAPP species tree based on 570 unlinked SNPs, scaled in generations
relative to mutation rate (µ, in mutations/site/generation). Shown is the mcc tree based on 9,000 trees of the posterior
distribution (burnin 1,000). Node labels show Bpp values. d) Time-calibrated *BEAST species tree based on the complete
mitochondrial genome (approx. 15,500 bp) and 8 nuclear loci (8,404 bp). The shown mcc tree was created from 9,000
trees of the posterior distribution (burn-in 1,000), node labels show Bpp values. Time is scaled in million years.
Figure 9: DensiTree [74] plot of 9,000 trees of the posterior species tree distribution (burn-in 1,000) from the SNAPP
analysis of 570 SNPs. No coherent substructure is apparent among the plotted trees within T. pyra, yet a consistent
substructure within T. pella can be seen in the plot. Note the majority of trees connecting T_pella7 with the southern
clade (8 and 9), while a small fraction of trees connects this individual with the northern clade (5 and 6).
Species delimitation analysis
Initial DISSECT analyses of the nuclear and the mixed dataset, including all samples, show
indications of T. pella7 being of admixed origin, as this sample is grouped with both of the otherwise
distinct clades T. pella5+6 and T. pella8+9 (see Figure S14). As this is also supported by the above-
described results (Figure 8 & Figure 9), we excluded T. pella7 from further analyses since admixture
between separate inferred clades violates the DISSECT assumptions and can lead to the grouping of
distinct clades into one cluster, as these become linked through the admixed individual. The effect of
excluding this single problematic individual on the complete similarity matrix within T. pella can be
seen by comparing Figure S14 with Figure 10. The similarity matrices in Figure 10 show a strongly
supported genetic separation within T. pella between northern and southern samples. This split is
inferred more strongly within the exclusively nuclear dataset with posterior probability node support
values of 88% and 94% supporting the monophyly of the two separate clades. The dating of this split
is estimated at 0.85 Ma ago (stdev = 0.25 Ma). The DISSECT analysis of the mixed dataset, including
the mitochondrial genome, shows slightly weaker support for the split within T. pella (78% for both
clades) and dates this split more recently at 0.4 Ma ago (stdev = 0.02 Ma), more similarly to the
mitochondrial tree (Figure 7). Yet, it still suggests clear genetic substructure within T. pella. At the
same time, both analyses (nuclear and mixed dataset) show no support for any genetic substructure
among T. pyra samples. Based on these results, which are consistent with the various species tree
inferences described above (Figure 8) we assigned each sample to one of 4 distinct clades (Figure 11).
Figure 10: Similarity matrices showing results of SpeciesDelimitationAnalyser processing of the species tree
distribution inferred with DISSECT (burn-in 10%). Placed to the left of each matrix is the maximum clade credibility tree
of the posterior tree distribution (burn-in 1,000 trees). Node support values represent Bayesian posterior probabilities.
Trees are scaled by time in Ma. a) Based on 8 nuclear genes; b) Based on 8 nuclear genes and the mitochondrial genome.
Individuals assigned to populations
Species tree inferences were rerun with samples assigned to putatively separate populations
(Figure 11), as consistently suggested by the results of previous analyses. Sample T. pella7 was
assigned a separate clade in order to explore its position in the tree in relation to the identified
populations. The resulting species trees are shown in Figure 11. The dating of the basal split between
the separate populations within T. pella inferred with *BEAST is consistent with the previous analyses
of the same data without population assignments, yet with narrower and therefore more precise
confidence intervals: 0.69 Ma ago (stdev = 0.14 Ma) for the 8 nuclear gene dataset (Figure 11a)
and 0.25 Ma ago (stdev = 0.08 Ma) for the mixed dataset (Figure 11d). The sample T. pella7 is
consistently placed in a monophyletic group with the southern population (T. pellaS) in the three
multilocus nuclear datasets (Figure 11a-c). When adding the mitochondrial DNA to the 8 nuclear
genes as a 9th partition (Figure 11d), T. pella7 is placed concordantly to the mitochondrial tree (Figure
7) with the northern population (T. pellaN), supported rather confidently with 81% Bayesian
posterior probability.
Figure 11: Species trees inferred for Topaza based on multilocus datasets, with applied population assignments for
all samples. See caption of Figure 8 for more information about the inference of the separate species trees. Note the
position of sample T. pella7 which is placed in a monophyletic clade with the southern population (T. pellaS) in the
species trees inferred from multilocus nuclear data (a-c) but is placed with good node support with the northern clade (T.
pellaN) in the *BEAST inference of the mixed dataset, including the mitochondrial genome. The top part of the figure
shows the new clade assignments based on consistently identified clusters in all previous species tree inferences and the
species delimitation analyses with DISSECT.
Population structure
The bar graphs in Figure 12 depict the results of the STRUCTURE analyses of unlinked
polymorphic positions within the two recognized Topaza species. The plot shows the admixture
of each individual between two putative genetic clusters (k=2). No population structure appears to
be present within T. pyra, as all samples within this species are equally admixed, which further
complements the previous evidence from the species tree inference methods and the DISSECT
analyses, shown in Figure 8, Figure 9 and Figure 10. In the case of T. pella, the inferred admixture
pattern is consistent with our previous results, with T. pella5 and 6 (population T. pellaN) showing a
different admixture pattern than T. pella7, 8 and 9, indicating two separate populations within T.
pella. The results suggest active or recent admixture between T. pella7 and the other samples with
origin south of the Amazon River. At the same time, T. pella7 contains a slightly higher percentage
(approx. 10%) of genetic material from “Cluster1” (dark grey) which is mainly present in the northern
samples (5 and 6) and rare among the other samples from south of the Amazon (8 and 9).
For both datasets (T. pella SNPs and T. pyra SNPs), the estimated probability of the data under
different settings for k favored k=1 as the best fit, meaning that the admixture patterns within
either species are not sufficiently distinct to truly infer two or more separate
genetic clusters. We chose k=2, which was the second-best fit for the data, in order to
test for identifiable differences in the admixture pattern between different individuals within the
same species, which requires at least two separate genetic clusters.
Figure 12: Barplot of STRUCTURE results of admixture between the different individuals, generated separately for
each species. When two clusters are set by the user (k=2), no population structure can be seen within T. pyra (upper
plot), as all individuals appear to be equally admixed between these two clusters. Within T. pella, population structure is
visible when two clusters are inferred (k=2). Individuals are not equally admixed between these two putative clusters,
with T. pella5 and 6 carrying more than 80% of their genetic makeup from Cluster1, which is only present to less than
10% in T. pella7, 8 and 9.
[Figure 12: two stacked bar plots of cluster membership (y-axis 0% to 100%; Cluster1 and Cluster2). Upper plot: T_pyra1, T_pyra2, T_pyra3, T_pyra4. Lower plot: T_pella5, T_pella6, T_pella7, T_pella8, T_pella9.]
Discussion
Evaluation of phylogenetic relationships
Within this study we explored the genetic structure within the genus Topaza using a variety of
approaches based on different multilocus, nuclear and mitochondrial datasets. Consistently, through
all analyses, we see a clear divergence between the two currently recognized species T. pyra and T.
pella with no indications of gene flow between these species. This clearly supports their rank as
separate species, which has been challenged based on morphological data by some authors [37],
[38].
Examining the genetic structure within these species, we consistently find a separation of two
lineages within T. pella, separating individuals sampled north from those sampled south of the
Amazon river. This split between northern and southern samples becomes apparent in all species
trees (Figure 8, Figure 9, and Figure 11). It further is strongly supported by species delimitation
analyses in DISSECT (Figure 10) and also indicated by the population structure analysis with
STRUCTURE (Figure 12). A recent study by Schmitz-Ornés et al. [38] based on color spectral data
found evidence for significant variation in coloration measurements between northern and
southern samples of T. pella in relation to the Amazon River, leading to the definition of two distinct
subspecies, namely T. pella pella (north) and T. pella microrhyncha (south), separated by the river.
Our data strongly support these morphological findings and the distinction of these two
separate subspecies within T. pella.
One exception which is not as strongly supported is sample T. pella7, which in all species tree
inferences based on multilocus nuclear data, is placed with the southern samples (T. pella
microrhyncha), but with rather low node support for this placement (Figure 8a-c and Figure 11). The
plot of the complete species tree distribution of the SNAPP analysis (Figure 9) shows that there is
some uncertainty whether to place this sample with the northern clade (T. pella pella, samples 5 and
6) or the southern clade (T. pella microrhyncha, samples 8 and 9), even though the vast majority of
trees suggest a placement closer to the southern subspecies T. pella microrhyncha, which is also
supported by the STRUCTURE results (Figure 12). The sample T. pella7 was collected on the
southern bank of the Amazon River, close to the estuary of the Amazon, lying within the geographic
range that has formerly been assigned to a separate subspecies (based on morphometric and
coloration data) referred to as T. pella smaragdula [39], [40]. The proposed range of this putative
subspecies extends from the southern riverbed close to the estuary of the Amazon across the eastern
part of the Guiana Shield in the area of French Guiana. Sequence data of more individuals from
particularly the area of French Guiana would be required to genetically examine the validity of this
subspecies assignment. The observed uncertainty of the placement of sample T. pella7 in the species
tree could be due to a past admixture between the two otherwise distinct subspecies T. pella pella
and T. pella microrhyncha, followed by a time of adjacent gene-flow between the ancestral
population of T. pella7 and T. pella microrhyncha, causing the majority of gene loci to share a more
recent common history with this subspecies, while a smaller fraction of genes share a more recent
common history with T. pella pella. Such a putative admixture event would further explain the
position of T. pella7 in the mitochondrial tree, where it is placed more closely to T. pella pella with
absolute certainty (100% Bayesian posterior probability).
Mitochondrial tree - the odd one out
One striking pattern that becomes obvious when comparing the mitochondrial tree (Figure 7)
with all inferred species trees (Figure 8, Figure 9 and Figure 10), is the deep split between northern
(T. pyra1 and 2) and southern samples (3 and 4) within T. pyra, which is only present within the
mitochondrial tree. All other trees show alternating topologies within T. pyra, none of which are well
supported. The species delimitation analysis based on the dataset of 8 nuclear genes does not detect
any genetic clusters within T. pyra, suggesting that all four T. pyra samples should be collapsed in the
species tree. Additionally, the STRUCTURE analysis, based exclusively on SNPs that show
polymorphisms within the T. pyra samples, did not recover any differences in the genetic makeup of
the different T. pyra samples, indicating ongoing or recent admixture between all sampled individuals
within this species. This combined evidence of no genetic structure within T. pyra seems to
contradict the inferred history of the mitochondrial lineages. Previous studies based on
morphological data [38], [40] distinguish two separate subspecies within T. pyra, which are separated
by an east-west gradient, one of which occurs in the Peruvian highlands, while the other one occurs
along the Rio Negro. However, no north-south division between individuals of T. pyra as suggested
by the mitochondrial tree has been previously hypothesized based on morphological evidence. There
are various possible explanations for the observed discrepancy between the mitochondrial tree and
the species trees.
One explanation could be that the nuclear loci are simply not informative enough in order to
infer recent splits of lineages, while mitochondrial DNA with a generally higher mutation rate shows
genetic structure on shallower times, therefore being the more sensitive and suitable dataset for
exploring more recent genetic substructure. This, however, seems unlikely; in particular, the SNP
dataset, consisting of only polymorphic positions within Topaza, represents a dataset of maximum
informativeness, with a proportion of variable sites far exceeding that of the mitochondrial alignment. Particularly
those SNPs with polymorphisms within T. pyra, as extracted for the STRUCTURE analysis, would be
expected to show a pattern if population structure is present, which was not found in this study
(Figure 12). Further, all nuclear datasets consistently recovered genetic substructure within the sister
species T. pella on comparably shallow evolutionary times, demonstrating the suitability of the
nuclear datasets for exploring such substructuring.
A biological mechanism that could cause the inferred deep split between mitochondrial lineages
within T. pyra is selection, acting on two separate mitochondrial haplotypes. Previous studies have
found evidence for such biallelic selection on mitochondrial haplotypes causing deep divergences of
mitochondrial lineages within sympatric bird populations [15], [75]. Possible divergence of two
separate mitochondrial haplotypes within a panmictically admixing population due to selection has
also been demonstrated by simulation studies [76]. Cases of direct selection on mitochondrial
sequences have mainly been related to altitudinal differences [15], where differing aerobic
conditions may act as selection factors for mitochondrial loci coding for enzymes involved in
oxidative phosphorylation. No noteworthy differences in altitude are present between the sample
locations of the T. pyra specimens used for this study. Yet, the possibility of direct or indirect
selection on the mitochondrial genome through other selection factors, maintaining two distinct
haplotypes, remains a plausible explanation.
Another possible biological explanation is that the observed pattern of strong genetic structure
within the mitochondrial tree, which is not present in the nuclear data, could be connected to the
different inheritance mechanism of mitochondria in comparison to nuclear DNA. As a solely
maternally inherited locus, the genetic divergence of two separate mitochondrial lineages could be
restricted to females. A scenario is conceivable in which the Amazon River acts as a dispersal barrier for
female individuals of T. pyra, while male individuals occasionally cross the Amazon, thereby keeping
the population genetically admixed. Gender-specific differences in the average dispersal distance
have been commonly found within avian studies in the last decades [77]–[80]. These studies more
commonly found females to be the further dispersing gender, yet there appear to be family-specific
differences as to which is the further dispersing gender [78]. We are not aware of any information about gender-specific
dispersal rates for hummingbirds (Trochilidae). However, a possibly higher
dispersal rate of male Topaza hummingbirds could explain a pattern as the one found in this study,
where two distinctly different mitochondrial lineages are present in a population that is genetically
admixed in respect to autosomal DNA. Considering that the nuclear genes, as well as the UCE data
and the extracted SNPs, are of exclusively autosomal origin, this explanation appears to be a likely
scenario to explain the observed discrepancy between the mitochondrial and the nuclear data.
Additional sequence data of the female sex chromosome (W), which is not present in males, could be
used to further test this hypothesis, as these sequences would be expected to show the same genetic
pattern as the mitochondrion. Furthermore, ecological data concerning gender specific dispersal
rates of Topaza hummingbirds could bring more light into this discussion.
This study adds to a growing number of avian studies that identify cases of mitochondrial
structure found within otherwise admixing populations [15], [81]–[83]. Such discrepancies between
the mitochondrial tree and the species tree are, in general, to be seen as informative rather
than problematic. They can provide possible evidence for important biological factors, such as selection
or gender-specific dispersal patterns, which are important drivers of evolution. At the same time, this
case also points out the problematic nature of the mitochondrion as a phylogenetic marker. As a
locus that can be strongly affected by selection [15], [84]–[87], the mitochondrion may present a
violation of the assumption of neutrality in coalescent methods. Further, a hypothetical case of
female-specific lineage divergence, as discussed in this study, could lead to the mistaken definition of
cryptic species or subspecies when only looking at genetic information from the mitochondrion,
thereby overlooking the present admixture of autosomal loci.
Rivers as dispersal barriers
We are not aware of any ecological data regarding the specific dispersal ability of Topaza
hummingbirds. However, the data presented here suggest that the Amazon River in particular
acts as a dispersal barrier for Topaza hummingbirds. In the case of T. pella, the ranges of the
subspecies T. pella pella and T. pella microrhyncha are separated by the Amazon River. For a rainforest
dwelling genus like Topaza [40], no other obvious dispersal barriers are present between the ranges
of these two subspecies. This makes it a likely conclusion that the Amazon imposes a dispersal barrier
on T. pella strong enough to lead to the separation of two genetically distinct subspecies on opposite
sides of the Amazon River. A dispersal barrier effect of the Amazon River on forest-dwelling bird
species in particular has been confirmed by other studies [4], [88]–[90], yet it appears to be unique
for the Amazon River and has not been confirmed consistently for a variety of bird species for other
big river systems. Even though the barrier effect of the Amazon appears to be very strong for T. pella,
considering the strict separation of T. pella pella and T. pella microrhyncha in all datasets, it appears
to be a somewhat permeable barrier as indicated by sample T. pella7. This sample, which was
collected at the southern river bank close to the estuary of the Amazon shows indications of possible
admixture with the northern subspecies T. pella pella, which is also indicated by the mitochondrial
tree (Figure 7). It is plausible that the dispersal barrier effect around the estuary area is somewhat
reduced, due to the forking of the Amazon into a wide delta region characterized by a multitude of
small islands.
No consistent substructure is promoted within T. pyra that would indicate a dispersal barrier
effect of the Amazon. This finding is consistent with the results of a previous study of a
large variety of bird taxa (n > 400), which found that the lower and wider section of the Amazon
presents a significantly stronger dispersal barrier than the upper, narrower section [90]. Yet, there is
evidence within the mitochondrial data of T. pyra suggesting divergence of two separate
mitochondrial haplotypes, separated by the Amazon River, which could be attributed to a dispersal
barrier effect of the Amazon on female birds, as discussed above.
Effect of mtDNA on species tree
When adding the genetic information of the complete mitochondrial genome to the multilocus
nuclear dataset (8 genetic markers), the position of sample T. pella7 becomes heavily influenced by
this additional partition. While the nuclear dataset places T. pella7 with a lot of uncertainty in a
monophyletic group with the samples belonging to T. pella microrhyncha (Figure 11a), the mixed
dataset, including the mitochondrion, places it rather confidently (81 % Bayesian posterior
probability) with the other subspecies T. pella pella (Figure 11d). Considering the previous
uncertainty, the addition of one locus (in this case the mitochondrion) has a substantial influence on
the inferred species tree regarding both the placement of sample T. pella7 and the respective node
support. Within this mixed dataset the mitochondrion is by far the most extensive and informative
locus. Such a difference in informativeness between simultaneously analyzed loci has been projected
to bias multilocus coalescent methods [18] such as *BEAST. Whether the influence of the additional
mitochondrial sequence on the species tree observed here is truly disproportionate would need to
be explored further with simulation studies. Yet, this case points out the importance of sequencing
loci with a sufficient amount of informative sites when using these sequences in a multispecies
coalescent approach. Particularly on shallow evolutionary time scales it seems therefore plausible
that mitochondrial loci within mixed multilocus datasets substantially contribute to the species tree
phylogeny, possibly leading to false certainty of inferred clades.
Evaluation of datasets
The relatively recent time of divergence events addressed within this study (< 1 Ma ago) pushes
the standard nuclear genetic markers to the boundary of their utility. This becomes obvious when
looking at the node support values in the *BEAST inferred species tree in Figure 8a. This tree in
general lacks good node support values for all inferred clades below the species level, which in the
case of the internal nodes within T. pyra is probably due to no genetic structure being present (see
other species tree inferences, Figure 8b-d, Figure 9 and Figure 10 and STRUCTURE results, Figure 12).
Yet, the lack of support for the inferred clades within T. pella appears to be due to a lack of
informativeness within these data, as these clades are well supported in other species trees (Figure
8c and Figure 9) and supported by the STRUCTURE results (Figure 12). The DISSECT results show that
this issue is eliminated when excluding the phylogenetically problematic sample T. pella7 (compare
Figure S14a with Figure 10a). After the exclusion of this sample the nuclear loci provide a reliable
dataset for the species delimitation analyses with DISSECT, as can be judged by the support values in
the DISSECT tree (Figure 10a). Nevertheless, the nuclear gene loci used in this study do not provide
sufficient information to reliably explore genetic patterns below the species level.
We therefore strongly recommend the generation of genome-wide multilocus datasets which
allow for the extraction of highly informative SNP data when working on such shallow evolutionary
time scales. Within this study, SNP data yield the most reliable species tree inference (see node support
values in Figure 8c), and additionally open up a range of further analytical methods. The UCE dataset
used in this study proves to be an excellent source of genetic information for the generation of such
SNP datasets, as UCE sequences are easily generated and universally applicable for a wide range of
organism groups. The use of complete UCE sequences for gene tree and subsequent species tree
analyses, on the other hand, proved impractical within this study, since these loci
remain too conserved among the sampled individuals. This contrasts with the findings of previous studies
examining the performance of UCE loci on shallow evolutionary time scales [28]. We found that gene trees
inferred from UCE loci were rather uninformative, in many cases resulting in wide polytomies. This
variety of uninformative and often conflicting gene tree topologies causes the species tree
inference in MP-EST, which only considers the topology of the input gene trees, to infer a species
tree with very short internodal distances (Figure 8b & Figure S13). The terminal branches, on the
other hand, are outstandingly long, creating an unusual tree shape that has also been recognized by
previous studies [91], [92], where it is referred to as a “bonsai tree”. This appears to be an MP-EST-specific
issue, particularly with UCE data, which in previous studies has been attributed to the inaccurate
reconstruction of gene trees [92]. The consistently long terminal branch lengths occur due to MP-EST
arbitrarily assigning a branch length value around 9 coalescent units when branch length cannot be
properly estimated [91]. We find that a selection of the most informative UCE loci for species tree
inference in MP-EST reduces the bonsai-effect (compare Figure S13 with Figure 8b). Further, we find
that this filtering of the data improves the topology of the inferred species tree: the species tree
inference based on the subset of the 73 most informative UCE loci (Figure 8b) is concordant with the
consistently inferred clades in other species trees (Figure 8a,c,d), whereas the tree
based on all 824 UCE loci conflicts, with good bootstrap support, with the otherwise
consistently observed monophyly of the samples T. pella5 and T. pella6. These findings show that
“shortcut” coalescent methods (referring to methods which do not co-estimate gene trees and
species tree) like MP-EST do not necessarily follow the dogma “the more, the better”, but can be
substantially improved by excluding insufficiently informative loci, particularly when inferring phylogenetic
relationships on shallow evolutionary time scales.
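The locus filtering discussed above can be illustrated with a minimal sketch. The criterion used here, ranking loci by their number of variable alignment columns, is an assumption made for illustration only; the function names and the toy alignments are hypothetical and do not reproduce the selection procedure behind the 73-locus subset.

```python
from typing import Dict, List


def count_variable_sites(alignment: List[str]) -> int:
    """Count alignment columns that contain more than one distinct base.

    Gaps and ambiguity codes are ignored (an assumption of this sketch).
    """
    variable = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            variable += 1
    return variable


def select_most_informative(loci: Dict[str, List[str]], n_keep: int) -> List[str]:
    """Return the names of the n_keep loci with the most variable sites."""
    ranked = sorted(loci, key=lambda name: count_variable_sites(loci[name]), reverse=True)
    return ranked[:n_keep]


# Hypothetical toy alignments (three samples per locus), not data from this study.
toy_loci = {
    "uce-A": ["ACGTACGT", "ACGTACGT", "ACGTACGT"],  # no variable site
    "uce-B": ["ACGTACGT", "ACGAACGT", "ACGTACGA"],  # two variable sites
    "uce-C": ["ACGTACGT", "ACGTACGA", "ACGTACGT"],  # one variable site
}

print(select_most_informative(toy_loci, 2))  # ['uce-B', 'uce-C']
```

Only the gene trees of the retained loci would then be passed on to the species tree inference.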
Conclusion
The inference of genetic substructure within species-limits is located in an area of overlap
between the fields of phylogenetics and population genetics. In this field of overlap, we find that
phylogenetically popular data sources do not perform well when inferring evolutionary history, due
to a lack of informativeness. Additionally, complex admixture patterns among subspecies can limit
the utility of multispecies coalescent methods, such as BEAST and *BEAST, since samples cannot
always be assigned to clearly defined populations without gene-flow among each other. The
mitochondrial tree provides an exceptionally well resolved gene tree, which enables us to explore
the phylogenetic relationship between mitochondrial haplotypes. As shown in this study, these
phylogenetic relationships can be misleading, since the mitochondrial tree represents a single
genealogy, which is in many cases not concordant with the species tree, particularly on shallow
evolutionary time scales. An appropriate estimate of the species tree in this case can most successfully be
achieved with highly informative SNP data. Additionally, SNP data open up the possibility of applying
population genetic methods such as STRUCTURE, which are of great value for the exploration of the
genetic data. We find that a combination of phylogenetic and population genetic methods is very
useful to identify consistent patterns among the nuclear data which we use to infer subspecies
assignments. We conclude that especially the SNP data are a very useful dataset in order to explore
genetic substructure and phylogenetic relationships between individuals. Our finding that the lower
Amazon River constitutes a rather strict dispersal barrier for Topaza is novel for this genus, and may
inspire future studies to investigate further if limited dispersal across the Amazon River can also be
observed in closely related hummingbird genera. Further, we want to highlight that a
discrepancy between the mitochondrial tree and the species tree can give rise to biologically
intriguing hypotheses, such as sex-specific dispersal barriers or selection on mitochondrial genes;
we intend to pursue these hypotheses further.
Acknowledgements
I want to thank Alexandre Antonelli and Urban Olsson as my main supervisors for enabling me to
carry out such an interesting and exciting project for my Master’s thesis project and for all the time
and support you were able to give me. It was a lot of fun and a great experience to work with you
over the past year. Special thanks go to Alex Antonelli for giving me the opportunity of being a full
member of his research group, for the great group retreats and of course for free breakfast every
Monday.
I thank Alexandre Fernandes for providing this wonderful dataset for my Master’s project and for
interesting and helpful information about the biology of Topaza hummingbirds.
Big thanks also to Bernard Pfeil for great help with the phylogenetic tree inference, in
particular with Bayesian methods, and for all the time that we spent together philosophizing about the
outcomes.
Further I thank Mats Töpel as my bioinformatic supervisor for the invaluable support with
software issues on the Albiorix cluster and beyond that for helpful advice with other bioinformatic
challenges around the project.
Thanks to Yann Bertrand and Filipe de Sousa for helpful input and the sharing of various scripts in
particular for Illumina read data processing. A big thank you also goes to Alexander Zizka, who was of
great help with his excellent R skills for creating the distribution maps. Further, I want to thank Daniele
Silvestro from the University of Gothenburg and Martin Ryberg from the University of Uppsala for
helpful scripts for specific data processing steps. I also want to thank Britt Anderson for
proofreading this manuscript and giving very helpful input during the writing process. I thank the
managers of the named institutions and museums (Table 1) and the collectors for having provided
the material for DNA extraction that made this study possible. Finally, I want to thank the European
and Swedish Research Councils for funding this project.
References
[1] W. M. Brown, M. George, and A. C. Wilson, “Rapid evolution of animal mitochondrial DNA,” Proc. Natl. Acad. Sci., vol. 76, no. 4, pp. 1967–1971, Apr. 1979.
[2] S. Sammler, C. Bleidorn, and R. Tiedemann, “Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae) provide evidence for pervasive mitochondrial DNA recombination.,” BMC Genomics, vol. 12, no. 1, p. 35, 2011.
[3] G. Voelker, S. Rohwer, R. C. K. Bowie, and D. C. Outlaw, “Molecular systematics of a speciose, cosmopolitan songbird genus: Defining the limits of, and relationships among, the Turdus thrushes,” Mol. Phylogenet. Evol., vol. 42, no. 2, pp. 422–434, 2007.
[4] A. M. Fernandes, M. Wink, and A. Aleixo, “Phylogeography of the chestnut-tailed antbird (Myrmeciza hemimelaena) clarifies the role of rivers in Amazonian biogeography,” J. Biogeogr., vol. 39, no. 8, pp. 1524–1535, 2012.
[5] S. G. DuBay and C. C. Witt, “An improved phylogeny of the Andean tit-tyrants (Aves, Tyrannidae): More characters trump sophisticated analyses,” Mol. Phylogenet. Evol., vol. 64, no. 2, pp. 285–296, 2012.
[6] J. W. O. Ballard and M. C. Whitlock, “The incomplete natural history of mitochondria,” Mol. Ecol., vol. 13, no. 4, pp. 729–744, 2004.
[7] A. Tatarenkov and J. C. Avise, “Rapid concerted evolution in animal mitochondrial DNA.,” Proc. Biol. Sci., vol. 274, no. 1619, pp. 1795–8, Jul. 2007.
[8] K. Ogoh and Y. Ohmiya, “Concerted evolution of duplicated control regions within an ostracod mitochondrial genome.,” Mol. Biol. Evol., vol. 24, no. 1, pp. 74–8, Jan. 2007.
[9] J. R. Eberhard, T. F. Wright, and E. Bermingham, “Duplication and concerted evolution of the mitochondrial control region in the parrot genus Amazona.,” Mol. Biol. Evol., vol. 18, no. 7, pp. 1330–42, Jul. 2001.
[10] J. H. Degnan and N. A. Rosenberg, “Gene tree discordance, phylogenetic inference and the multispecies coalescent,” Trends Ecol. Evol., vol. 24, no. 6, pp. 332–340, 2009.
[11] B. Nabholz, S. Glémin, and N. Galtier, “The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals.,” BMC Evol. Biol., vol. 9, p. 54, 2009.
[12] S. Berlin and H. Ellegren, “Evolutionary genetics. Clonal inheritance of avian mitochondrial DNA.,” Nature, vol. 413, no. 6851, pp. 37–8, Sep. 2001.
[13] S. Berlin, D. Tomaras, and B. Charlesworth, “Low mitochondrial variability in birds may indicate Hill-Robertson effects on the W chromosome.,” Heredity (Edinb)., vol. 99, no. 4, pp. 389–96, Oct. 2007.
[14] G. D. D. Hurst and F. M. Jiggins, “Problems with mitochondrial DNA as a marker in population, phylogeographic and phylogenetic studies: the effects of inherited symbionts.,” Proc. Biol. Sci., vol. 272, no. 1572, pp. 1525–1534, 2005.
[15] Z. A. Cheviron and R. T. Brumfield, “Migration-selection balance and local adaptation of mitochondrial haplotypes in Rufous-Collared Sparrows (Zonotrichia capensis) along an elevational gradient,” Evolution (N. Y)., vol. 63, no. 6, pp. 1593–1605, 2009.
[16] J. W. Ballard and M. Kreitman, “Unraveling selection in the mitochondrial genome of Drosophila.,” Genetics, vol. 138, no. 3, pp. 757–72, Nov. 1994.
[17] A. Corl and H. Ellegren, “Sampling strategies for species trees: The effects on phylogenetic inference of the number of genes, number of individuals, and whether loci are mitochondrial, sex-linked, or autosomal,” Mol. Phylogenet. Evol., vol. 67, no. 2, pp. 358–366, 2013.
[18] E. L. Jockusch, I. Martinez-Solano, and E. K. Timpe, “The Effects of Inference Method, Population Sampling, and Gene Sampling on Species Tree Inferences: An Empirical Study in Slender Salamanders (Plethodontidae: Batrachoseps),” Syst. Biol., vol. 64, no. 1, pp. 66–83, 2014.
[19] F. Jacobsen and K. E. Omland, “Species tree inference in a recent radiation of orioles (Genus Icterus): Multiple markers and methods reveal cytonuclear discordance in the northern oriole group,” Mol. Phylogenet. Evol., vol. 61, no. 2, pp. 460–469, 2011.
[20] A. Camargo, L. J. Avila, M. Morando, and J. W. Sites, “Accuracy and precision of species trees: effects of locus, individual, and base pair sampling on inference of species trees in lizards of the Liolaemus darwinii group (Squamata, Liolaemidae).,” Syst. Biol., vol. 61, no. 2, pp. 272–88, Mar. 2012.
[21] J. S. Williams, J. H. Niedzwiecki, and D. W. Weisrock, “Species tree reconstruction of a poorly resolved clade of salamanders (Ambystomatidae) using multiple nuclear loci.,” Mol. Phylogenet. Evol., vol. 68, no. 3, pp. 671–82, Sep. 2013.
[22] G. Bejerano, M. Pheasant, I. Makunin, S. Stephen, W. J. Kent, J. S. Mattick, and D. Haussler, “Ultraconserved elements in the human genome.,” Science, vol. 304, no. 5675, pp. 1321–5, May 2004.
[23] L. A. Pennacchio, N. Ahituv, A. M. Moses, S. Prabhakar, M. A. Nobrega, M. Shoukry, S. Minovitsky, I. Dubchak, A. Holt, K. D. Lewis, I. Plajzer-Frick, J. Akiyama, S. De Val, V. Afzal, B. L. Black, O. Couronne, M. B. Eisen, A. Visel, and E. M. Rubin, “In vivo enhancer analysis of human conserved non-coding sequences.,” Nature, vol. 444, no. 7118, pp. 499–502, Nov. 2006.
[24] A. Siepel, G. Bejerano, J. S. Pedersen, A. S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L. W. Hillier, S. Richards, G. M. Weinstock, R. K. Wilson, R. A. Gibbs, W. J. Kent, W. Miller, and D. Haussler, “Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.,” Genome Res., vol. 15, no. 8, pp. 1034–50, Aug. 2005.
[25] W. Miller, K. Rosenbloom, R. C. Hardison, M. Hou, J. Taylor, B. Raney, R. Burhans, D. C. King, R. Baertsch, D. Blankenberg, S. L. Kosakovsky Pond, A. Nekrutenko, B. Giardine, R. S. Harris, S. Tyekucheva, M. Diekhans, T. H. Pringle, W. J. Murphy, A. Lesk, G. M. Weinstock, K. Lindblad-Toh, R. A. Gibbs, E. S. Lander, A. Siepel, D. Haussler, and W. J. Kent, “28-way vertebrate
alignment and conservation track in the UCSC Genome Browser.,” Genome Res., vol. 17, no. 12, pp. 1797–808, Dec. 2007.
[26] B. C. Faircloth, J. E. McCormack, N. G. Crawford, M. G. Harvey, R. T. Brumfield, and T. C. Glenn, “Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales.,” Syst. Biol., vol. 61, no. 5, pp. 717–26, Oct. 2012.
[27] B. T. Smith, J. E. McCormack, A. M. Cuervo, M. J. Hickerson, A. Aleixo, C. D. Cadena, J. Pérez-Emán, C. W. Burney, X. Xie, M. G. Harvey, B. C. Faircloth, T. C. Glenn, E. P. Derryberry, J. Prejean, S. Fields, and R. T. Brumfield, “The drivers of tropical speciation,” Nature, vol. 515, no. 7527, pp. 406–409, Sep. 2014.
[28] B. T. Smith, M. G. Harvey, B. C. Faircloth, T. C. Glenn, and R. T. Brumfield, “Target capture and massively parallel sequencing of ultraconserved elements for comparative studies at shallow evolutionary time scales,” Syst. Biol., vol. 63, no. 1, pp. 83–95, 2014.
[29] E. D. Jarvis, S. Mirarab, A. J. Aberer, B. Li, P. Houde, C. Li, S. Y. W. Ho, B. C. Faircloth, B. Nabholz, J. T. Howard, A. Suh, C. C. Weber, R. R. da Fonseca, J. Li, F. Zhang, H. Li, L. Zhou, N. Narula, L. Liu, G. Ganapathy, B. Boussau, M. S. Bayzid, V. Zavidovych, S. Subramanian, T. Gabaldon, S. Capella-Gutierrez, J. Huerta-Cepas, B. Rekepalli, K. Munch, M. Schierup, B. Lindow, W. C. Warren, D. Ray, R. E. Green, M. W. Bruford, X. Zhan, A. Dixon, S. Li, N. Li, Y. Huang, E. P. Derryberry, M. F. Bertelsen, F. H. Sheldon, R. T. Brumfield, C. V. Mello, P. V. Lovell, M. Wirthlin, M. P. C. Schneider, F. Prosdocimi, J. A. Samaniego, A. M. V. Velazquez, A. Alfaro-Nunez, P. F. Campos, B. Petersen, T. Sicheritz-Ponten, A. Pas, T. Bailey, P. Scofield, M. Bunce, D. M. Lambert, Q. Zhou, P. Perelman, A. C. Driskell, B. Shapiro, Z. Xiong, Y. Zeng, S. Liu, Z. Li, B. Liu, K. Wu, J. Xiao, X. Yinqi, Q. Zheng, Y. Zhang, H. Yang, J. Wang, L. Smeds, F. E. Rheindt, M. Braun, J. Fjeldsa, L. Orlando, F. K. Barker, K. A. Jonsson, W. Johnson, K.-P. Koepfli, S. O’Brien, D. Haussler, O. A. Ryder, C. Rahbek, E. Willerslev, G. R. Graves, T. C. Glenn, J. McCormack, D. Burt, H. Ellegren, P. Alstrom, S. V. Edwards, A. Stamatakis, D. P. Mindell, J. Cracraft, E. L. Braun, T. Warnow, W. Jun, M. T. P. Gilbert, and G. Zhang, “Whole-genome analyses resolve early branches in the tree of life of modern birds,” Science, vol. 346, no. 6215, 2014.
[30] D. Bryant, R. Bouckaert, J. Felsenstein, N. A. Rosenberg, and A. RoyChoudhury, “Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis.,” Mol. Biol. Evol., vol. 29, no. 8, pp. 1917–32, Aug. 2012.
[31] J. K. Pritchard, M. Stephens, and P. Donnelly, “Inference of population structure using multilocus genotype data.,” Genetics, vol. 155, no. 2, pp. 945–59, Jun. 2000.
[32] E. Y. Durand, N. Patterson, D. Reich, and M. Slatkin, “Testing for ancient admixture between closely related populations,” Mol. Biol. Evol., vol. 28, no. 8, pp. 2239–2252, 2011.
[33] F. E. Rheindt, M. K. Fujita, P. R. Wilton, and S. V. Edwards, “Introgression and phenotypic assimilation in Zimmerius flycatchers (Tyrannidae): Population genetic and phylogenetic inferences from genome-wide SNPs,” Syst. Biol., vol. 63, no. 2, pp. 134–152, 2014.
[34] P. H. Brito and S. V Edwards, “Multilocus phylogeography and phylogenetics using sequence-based markers.,” Genetica, vol. 135, no. 3, pp. 439–55, Apr. 2009.
[35] J. del Hoyo, N. Collar, G. M. Kirwan, and P. Boesman, “Fiery Topaz (Topaza pyra),” Handbook of the Birds of the World Alive. Lynx Edicions, Barcelona, 2015.
[36] K. L. Schuchmann, G. M. Kirwan, and P. Boesman, “Crimson Topaz (Topaza pella),” Handbook of the Birds of the World Alive. Lynx Edicions, Barcelona, 2015.
[37] K. L. Schuchmann, “Family Trochilidae (hummingbirds),” in Handbook of the birds of the world, Volume 5., J. del Hoyo, A. Elliott, and J. Sargatal, Eds. Barcelona, Spain: Lynx Edicions, 1999, pp. 468–680.
[38] A. Ornés-Schmitz and K. L. Schuchmann, “Taxonomic review and phylogeny of the hummingbird genus Topaza Gray, 1840 using plumage color spectral information,” Ornitol. Neotrop., no. 22, pp. 25–38, 2011.
[39] J. L. Peters, Check-list of birds of the world, Volume 5. Cambridge, Massachusetts: Harvard Univ. Press, 1945.
[40] D.-S. Hu, L. Joseph, and D. J. Agro, “Distribution, variation, and taxonomy of Topaza Hummingbirds (Aves: Trochilidae),” Ornitol. Neotrop., vol. 11, no. 1982, pp. 123–142, 2000.
[41] S. J. Hackett, R. T. Kimball, S. Reddy, R. C. K. Bowie, E. L. Braun, M. J. Braun, J. L. Chojnowski, W. A. Cox, K.-L. Han, J. Harshman, C. J. Huddleston, B. D. Marks, K. J. Miglia, W. S. Moore, F. H. Sheldon, D. W. Steadman, C. C. Witt, and T. Yuri, “A phylogenomic study of birds reveals their evolutionary history.,” Science, vol. 320, no. 5884, pp. 1763–1768, 2008.
[42] C. H. Graham, J. L. Parra, C. Rahbek, and J. A. McGuire, “Phylogenetic structure in tropical hummingbird communities,” Proc. Natl. Acad. Sci. U. S. A., vol. 106 Suppl., pp. 19673–19678, 2009.
[43] A. L. Chubb, “Nuclear corroboration of DNA-DNA hybridization in deep phylogenies of hummingbirds, swifts, and passerines: the phylogenetic utility of ZENK (ii).,” Mol. Phylogenet. Evol., vol. 30, no. 1, pp. 128–39, Jan. 2004.
[44] E. Quintero, C. C. Ribas, and J. Cracraft, “The Andean Hapalopsittaca parrots (Psittacidae, Aves): an example of montane-tropical lowland vicariance,” Zool. Scr., vol. 42, no. 1, pp. 28–43, Jan. 2013.
[45] A. J. Drummond, M. A. Suchard, D. Xie, and A. Rambaut, “Bayesian phylogenetics with BEAUti and the BEAST 1.7.,” Mol. Biol. Evol., vol. 29, no. 8, pp. 1969–73, Aug. 2012.
[46] J. Heled and A. J. Drummond, “Bayesian inference of species trees from multilocus data.,” Mol. Biol. Evol., vol. 27, no. 3, pp. 570–80, Mar. 2010.
[47] L. Liu, L. Yu, and S. V Edwards, “A maximum pseudo-likelihood approach for estimating species trees under the coalescent model.,” BMC Evol. Biol., vol. 10, no. 1, p. 302, 2010.
[48] G. Jones, Z. Aydin, and B. Oxelman, “DISSECT: an assignment-free Bayesian discovery method for species delimitation under the multispecies coalescent.,” Bioinformatics, vol. 31, no. 7, pp. 991–998, Nov. 2014.
[49] M. Töpel, M. F. Calio, A. Zizka, R. Scharn, D. Silvestro, and A. Antonelli, “SpeciesGeoCoder: Fast categorisation of species occurrences for analyses of biodiversity, biogeography, ecology and evolution,” Cold Spring Harbor Labs Journals, Sep. 2014.
[50] B. L. Sullivan, C. L. Wood, M. J. Iliff, R. E. Bonney, D. Fink, and S. Kelling, “eBird: An online database of bird distribution and abundance [web application],” Biological Conservation 142, 2009. [Online]. Available: http://www.ebird.org. [Accessed: 11-May-2015].
[51] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin, “The Sequence Alignment/Map format and SAMtools.,” Bioinformatics, vol. 25, no. 16, pp. 2078–9, Aug. 2009.
[52] I. Milne, G. Stephen, M. Bayer, P. J. A. Cock, L. Pritchard, L. Cardle, P. D. Shaw, and D. Marshall, “Using Tablet for visual exploration of second-generation sequencing data.,” Brief. Bioinform., vol. 14, no. 2, pp. 193–202, Mar. 2013.
[53] M. Kearse, R. Moir, A. Wilson, S. Stones-Havas, M. Cheung, S. Sturrock, S. Buxton, A. Cooper, S. Markowitz, C. Duran, T. Thierer, B. Ashton, P. Meintjes, and A. Drummond, “Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.,” Bioinformatics, vol. 28, no. 12, pp. 1647–9, Jun. 2012.
[54] C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, and T. L. Madden, “BLAST+: architecture and applications.,” BMC Bioinformatics, vol. 10, no. 1, p. 421, Jan. 2009.
[55] V. H. Nguyen and D. Lavenier, “PLAST: parallel local alignment search tool for database comparison.,” BMC Bioinformatics, vol. 10, no. 1, p. 329, Jan. 2009.
[56] S. K. Wyman, R. K. Jansen, and J. L. Boore, “Automatic annotation of organellar genomes with DOGMA.,” Bioinformatics, vol. 20, no. 17, pp. 3252–5, Nov. 2004.
[57] G. C. Conant and K. H. Wolfe, “GenomeVx: simple web-based creation of editable circular chromosome maps.,” Bioinformatics, vol. 24, no. 6, pp. 861–2, Mar. 2008.
[58] T. A. Hall, “BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT.,” in Nucleic Acids Symposium, vol. 41, Oxford University Press, 1999, pp. 95–98.
[59] M. A. Pacheco, F. U. Battistuzzi, M. Lentino, R. F. Aguilar, S. Kumar, and A. A. Escalante, “Evolution of modern birds revealed by mitogenomics: Timing the radiation and origin of major orders,” Mol. Biol. Evol., vol. 28, no. 6, pp. 1927–1942, 2011.
[60] H. D. Marshall, A. J. Baker, and A. R. Grant, “Complete mitochondrial genomes from four subspecies of common chaffinch (Fringilla coelebs): New inferences about mitochondrial rate heterogeneity, neutral theory, and phylogenetic relationships within the order Passeriformes,” Gene, vol. 517, no. 1, pp. 37–45, 2013.
[61] D. Posada and T. R. Buckley, “Model selection and model averaging in phylogenetics: advantages of akaike information criterion and bayesian approaches over likelihood ratio tests.,” Syst. Biol., vol. 53, no. 5, pp. 793–808, Oct. 2004.
[62] D. P. Martin, P. Lemey, M. Lott, V. Moulton, D. Posada, and P. Lefeuvre, “RDP3: a flexible and fast computer program for analyzing recombination.,” Bioinformatics, vol. 26, no. 19, pp. 2462–3, Oct. 2010.
[63] D. Martin and E. Rybicki, “RDP: detection of recombination amongst aligned sequences,” Bioinformatics, vol. 16, no. 6, pp. 562–563, Jun. 2000.
[64] J. M. Smith, “Analyzing the mosaic structure of genes.,” J. Mol. Evol., vol. 34, no. 2, pp. 126–9, Mar. 1992.
[65] D. Posada and K. A. Crandall, “Evaluation of methods for detecting recombination from DNA sequences: computer simulations.,” Proc. Natl. Acad. Sci. U. S. A., vol. 98, no. 24, pp. 13757–62, Nov. 2001.
[66] H. R. L. Lerner, M. Meyer, H. F. James, M. Hofreiter, and R. C. Fleischer, “Multilocus resolution of phylogeny and timescale in the extant adaptive radiation of Hawaiian honeycreepers,” Curr. Biol., vol. 21, no. 21, pp. 1838–1844, 2011.
[67] J. A. McGuire, C. C. Witt, J. V. Remsen, A. Corl, D. L. Rabosky, D. L. Altshuler, and R. Dudley, “Molecular phylogenetics and the diversification of hummingbirds,” Curr. Biol., vol. 24, no. 8, pp. 910–916, 2014.
[68] T. Gernhard, “The conditioned reconstructed process.,” J. Theor. Biol., vol. 253, no. 4, pp. 769–78, Aug. 2008.
[69] A. Rambaut, M. A. Suchard, W. Xie, and A. Drummond, “Tracer v1.6,” 2013.
[70] S. Guindon, J.-F. Dufayard, V. Lefort, M. Anisimova, W. Hordijk, and O. Gascuel, “New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.,” Syst. Biol., vol. 59, no. 3, pp. 307–21, May 2010.
[71] T.-K. Seo, “Calculating bootstrap probabilities of phylogeny using multilocus sequence data.,” Mol. Biol. Evol., vol. 25, no. 5, pp. 960–71, May 2008.
[72] T. I. Shaw, Z. Ruan, T. C. Glenn, and L. Liu, “STRAW: Species TRee Analysis Web server.,” Nucleic Acids Res., vol. 41, no. Web Server issue, pp. W238–41, Jul. 2013.
[73] R. Bouckaert, J. Heled, D. Kühnert, T. Vaughan, C.-H. Wu, D. Xie, M. A. Suchard, A. Rambaut, and A. J. Drummond, “BEAST 2: a software platform for Bayesian evolutionary analysis.,” PLoS Comput. Biol., vol. 10, no. 4, p. e1003537, Apr. 2014.
[74] R. R. Bouckaert, “DensiTree: making sense of sets of phylogenetic trees.,” Bioinformatics, vol. 26, no. 10, pp. 1372–3, May 2010.
[75] D. P. L. Toews and A. Brelsford, “The biogeography of mitochondrial and nuclear discordance in animals,” Mol. Ecol., vol. 21, no. 16, pp. 3907–3930, 2012.
[76] D. E. Irwin, “Local adaptation along smooth ecological gradients causes phylogeographic breaks and phenotypic clustering.,” Am. Nat., vol. 180, no. 1, pp. 35–49, Jul. 2012.
[77] P. J. Greenwood and P. H. Harvey, “The Natal and Breeding Dispersal of Birds,” Annu. Rev. Ecol. Syst., vol. 13, pp. 1–21, 1982.
[78] P. J. Greenwood, “Mating systems, philopatry and dispersal in birds and mammals,” Anim. Behav., vol. 28, no. 4, pp. 1140–1162, Nov. 1980.
[79] B. Czyz, M. Borowiec, A. Wasińska, R. Pawliszko, and K. Mazur, “Breeding-season dispersal of male and female Penduline Tits (Remiz pendulinus) in south-western Poland,” Ornis Fenn., Jan. 2012.
[80] M. Szulkin and B. C. Sheldon, “Dispersal as a means of inbreeding avoidance in a wild bird population.,” Proc. Biol. Sci., vol. 275, no. 1635, pp. 703–11, Mar. 2008.
[81] D. E. Irwin, S. Bensch, J. H. Irwin, and T. D. Price, “Speciation by distance in a ring species.,” Science, vol. 307, no. 5708, pp. 414–6, Jan. 2005.
[82] Å. M. Ribeiro, P. Lloyd, and R. C. K. Bowie, “A tight balance between natural selection and gene flow in a southern African arid-zone endemic bird.,” Evolution, vol. 65, no. 12, pp. 3499–514, Dec. 2011.
[83] C. N. Spottiswoode, K. F. Stryjewski, S. Quader, J. F. R. Colebrook-Robjent, and M. D. Sorenson, “Ancient host specificity within a single species of brood parasitic bird.,” Proc. Natl. Acad. Sci. U. S. A., vol. 108, no. 43, pp. 17738–42, Oct. 2011.
[84] M. Ehinger, P. Fontanillas, E. Petit, and N. Perrin, “Mitochondrial DNA variation along an altitudinal gradient in the greater white-toothed shrew, Crocidura russula.,” Mol. Ecol., vol. 11, no. 5, pp. 939–45, May 2002.
[85] D. Mishmar, E. Ruiz-Pesini, P. Golik, V. Macaulay, A. G. Clark, S. Hosseini, M. Brandon, K. Easley, E. Chen, M. D. Brown, R. I. Sukernik, A. Olckers, and D. C. Wallace, “Natural selection shaped regional mtDNA variation in humans.,” Proc. Natl. Acad. Sci. U. S. A., vol. 100, no. 1, pp. 171–6, Jan. 2003.
[86] E. Ruiz-Pesini, D. Mishmar, M. Brandon, V. Procaccio, and D. C. Wallace, “Effects of purifying and adaptive selection on regional variation in human mtDNA.,” Science, vol. 303, no. 5655, pp. 223–6, Jan. 2004.
[87] P. Fontanillas, A. Dépraz, M. S. Giorgi, and N. Perrin, “Nonshivering thermogenesis capacity associated to mitochondrial DNA haplotypes and gender in the greater white-toothed shrew, Crocidura russula.,” Mol. Ecol., vol. 14, no. 2, pp. 661–70, Feb. 2005.
[88] A. M. Fernandes, M. Wink, C. H. Sardelli, and A. Aleixo, “Multiple speciation across the Andes and throughout Amazonia: The case of the spot-backed antbird species complex (Hylophylax naevius/Hylophylax naevioides),” J. Biogeogr., vol. 41, no. 6, pp. 1094–1104, 2014.
[89] C. C. Ribas, A. Aleixo, A. C. R. Nogueira, C. Y. Miyaki, and J. Cracraft, “A palaeobiogeographic model for biotic diversification within Amazonia over the past three million years,” Proc. R. Soc. B Biol. Sci., vol. 279, no. 1729, pp. 681–689, 2012.
[90] F. E. Hayes and J. A. N. Sewlal, “The Amazon River as a dispersal barrier to passerine birds: Effects of river width, habitat and taxonomy,” J. Biogeogr., vol. 31, no. 11, pp. 1809–1818, 2004.
[91] J. Gatesy and M. S. Springer, “Phylogenetic Analysis at Deep Timescales: Unreliable Gene Trees, Bypassed Hidden Support, and the Coalescence/Concatalescence Conundrum.,” Mol. Phylogenet. Evol., vol. 80, pp. 231–266, 2014.
[92] M. S. Springer and J. Gatesy, “Land plant origins and coalescence confusion.,” Trends Plant Sci., vol. 19, no. 5, pp. 267–9, May 2014.
Supplemental Material
Figure S13: Species tree inferred with MP-EST based on 824 gene trees of UCEs. We created 1000 bootstrap
replicates of the 824 gene tree dataset and inferred each set separately in MP-EST. The 1000 resulting trees were
collapsed to the maximum clade credibility tree with TreeAnnotator. The tree is scaled in coalescent units. The node
support values represent bootstrap support.
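The bootstrap step described in this caption can be sketched as follows, assuming that a replicate is built by resampling whole gene trees with replacement (the caption does not state the resampling unit, so this is an assumption). The function name and the toy Newick strings are hypothetical; each replicate would then be analysed separately in the species tree program, which is not shown here.

```python
import random
from typing import List


def bootstrap_gene_trees(gene_trees: List[str], n_replicates: int, seed: int = 0) -> List[List[str]]:
    """Create bootstrap replicates by drawing gene trees with replacement.

    Every replicate has the same number of trees as the input set.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_trees = len(gene_trees)
    return [
        [rng.choice(gene_trees) for _ in range(n_trees)]
        for _ in range(n_replicates)
    ]


# Hypothetical toy gene trees in Newick format, not trees from this study.
toy_trees = [
    "((A,B),(C,D));",
    "((A,C),(B,D));",
    "((A,B),(C,D));",
    "(A,(B,(C,D)));",
]

replicates = bootstrap_gene_trees(toy_trees, n_replicates=10)
print(len(replicates), len(replicates[0]))  # 10 4
```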
Figure S14: Similarity matrices showing results of SpeciesDelimitationAnalyser processing of the species tree
distribution inferred with DISSECT (burn-in 10%). To the left of each matrix is the maximum clade credibility tree of the
posterior tree distribution (burn-in 1,000 trees). Node support values represent Bayesian posterior probabilities. a) Based
on 8 nuclear genes; b) Based on 8 nuclear genes and the mitochondrial genome. Particularly the nuclear dataset (a)
suggests that sample T. pella7 is an admixed individual between the two otherwise distinct populations (5+6 and 8+9).
Table S3: Overview of the sequencing reads per sample: total number of reads, number of mitochondrial reads (total and relative to the total number of reads), and length of the mitochondrial genome.
ID   No. of reads (total)   Mitochondrial reads (total)   Mitochondrial reads (relative)   Length of mitochondrial genome (bp)
1    545,396                63,816                        0.117                            16,783
2    5,641,580              154,537                       0.027                            16,762
3    877,918                38,572                        0.044                            16,835
4    1,777,653              16,146                        0.009                            16,862
5    738,337                72,207                        0.098                            16,849
6    2,929,439              164,804                       0.056                            16,828
7    920,515                59,947                        0.065                            16,834
8    1,423,376              125,537                       0.088                            16,844
9    678,092                5,979                         0.009                            16,824
10   586,239                9,241                         0.016                            16,842
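The relative values in Table S3 are the number of mitochondrial reads divided by the total number of reads per sample. A short sketch that reproduces this column from the counts in the table (the names `read_counts` and `mito_fraction` are illustrative only):

```python
# (total reads, mitochondrial reads) per sample ID, copied from Table S3.
read_counts = {
    1: (545_396, 63_816),
    2: (5_641_580, 154_537),
    3: (877_918, 38_572),
    4: (1_777_653, 16_146),
    5: (738_337, 72_207),
    6: (2_929_439, 164_804),
    7: (920_515, 59_947),
    8: (1_423_376, 125_537),
    9: (678_092, 5_979),
    10: (586_239, 9_241),
}


def mito_fraction(total_reads, mito_reads):
    """Proportion of all reads that are mitochondrial."""
    return mito_reads / total_reads


# Rounded to three decimals, as in the table.
fractions = {
    sample_id: round(mito_fraction(total, mito), 3)
    for sample_id, (total, mito) in read_counts.items()
}
```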
Table S4: Assembled gene loci, the length of the alignment of each locus across all
samples, and the number of variable sites per locus found across the alignment of the 9 Topaza samples,
counting substitutions as well as insertions and deletions (each indel counts as one site, regardless of its length).
Loci marked with an asterisk (grey in the original table) were excluded from phylogenetic analyses because they contained too few informative sites (<1%).
Locus    Length   Variable sites (total)   Variable sites (relative to length)
Bfib     1,079    28                       0.026
EEF2     1,634    31                       0.019
EGR1*    609      0                        0.000
FGB      660      28                       0.042
MB       724      14                       0.019
ODC      618      16                       0.026
RAG1     2,638    39                       0.015
TGFB2    575      13                       0.023
ZENK2*   1,187    2                        0.002
ZENK3    504      6                        0.012
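The exclusion rule stated in the caption of Table S4 (fewer than 1% variable sites) can be expressed as a simple filter. The sketch below is an illustration under the assumption that the threshold is applied to the "variable sites, relative to length" column; the names `loci`, `THRESHOLD` and `variable_fraction` are hypothetical.

```python
# (alignment length, variable sites) per locus, copied from Table S4.
loci = {
    "Bfib": (1079, 28),
    "EEF2": (1634, 31),
    "EGR1": (609, 0),
    "FGB": (660, 28),
    "MB": (724, 14),
    "ODC": (618, 16),
    "RAG1": (2638, 39),
    "TGFB2": (575, 13),
    "ZENK2": (1187, 2),
    "ZENK3": (504, 6),
}

THRESHOLD = 0.01  # 1% of the alignment length (from the table caption)


def variable_fraction(length, variable_sites):
    """Variable sites relative to alignment length."""
    return variable_sites / length


kept = [name for name, (length, var) in loci.items()
        if variable_fraction(length, var) >= THRESHOLD]
excluded = [name for name, (length, var) in loci.items()
            if variable_fraction(length, var) < THRESHOLD]
```

With the values in the table, this leaves eight loci, which agrees with the 8 nuclear genes mentioned in the caption of Figure S14.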
Figure S15: Example of initial issues with MCMC convergence in BEAST. When priors were too permissive
and parameters were allowed to vary over wide ranges, the MCMC stopped sampling certain parameters after
several million generations, e.g. the Cytb ucld.mean.rate shown in the upper graph. Once these parameters were no longer
sampled, the posterior likelihood estimate jumped by several hundred likelihood units (see lower graph).
[Figure panels: Cytb ucld.mean.rate (upper graph); Bayesian posterior likelihood (lower graph)]
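A parameter that is no longer sampled, as described for Figure S15, shows up in the trace as a constant run that extends to the end of the chain. The following sketch shows one way such a run could be detected. It is not part of the original analysis: it assumes the logged values of one parameter have already been read into a Python list, and the function name `frozen_from` and the `min_run` cut-off are hypothetical.

```python
def frozen_from(trace, min_run=100):
    """Return the index from which the trace stays constant until its end.

    Returns None if the final constant run is shorter than `min_run`
    samples, i.e. if the parameter still appears to be sampled.
    """
    if not trace:
        return None
    last = trace[-1]
    i = len(trace) - 1
    # Walk backwards while the value is identical to the final value.
    while i > 0 and trace[i - 1] == last:
        i -= 1
    run_length = len(trace) - i
    return i if run_length >= min_run else None


# Toy traces: one that freezes after 50 samples, one that keeps moving.
stuck_trace = list(range(50)) + [7.5] * 200
moving_trace = list(range(300))

stuck_at = frozen_from(stuck_trace)
still_moving = frozen_from(moving_trace)
```

Here `stuck_at` is 50 and `still_moving` is None. For a real log file, the returned index would have to be multiplied by the logging interval to obtain the generation at which sampling stopped.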
Table S5: Mitochondrial genome annotation for all samples. The first column gives the name of the
locus and the second column the orientation of its reading frame (+: forward strand, -: reverse
strand). The start and end columns give the position of each locus on the mitochondrial
genome (in bp).
locus     strand     Florisuga      T. pyra1      T. pyra2      T. pyra3      T. pyra4
                  start    end  start    end  start    end  start    end  start    end
trnF-gaa  +           1     72      1     69      1     69      1     69      1     69
rrnS      +          73   1031     70   1030     70   1030     70   1030     70   1030
trnV-uac  +        1041   1114   1039   1111   1039   1111   1040   1112   1040   1112
rrnL      +        1149   2693   1173   2663   1173   2663   1174   2664   1174   2664
trnL-uaa  +        2705   2778   2705   2778   2705   2778   2706   2779   2706   2779
nad1      +        2790   3764   2807   3763   2807   3763   2808   3764   2808   3764
trnI-gau  +        3766   3839   3765   3837   3765   3837   3766   3838   3766   3838
trnQ-uug  -        3851   3921   3849   3919   3849   3919   3850   3920   3850   3920
trnM-cau  +        3921   3990   3919   3989   3919   3989   3920   3990   3920   3990
nad2      +        3991   5028   3990   5027   3990   5027   3991   5028   3991   5028
trnW-uca  +        5030   5099   5029   5098   5029   5098   5030   5099   5030   5099
trnA-ugc  -        5101   5169   5100   5168   5100   5168   5101   5169   5101   5169
trnN-guu  -        5173   5245   5172   5244   5172   5244   5173   5245   5173   5245
trnC-gca  -        5249   5315   5248   5314   5248   5314   5249   5315   5249   5315
trnY-gua  -        5315   5386   5314   5385   5314   5385   5315   5386   5315   5386
cox1      +        5388   6935   5387   6934   5387   6934   5388   6935   5388   6935
trnP-ugg  -        6930   7003   6929   7002   6929   7002   6930   7003   6930   7003
trnD-guc  +        7006   7074   7005   7073   7005   7073   7006   7074   7006   7074
cox2      +        7076   7756   7075   7755   7075   7755   7076   7756   7076   7756
trnA-agc  +        7760   7828   7760   7829   7760   7829   7761   7830   7761   7830
atp8      +        7830   7994   7831   7995   7831   7995   7832   7996   7832   7996
atp6      +        7988   8668   7989   8669   7989   8669   7990   8670   7990   8670
cox3      +        8671   9453   8672   9454   8672   9454   8673   9455   8673   9455
trnG-ucc  +        9455   9523   9456   9524   9456   9524   9457   9525   9457   9525
nad3      +        9524   9697   9525   9698   9525   9698   9526   9699   9526   9699
nad3      +        9699   9872   9700   9873   9700   9873   9695   9874   9701   9874
trnR-ucg  +        9877   9946   9879   9948   9879   9948   9880   9949   9880   9949
nad4l     +        9948  10241   9950  10243   9950  10243   9951  10244   9951  10244
nad4      +       10238  11605  10240  11607  10240  11607  10241  11608  10241  11608
trnH-gug  +       11617  11685  11619  11687  11619  11687  11620  11688  11620  11688
trnS-gcu  +       11686  11751  11688  11754  11688  11754  11689  11755  11689  11755
trnL-uag  +       11752  11823  11756  11827  11756  11827  11757  11828  11757  11828
nad5      +       11826  13625  11834  13633  11834  13633  11835  13634  11835  13634
cob       +       13650  14789  13659  14798  13659  14798  13660  14799  13660  14799
trnT-ugu  +       14796  14864  14805  14872  14805  14872  14806  14873  14806  14873
trnP-ugg  -       14867  14936  14877  14946  14877  14946  14878  14947  14878  14947
nad6      -       14951  15469  14968  15486  14968  15486  14969  15487  14969  15487
trnE-uuc  -       15470  15540  15487  15557  15487  15557  15488  15558  15488  15558
misc.     +       15541  16842  15558  16783  15558  16762  15559  16835  15559  16862
Table S5 (continued):
locus     strand     T. pella5     T. pella6     T. pella7     T. pella8     T. pella9
                  start    end  start    end  start    end  start    end  start    end
trnF-gaa  +           1     69      1     69      1     69      1     69      1     69
rrnS      +          70   1034     70   1030     70   1033     70   1029     70   1031
trnV-uac  +        1040   1112   1039   1111   1039   1111   1037   1109   1037   1109
rrnL      +        1174   2664   1173   2663   1173   2663   1171   2660   1171   2660
trnL-uaa  +        2707   2780   2706   2779   2706   2779   2704   2777   2704   2777
nad1      +        2809   3765   2829   3764   2808   3764   2806   3762   2806   3762
trnI-gau  +        3767   3839   3766   3838   3766   3838   3764   3836   3764   3836
trnQ-uug  -        3851   3921   3850   3920   3850   3920   3848   3918   3848   3918
trnM-cau  +        3921   3991   3920   3990   3920   3990   3918   3988   3918   3988
nad2      +        3992   5029   3991   5028   3991   5028   3989   5026   3989   5026
trnW-uca  +        5031   5100   5030   5099   5030   5099   5028   5097   5028   5097
trnA-ugc  -        5102   5170   5101   5169   5101   5169   5099   5167   5099   5167
trnR-ucg  +        5156   5228   5155   5227   5155   5227   5153   5225   5153   5225
trnN-guu  -        5174   5246   5173   5245   5173   5245   5171   5243   5171   5243
trnC-gca  -        5250   5316   5249   5315   5249   5315   5247   5313   5247   5313
trnY-gua  -        5316   5387   5315   5386   5315   5386   5313   5384   5313   5384
cox1      +        5389   6936   5388   6935   5388   6935   5386   6933   5386   6933
trnS-uga  -        6931   7004   6930   7003   6930   7003   6928   7001   6928   7001
trnL-aag  -        7005   7075   7004   7074   7004   7074   7002   7072   7002   7072
trnD-guc  +        7007   7075   7006   7074   7006   7074   7004   7072   7004   7072
cox2      +        7077   7757   7076   7756   7076   7756   7074   7754   7074   7754
trnA-agc  +        7762   7831   7761   7830   7761   7830   7759   7828   7759   7828
atp8      +        7833   7997   7832   7996   7832   7996   7830   7994   7830   7994
atp6      +        7991   8671   7990   8670   7990   8670   7988   8668   7988   8668
cox3      +        8674   9456   8673   9455   8673   9455   8671   9453   8671   9453
trnG-ucc  +        9458   9526   9457   9525   9457   9525   9455   9523   9455   9523
nad3      +        9527   9700   9526   9699   9526   9699   9524   9697   9524   9697
nad3      +        9702   9875   9701   9874   9701   9874   9699   9872   9699   9872
trnR-ucg  +        9881   9950   9880   9949   9880   9949   9878   9947   9878   9947
nad4l     +        9952  10245   9951  10244   9951  10244   9949  10242   9949  10242
nad4      +       10242  11609  10241  11608  10241  11608  10239  11606  10239  11606
trnH-gug  +       11621  11689  11620  11688  11620  11688  11618  11686  11618  11686
trnS-gcu  +       11690  11756  11689  11755  11689  11755  11687  11753  11687  11753
trnL-uag  +       11758  11829  11757  11828  11757  11828  11755  11826  11755  11826
nad5      +       11838  13637  11837  13636  11837  13636  11834  13633  11834  13633
cob       +       13663  14802  13662  14801  13662  14801  13659  14798  13659  14798
trnT-ugu  +       14809  14876  14808  14875  14808  14875  14805  14873  14805  14873
trnP-ugg  -       14881  14950  14880  14949  14880  14949  14878  14947  14878  14947
nad6      -       14971  15489  14970  15488  14969  15487  14968  15486  14968  15486
trnE-uuc  -       15490  15560  15489  15559  15488  15558  15487  15557  15487  15557
misc.     +       15561  16849  15560  16828  15559  16834  15558  16844  15558  16824
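The start and end coordinates in Table S5 can be used to derive locus lengths and to find neighbouring loci that overlap. The sketch below illustrates this with four protein-coding loci from the T. pella9 columns. It assumes that coordinates are 1-based and inclusive (the first locus starts at position 1); the function names are hypothetical.

```python
# (locus, strand, start, end) for selected T. pella9 rows of Table S5.
features = [
    ("cox2", "+", 7074, 7754),
    ("atp8", "+", 7830, 7994),
    ("atp6", "+", 7988, 8668),
    ("cox3", "+", 8671, 9453),
]


def feature_length(start, end):
    """Length in bp, assuming 1-based inclusive coordinates."""
    return end - start + 1


def adjacent_overlaps(features):
    """Return (locus_a, locus_b, shared_bp) for consecutive loci that overlap."""
    hits = []
    for (name_a, _, _, end_a), (name_b, _, start_b, _) in zip(features, features[1:]):
        shared = end_a - start_b + 1
        if shared > 0:
            hits.append((name_a, name_b, shared))
    return hits


lengths = {name: feature_length(start, end) for name, _, start, end in features}
overlapping = adjacent_overlaps(features)
```

For these rows, `overlapping` contains a single entry: atp8 and atp6 share 7 bp.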