draft - university of toronto t-spacedraft vierna et al. 1 1 pcr cycles above routine numbers do not...
TRANSCRIPT
Draft
PCR cycles above routine numbers do not compromise high-
throughput DNA barcoding results
Journal: Genome
Manuscript ID gen-2017-0081.R2
Manuscript Type: Note
Date Submitted by the Author: 30-May-2017
Complete List of Authors: Vierna, J.; AllGenetics & Biology SL Doña, J.; Estacion Biologica de Donana CSIC Vizcaíno, A.; AllGenetics & Biology SL Serrano, D.; Estacion Biologica de Donana CSIC Jovani, R.; Estación Biológica de Doñana (CSIC)
Is the invited manuscript for consideration in a Special
Issue? : This submission is not invited
Keyword: COI, DNA barcoding, Illumina, library, mutations
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 1
PCR cycles above routine numbers do not compromise high-throughput 1
DNA barcoding results 2
3
J. Vierna1+, J. Doña2+, A. Vizcaíno1, D. Serrano3, R. Jovani2* 4
5
1AllGenetics & Biology SL. Edificio CICA. Campus de Elviña s/n. E-15008 A Coruña. Spain. 6
Tel: +34881015551. 7
2 Department of Evolutionary Ecology, Estación Biológica de Doñana (CSIC). Avenida Américo 8
Vespucio s/n. E-41092 Sevilla. Spain. Tel: +34954466700. 9
3 Department of Conservation Biology, Estación Biológica de Doñana (CSIC). Avenida Américo 10
Vespucio s/n. E-41092 Sevilla. Spain. Tel: +34954466700. 11
12
+Joint first authors. 13
*Corresponding author. 14
15
Key words: COI, DNA barcoding, Illumina, library, mutations, non-proofreading polymerase. 16
17
Running title: PCR cycles in high-throughput DNA barcoding 18
19
20
21
Page 1 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 2
Abstract 22
High-throughput DNA barcoding has become essential in ecology and evolution but some 23
technical questions still remain. Increasing the number of PCR cycles above routine 20-30 cycles 24
is a common practice when working with old-type specimens, with little amounts of DNA, or 25
when facing annealing issues with the primers. However, increasing the number of cycles can 26
raise the number of artificial mutations due to polymerase errors. In this work we sequenced 20 27
COI libraries in the Illumina MiSeq platform. Libraries were prepared with 40, 45, 50, 55, and 60 28
PCR cycles from four individuals belonging to four species of four genera of cephalopods. We 29
found no relationship between the number of PCR cycles and the number of mutations despite 30
using a nonproofreading polymerase. Moreover, even when using a high number of PCR cycles 31
the resulting number of mutations was low enough not to be an issue in the context of high-32
throughput DNA barcoding (but may still remain an issue in DNA metabarcoding due to chimera 33
formation). We conclude that the common practice of increasing the number of PCR cycles 34
should not negatively impact the outcome of a high-throughput DNA barcoding study in terms of 35
the occurrence of point mutations. 36
Page 2 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 3
Introduction 37
High-throughput DNA barcoding (for single specimens; Shokralla et al. 2014, 2015; Toju 2015), 38
as well as similar methods such as DNA metabarcoding (for mixed species samples; Taberlet et 39
al. 2012) or amplicon metagenomics, combine DNA-based species identification using 40
standardised markers (DNA barcoding, Hebert et al. 2003) with the power of high-throughput 41
sequencing (HTS). These methods are powerful tools in life sciences research (Taberlet et al. 42
2012; Kress et al. 2015; Toju 2015), from studying century-old type specimens (Prosser et al. 43
2015), to assessing species composition of gut microbiota (Abdelrhman et al. 2016) from mixed 44
samples. 45
Here we focus on high-throughput DNA barcoding. This methodology overcomes some 46
of the problems which currently limit DNA barcoding, such as the high DNA template 47
concentration required for Sanger sequencing and the co-amplification of other DNA templates 48
due to intrasample contamination, Wolbachia infection, gut contents, heteroplasmy, and 49
pseudogenes. Moreover, high-throughput DNA barcoding reduces both per specimen costs and 50
labour time by nearly 80 %, thus allowing to be scaled up to deal with large-scale biodiversity 51
monitoring projects (Shokralla et al. 2015, Cruaud et al. 2017). 52
However, even though high-throughput DNA barcoding is a promising method, some 53
technical issues require further study. For example, some authors have explored the impact of the 54
sequencing platform (Smith & Peay 2014), the polymerase used (Oliver et al. 2015; Brandariz-55
Fontes et al. 2015), the DNA barcode length (Hajibabaei et al. 2006; Doña et al. 2015), the 56
library preparation method (Schirmer et al. 2015), the primers (Schirmer et al. 2015), the 57
annealing temperature (Schmidt et al. 2013), or the phenomenon known as mistagging (Schnell et 58
Page 3 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 4
al. 2015; Esling et al. 2015) in DNA metabarcoding or amplicon sequencing. Recently, Geisen et 59
al. (2015), and Díaz-Real et al. (2015) studied to what extent DNA metabarcoding produced 60
quantitative (and not only qualitative), and reliable results in two groups of symbionts. Finally, 61
several other papers have dealt with some of these issues through bioinformatic analysis of the 62
HTS reads (Caporaso et al. 2010; Coissac et al. 2012; Edgar 2013; Bokulich et al. 2013; Boyer et 63
al. 2016). 64
Here we focused on the number of PCR cycles used for library preparation. This is a 65
technical issue that can potentially impact the biological conclusions of high-throughput DNA 66
barcoding projects, but that has not yet been studied in detail. Increasing the number of PCR 67
cycles above the normal 20-35 cycles (e.g. Shokralla et al. 2014, 2015; Carew et al. 2016) is a 68
common practice: for example, when working with old type specimens (Prosser et al. 2015), with 69
small amounts of input DNA or when the PCR is inefficient (e.g. Blaalid et al. 2013; Ellis et al. 70
2013; Carew et al. 2016). However, a large number of PCR cycles may entail the risk of 71
increasing the number of artificial mutations on the output sequencing reads because of DNA 72
polymerase errors and the amplification of these errors in subsequent PCR cycles (Cha et al. 73
1993; Hengen 1995; Brandariz-Fontes et al. 2015). This is a potential major problem for high-74
throughput DNA barcoding because it can eventually distort, among others, genetic threshold-75
based species delimitation. Yet, to our knowledge, how these extra cycles affect DNA barcoding 76
results has never been investigated. 77
To explore the consequences of the number of PCR cycles upon the number of artificial 78
mutations we extracted DNA from four different individuals belonging to four cephalopod 79
species. From each of the four DNA samples, we prepared five high-throughput DNA barcoding 80
Page 4 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 5
libraries with different number of PCR cycles: from 40, i.e. roughly 20 cycles higher than regular 81
numbers, to 60, as done commonly when dealing with problematic samples. After sequencing the 82
20 libraries using the Illumina MiSeq platform, we studied the relationship between the number 83
of PCR cycles and the number of mutations present in the MiSeq reads. Our results show that, for 84
a number of cycles between 40 and 60, there is no relationship between the number of PCR 85
cycles and the number of mutations, the number of reads with mutations being very low. 86
Therefore, we conclude that a number of PCR cycles as high as 60 does not compromise the 87
success of a high-throughput DNA barcoding project in terms of the occurrence of point 88
mutations. 89
90
Materials and methods 91
Four ethanol-preserved tissues obtained from different cephalopod species belonging to the 92
orders Octopoda, Oegopsida, and Sepiida were analysed (see sample IDs and cephalopod species 93
in Table 1). Species were identified according to morphology and DNA barcoding (Fernando 94
Fernández-Álvarez, personal communication). The genetic p-distances between the selected 95
individuals were between 80.1 and 85.7 for the cytochrome c oxidase subunit I gene (COI) used 96
in this study. 97
Total DNA was extracted from each individual using the NZYTissue gDNA Isolation Kit 98
(NZYTech). DNAs were quantified with the Qubit dsDNA HS Assay Kit (ThermoFisher 99
Scientific) and used as input for the preparation of the libraries. 100
We followed a standard Illumina library preparation protocol. In brief, we amplified the 101
COI region (i.e. the standard animal barcode, Hebert et al. 2003) and included the Illumina 102
Page 5 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 6
specific adapters and indices by following a two-step PCR approach, slightly modified from 103
Lange et al. (2014). For the sake of clarity, we refer to these PCRs as PCR1 and PCR2. 104
PCR1 primers were LCO1490 and HCO2198 (Folmer et al. 1994), which proved 105
successful in a previous study in which the same specimens were DNA barcoded (Fernando 106
Fernández-Álvarez et al., personal communication). Oligonucleotide tails bearing the Illumina 107
sequencing primers were attached to the 5’ ends of primers LCO1490 and HCO2198. PCR2 was 108
carried out with tailed primers which bear the indices and adapters and anneal to the Illumina 109
sequencing primers (see Fig. 1 for a schematic representation of the binding process). 110
PCR1 was carried out using 25 ng of total DNA in a final volume of 25 µL containing 111
6.50 µL of Supreme NZYTaq Green PCR Master Mix (NZYTech) (nonproofreading polymerase; 112
error rate of 1x10-5 according to the manufacturer), 0.5 µM of each primer, and PCR-grade water 113
up to 25 µL. The thermal cycling conditions were as follows: an initial denaturation step at 95 ºC 114
for 5 min, followed by 35, 40, 45, 50, or 55 cycles (see Fig. 2) of denaturation at 95 ºC for 30 s; 115
annealing at 53 ºC for 30 s; extension at 72 ºC for 45 s; and a final extension step at 72 ºC for 10 116
min. The products of PCR1 were purified using the SPRI method (DeAngelis et al. 1995), with 117
Mag-Bind RXNPure Plus magnetic beads (Omega Biotek). The purified products were loaded in 118
a 1 % agarose gel stained with GreenSafe (NZYTech), and visualised under UV light. 119
PCR2 was carried out using 2.5 µL of the purified PCR1 products, and the same 120
conditions as for PCR1 except for the number of cycles, which was set to five (Fig. 2) and the 121
annealing temperature (60 ºC). The products obtained were purified following the SPRI method 122
as indicated above. Then, the purified products were loaded in a 1 % agarose gel stained with 123
Page 6 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 7
GreenSafe (NZYTech), and visualised under UV light. All samples yielded libraries of the 124
expected size. 125
Libraries were quantified using the Qubit dsDNA HS Assay Kit (Life Technologies), and 126
pooled in equimolar amounts. The pool was sequenced in a fraction of a 600-cycle run (MiSeq 127
Reagent Kit v3; PE300) of an Illumina MiSeq sequencer along with a PhiX library used to 128
increase sequence diversity of the overall library, in Macrogen (Seoul, Korea). 129
FASTQ files were demultiplexed using RTA 1.18.54 (Illumina) and checked with FastQC 130
0.11.3 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then, they were quality-131
trimmed using very conservative parameters in Trimmomatic 0.36 (Bolger et al. 2014) with the 132
option SLIDINGWINDOW:1:30. SLIDINGWINDOW starts scanning at the 5' end and clips the 133
read once the average quality within the window falls below a threshold (Trimmomatic Manual 134
0.32). We set the size of the window to 1 and the quality threshold to 30 (Phred Quality Score). 135
Therefore, when the quality of a single nucleotide fell below a Phred Quality Score of 30, the 136
read was clipped from this position to the 3' end. We used these very conservative parameters in 137
order to make sure that the mutations observed in the sequencing results were due to PCR errors 138
and not to sequencing errors. The quality of the resulting files was checked again with FastQC. 139
Quality-trimmed FASTQ files were imported into Geneious 8.1.6 140
(http://www.geneious.com, Kearse et al. 2012). Each pair of R1 and R2 files were set as paired 141
reads in order to improve the mapping. A map-to-reference analysis was carried out with the 142
Geneious mapper using relaxed parameters (maximum number of mismatches per read, 25 %; 143
minimum overlap identity, 80 %) to allow potentially mutated reads to map. The DNA barcode 144
sequences from the four cephalopod specimens were set as references (DDBJ/EMBL/GenBank 145
Page 7 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 8
accession numbers KX078469 - KX078472). The results of the map-to-reference analysis were 146
inspected manually to verify that the reads of each library mapped to the correct reference 147
sequence. We obtained 20 assembly files corresponding to the four species by the five PCR 148
treatments. 149
Regions including the first 50 nucleotides of the mapped R1 and R2 reads (starting 150
immediately after the primer annealing region) were aligned in each assembly with Muscle 151
(Edgar 2004) as implemented in Geneious 8.1.6. We selected these two 50-nucleotide regions 152
because such read length accumulated the maximum number of reads after passing the quality 153
threshold (see above); using larger regions would have reduced the sample size and, therefore, the 154
statistical power of the analysis. Reads were trimmed to the same length to simplify later 155
bioinformatic analyses. 156
For each alignment file, we calculated the number of mutations per read by comparing 157
every read against the consensus sequence. The consensus sequences obtained from the FASTA 158
files of the same species were identical between them (regardless of the number of PCR cycles) 159
and they were also identical to the corresponding COI sequences available in 160
DDBJ/EMBL/GenBank. For this, we used a custom developed R function (R Development Core 161
Team 2016) to calculate the number of mutations by multiplying the pairwise genetic p-distance 162
by the total length of our reads. The function treated insertions and deletions (indels) as single 163
mutational steps and the genetic p-distance was calculated with the dist.dna function (raw model) 164
from the ape 3.4 R package (Paradis et al. 2004). Then, we ran a Poisson generalised linear 165
mixed model (GLMM) on the entire resulting data set (glmer function from package lme4 1.1-12; 166
Bates et al. 2015). We considered the number of mutations as the response variable, the number 167
Page 8 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 9
of cycles as the predictor variable, and the species as a random factor. We confirmed assumptions 168
underlying GLMMs by exploring regression residuals for normality against a Q-Q plot. 169
Finally, to make sure that the PCR1 reaction was still functioning after 55 cycles (i.e. that 170
the emergence of new artificial mutations was still possible), qPCRs were performed in all four 171
samples with the same parameters as in PCR1, but with 60 cycles in order to cover the whole 172
range of our experiment. The resulting fluorescence vs number of cycles plots were visually 173
analysed, confirming that the reaction was still taking place after 55 cycles. 174
175
Results 176
Due to the stringent quality-filtering, only 2.26 % of the raw reads were used for the statistical 177
analyses (see Table S1). The average quality of both the raw and quality-trimmed reads, as 178
measured with FastQC, is available in Fig. S1. 179
We detected mutations in 4,176 out of the 69,792 reads analysed (i.e. 5.98 %) which passed the 180
quality-filtering step, mapped to the correct reference sequence, and were located within the 181
50-nucleotide stretches after the primer annealing regions. 182
The number of mutations was consistent across species and the maximum number of mutations 183
per read was three along different treatments (Fig. 3 and Table 1). Accordingly, we found no 184
effect of the number of cycles on the number of mutations (Fig. 3; Slope ± SE = 0.0002 ± 0.0024, 185
Z = 0.096, P = 0.923). 186
187
188
189
Page 9 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 10
Discussion 190
In this work we investigated whether increasing the number of PCR cycles during library 191
preparation produces a higher number of mutations which could eventually impact the outcome 192
of a high-throughput DNA barcoding study. We demonstrated that even for a high number of 193
cycles (60, i.e. up to 55 cycles for PCR1 and five additional cycles for PCR2) the number of 194
reads with mutations remained very low despite using a non-proofreading enzyme and despite the 195
potential occurrence of heteroplasmy (which would increase the number of mutated positions 196
when compared to the reference sequence). However, we only analysed two regions of 50 197
nucleotides each from the COI animal DNA barcode, whereas different genomic regions may 198
impose different error rates to DNA polymerase (e.g. Arezi et al. 2003). Nevertheless, the lack of 199
effect we found in these regions with high sequence quality by experimentally increasing the 200
number of PCR cycles indicates that PCR cycles might have negligeable impacts on point 201
mutations and subsequent taxonomic assignment. 202
Some DNA metabarcoding-specific technical issues can arise by an increase in the 203
number of PCR cycles, and thus require further study. For instance, chimeras are hybrid 204
amplicons which can be formed during a PCR when an aborted extension product from an earlier 205
cycle functions as a primer in a subsequent PCR cycle (Haas et al. 2011). Chimeras inflate 206
diversity in an artificial manner and should be carefully taken into account. In this work, 207
chimeras were not an issue because we prepared our libraries using DNA from individual 208
specimens (i.e. high-throughput DNA barcoding libraries). However, the formation of chimeras 209
has been found to be correlated with the number of PCR cycles and to the consumption of the 210
primers (Wang & Wang 1996; Qiu et al. 2001; Thompson et al. 2002). Fortunately, several 211
Page 10 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 11
bioinformatic tools have been developed to deal with chimeras and thus their impact can be 212
greatly reduced (Edgar et al. 2011, Haas et al. 2011, Coissac et al. 2012, Boyer et al. 2016). 213
Thus, even though our results hold for DNA metabarcoding studies in terms of point mutations, 214
the formation of chimeras at high PCR cycles is a separated problem that should be considered in 215
DNA metabarcoding studies. 216
Overall, our results show that increasing the number of PCR cycles above routine levels 217
during library preparation is not risky for high-throughput DNA barcoding studies, in terms of the 218
amount of point mutations produced by polymerase errors even when a non-proofreading enzyme 219
is used. Therefore, this strategy can be safely followed with little amounts of input DNA or when 220
there are mismatches in the primer annealing regions which make the PCRs inefficient. 221
222
Acknowledgements 223
We thank Fernando Fernández-Álvarez for letting us analyse the cephalopod samples, which 224
belong to the research project CALOCEAN-2 (AGL2012-39077), funded by the Ministerio de 225
Economía y Competitividad (Spain). This work was supported by the Ministerio de Economía y 226
Competitividad (Spain) with a Ramón y Cajal research contract RYC-2009-03967 to RJ, and two 227
research projects (CGL2011-24466, CGL2015-69650-P) to DS and RJ. JD was also supported by 228
the Ministerio de Economía y Competitividad (Spain) (SVP-2013-067939). 229
230
231
232
233
Page 11 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 12
Data Accessibility 234
The MiSeq raw data and the sequences files have been deposited in Figshare (provisional private 235
link for review: https://figshare.com/s/3b6e57b2e3e821a57a91). The R code used for the analyses 236
is available on the GitHub repository (https://github.com/Jorge-Dona/Barcoding-tools). 237
238
References 239
Abdelrhman K.F.A., Bacci G., Mancusi C., Mengoni A., Serena F., and Ugolini A. 2016. A 240
first insight into the gut microbiota of the sea turtle Caretta caretta. Front. Microbiol., 241
7: 1060. 242
Arezi B., Xing,W., Sorge J. A., and Hogrefe H.H. 2003. Amplification efficiency of 243
thermostable DNA polymerases. Anal. Biochem., 321: 226-235. 244
Blaalid R., Kumar S., Nilsson R.H., Abarenkov K., Kirk P.M., and Kauserud H. 2013. ITS1 245
versus ITS2 as DNA metabarcodes for fungi. Mol. Ecol. Resour., 13: 218–224. 246
Bokulich N.A., Subramanian S., Faith J.J., Gevers D., Gordon J.I., Knight R., Mills D.A., 247
and Caporaso J.G. 2013. Quality-filtering vastly improves diversity estimates from 248
Illumina amplicon sequencing. Nat. Methods, 10: 57–59. 249
Bolger A.M., Lohse M., and Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina 250
Sequence Data. Bioinformatics: btu170. 251
Boyer F., Mercier C., Bonin A., Le Bras Y., Taberlet P., and Coissac E. 2016. obitools: a 252
unix-inspired software package for DNA metabarcoding. Mol. Ecol. Resour., 16: 253
176–182. 254
Page 12 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 13
Brandariz-Fontes C., Camacho-Sánchez M., Vilà C., Vega-Pla J.L., Rico C., and Leonard 255
J.A. 2015. Effect of the enzyme and PCR conditions on the quality of high-256
throughput DNA sequencing results. Sci. Rep., 5: 8056. 257
Caporaso J.G., Kuczynski J., Stombaugh J., Bittinger K., Bushman F.D., Costello E.K., 258
Fierer N., Peña A.G., Goodrich J.K., Gordon J.I., Huttley G.A., Kelley S.T., Knights 259
D., Koenig J.E., Ley R.E., Lozupone C.A., McDonald D., Muegge B.D., Pirrung M., 260
Reeder J., Sevinsky J.R., Turnbaugh P.J., Walters W.A., Widmann J., Yatsunenko T., 261
Zaneveld J., and Knight R. 2010. QIIME allows analysis of high-throughput 262
community sequencing data. Nat. Methods, 7: 335–336. 263
Carew M.E., Metzeling L., StClair R., and Hoffmann A.A. 2016. Detecting invertebrate 264
species in archived collections using next generation sequencing. Mol. Ecol. Resour. 265
DOI: 10.1111/1755-0998.12644 266
Casbon J.A., Osborne R.J., Brenner S., and Lichtenstein C.P. 2011. A method for counting 267
PCR template molecules with application to next-generation sequencing. Nucleic 268
Acids. Res., 39: e81. 269
Coissac E., Riaz T., and Puillandre N. 2012. Bioinformatic challenges for DNA 270
metabarcoding of plants and animals. Mol. Ecol., 21: 1834–1847. 271
Cha R.S., and Thilly W.G. 1993. Specificity, efficiency, and fidelity of PCR. Genome Res., 272
3: 18–29. 273
Cruaud P., Rasplus J.Y., Rodriguez L.J., and Cruaud, A. 2017. High-throughput sequencing 274
of multiple amplicons for barcoding and integrative taxonomy. Sci. Rep., 7: 41948. 275
Page 13 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 14
DeAngelis M.M., Wang D.G., and Hawkins T.L. 1995. Solid-phase reversible 276
immobilization for the isolation of PCR products. Nucleic Acids. Res., 23: 4742–277
4743. 278
Diaz-Real J., Serrano D., Píriz A., and Jovani R. 2015. NGS metabarcoding proves 279
successful for quantitative assessment of symbiont abundance: the case of feather 280
mites on birds. Exp. Appl. Acarol., 67: 209–218. 281
Doña J., Diaz-Real J., Mironov S., Bazaga P., Serrano D., and Jovani R. 2015. DNA 282
barcoding and minibarcoding as a powerful tool for feather mite studies. Mol. Ecol. 283
Resour., 15: 1216–1225. 284
Edgar R.C. 2013. UPARSE: highly accurate OTU sequences from microbial amplicon 285
reads. Nat. Methods, 10: 996–998. 286
Edgar R.C., Haas B.J., Clemente J.C., Quince C., and Knight R. 2011. UCHIME improves 287
sensitivity and speed of chimera detection. Bioinfomatics, 27: 2194–2200. 288
Edgar R.C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high 289
throughput. Nucleic Acids. Res., 32: 1792–1797. 290
Ellis R.J., Bruce K.D., Jenkins C., Stothard J.R., Ajarova L., Mugisha L., and Viney M.E. 291
2013. Comparison of the distal gut microbiota from people and animals in Africa. 292
PLOS ONE, 8: e54783. 293
Epp L.S., Boessenkool S., Bellemain E.P,. Haile J., Esposito A., Riaz T., Erséus C., 294
Gusarov V.I., Edwards M.E., Johnsen A., Stenøien H.K., Hassel K., Kauserud H., 295
Yoccoz N.G., Bråthen K.A., Willerslev E., Taberlet P., Coissac E., and Brochmann C. 296
Page 14 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 15
2012. New environmental metabarcodes for analysing soil DNA: potential for 297
studying past and present ecosystems. Mol. Ecol., 21: 1821–1833. 298
Esling P., Lejzerowicz F., and Pawlowski J. 2015. Accurate multiplexing and filtering for 299
high-throughput amplicon-sequencing. Nucleic Acids. Res., 43: 2513–2524. 300
Folmer O., Black M., Hoeh W., Lutz R., and Vrijenhoek R. 1994. DNA primers for 301
amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan 302
invertebrates. Mol. Mar. Biol. Biotechnol., 3: 294–299. 303
Geisen S., Laros I., Vizcaíno A., Bonkowski M., and de Groot G.A. 2015. Not all are free-304
living: high-throughput DNA metabarcoding reveals a diverse community of protists 305
parasitizing soil metazoa. Mol. Ecol., 24: 4556–4569. 306
Haas B.J., Gevers D., Earl A.M., Feldgarden M., Ward D.V., Giannoukos G., Ciulla D., 307
Tabbaa D., Highlander S.K., Sodergren E., Methé B., DeSantis T.Z., The Human 308
Microbiome Consortium, Petrosino J.F., Knight R., and Birren B.W. 2011. Chimeric 309
16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR 310
amplicons. Genome Res., 21: 494–504. 311
Hajibabaei M., Smith M., Janzen D.H., Rodriguez J.J., Whitfield J.B., and Hebert P.D. 312
2006. A minimalist barcode can identify a specimen whose DNA is degraded. Mol. 313
Ecol. Notes, 6: 959–964. 314
Hebert P.D.N., Cywinska A., Ball S.L., and deWaard J.R. 2003. Biological identifications 315
through DNA barcodes. P. Roy. Soc. B-Biol. Sci., 270: 313–321. 316
Hengen P.N. 1995. Methods and reagents: fidelity of DNA polymerases for PCR. Trends 317
Biochem. Sci., 20: 324–325. 318
Page 15 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 16
Huber J.A., Mark Welch D.B., Morrison H.G., Huse S.M., Neal P.R., Butterfield D.A., and 319
Sogin M.L. 2007. Microbial population structures in the deep marine biosphere. 320
Science, 318: 97–100. 321
Kearse M., Moir R., Wilson A., Stones-Havas S., Cheung M., Sturrock S., Buxton S., 322
Cooper A., Markowitz S., Duran C., Thierer T., Ashton B., Meintjes P., and 323
Drummond D. 2012. Geneious Basic: an integrated and extendable desktop software 324
platform for the organization and analysis of sequence data. Bioinformatics, 28: 325
1647–1649. 326
Kress W.J., García-Robledo C., Uriarte M., and Erickson D.L. 2015. DNA barcodes for 327
ecology, evolution, and conservation. Trends Ecol. Evol., 30, 25–35. 328
Lange V., Böhme I., Hofmann J., Lang K., Sauter J., Schöne B., Paul P., Albrecht V., 329
Andreas J.M., Baier D.M., Nething J., Ehninger U., Schwarzelt C., Pingel J., 330
Ehninger G., and Schmidt A.H. 2014. Cost-efficient high-throughput HLA typing by 331
MiSeq amplicon sequencing. BMC Genomics, 15: 63. 332
Oliver A.K., Brown S.P., Callaham Jr. M.A., and Jumpponen A. 2015. Polymerase matters: 333
non-proofreading enzymes inflate fungal community richness estimates by up to 15 334
%. Fungal Ecol., 15: 86–89. 335
Paradis E., Claude J., and Strimmer K. 2004. APE: analyses of phylogenetics and evolution 336
in R language. Bioinformatics, 20: 289–290. 337
Prosser, S.W., deWaard, J.R., Miller, S.E., and Hebert, P.D. 2016. DNA barcodes from 338
century]old type specimens using next-generation sequencing. Mol. Ecol. Resour., 339
16: 487–497. 340
Page 16 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 17
Qiu X., Wu L., Huang H., McDonel P.E., Palumbo A.V., Tiedje J.M., and Zhou J. 2001. 341
Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16S 342
rRNA gene-based cloning. Appl. Environ. Microbiol., 67: 880–887. 343
R Core Team 2016. R: A language and environment for statistical computing. R Foundation 344
for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. 345
Schirmer M., Ijaz U.Z., D'Amore R., Hall N., Sloan W.T., and Quince C 2015. Insight into 346
biases and sequencing errors for amplicon sequencing with the Illumina MiSeq 347
platform. Nucleic Acids. Res., 43: e37. 348
Schmidt P.-A., Bálint M., Greshake B., Bandow C., Römbke J., and Schmitt I 2013. 349
Illumina metabarcoding of a soil fungal community. Soil Biology and Biochemistry, 350
65: 128–132. 351
Schnell I.B., Bohmann K., and Gilbert M.T. 2015. Tag jumps illuminated - reducing 352
sequence-to-sample misidentifications in metabarcoding studies. Mol. Ecol. Resour., 353
15: 1289–1303. 354
Shokralla S., Gibson J.F., Nikbakht H., Janzen D.H., Hallwachs W., and Hajibabaei, M. 355
2014. Next-generation DNA barcoding: using next]generation sequencing to 356
enhance and accelerate DNA barcode capture from single specimens. Mol. Ecol. 357
Resour., 14: 892–901. 358
Shokralla S., Porter T.M., Gibson J.F., Dobosz R., Janzen D.H., Hallwachs W., Golding 359
G.B., and Hajibabaei M. 2015. Massively parallel multiplex DNA sequencing for 360
specimen identification using an Illumina MiSeq platform. Sci. Rep., 5: 9687. 361
Page 17 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 18
Smith D.P., and Peay K.G. 2014. Sequence depth, not PCR replication, improves ecological 362
inference from next generation DNA sequencing. PLOS ONE, 9: e90234. 363
Taberlet P., Coissac E., Pompanon F., Brochmann C., and Willerslev A. 2012. Towards 364
next-generation biodiversity assessment using DNA metabarcoding. Mol. Ecol., 21: 365
2045–2050. 366
Thompson J.R., Marcelino L.A., and Polz M.F. 2002. Heteroduplexes in mixed-template 367
amplifications: formation, consequence and elimination by ‘reconditioning PCR’. 368
Nucleic Acids. Res., 30: 2083–2088. 369
Toju, H. 2015. High-throughput DNA barcoding for ecological network studies. Popul. 370
Ecol., 57: 37–51. 371
Truett G.E., Heeger P., Mynatt R.L., Truett A.A., Walker J.A., and Warman M.L. 2000. 372
Preparation of PCR-quality mouse genomic DNA with hot sodium hydroxide and tris 373
(HotSHOT). BioTechniques, 29: 52–54. 374
Wang G.C., and Wang Y. 1996. The frequency of chimeric molecules as a consequence of 375
PCR co-amplification of 16S rRNA genes from different bacterial species. 376
Microbiology, 142: 1107–1114. 377
378
Page 18 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 19
Table 1. Percentage of reads with 0, 1, 2, or 3 mutations relative to the reference sequence. 379
No sequence with four or more mutations was found. 380
381 Library ID 0 1 2 3 Number of reads
94.055 5.772 0.167 0.004 22,10594.169 5.607 0.222 0 14,40893.409 6.37 0.198 0.022 22,63795.019 4.839 0.14 0 10,642
CEP007 (Bathypolypus sponsalis)CEP016 (Ancistroteuthis lichtensteini)
CEP023 (Todaropsis eblanae)SEP006 (Sepietta oweniana)
Page 19 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 20
Fig. 1. Schematic representation of the primers used for PCR1 and PCR2 (see main text). The 382
positions of the Illumina adapters, indices, and sequencing primers are also shown. Note that 383
primers are not drawn to scale. 384
385
Page 20 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 21
Fig. 2. From each cephalopod sample, five different high-throughput DNA barcoding libraries 386
were constructed and sequenced in the Illumina MiSeq platform. In each of these five libraries 387
the number of PCR cycles during PCR1 was different (35, 40, 45, 50, and 55 cycles). 388
Page 21 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Vierna et al. 22
Fig. 3. Number of mutations relative to the reference sequence observed in each PCR treatment. 389
(a) Bathypolypus sponsalis. (b) Ancistroteuthis lichtensteini. (c) Todaropsis eblanae. (d) Sepietta 390
oweniana. For interpretation of the references to color in this figure legend, the reader is referred 391
to the web version of this article. 392
393
394
395
Page 22 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Page 23 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Page 24 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
0
1
2
3
4
0 1 2 3Number of mutations
Pro
port
ion
of re
ads
c)
0 1 2 3Number of mutations
d)
0
1
2
3
4
Pro
port
ion
of re
ads
a) b)
Number of PCR cycles
35
40
45
50
55
Page 25 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Table 1
Library ID 0 1 2 3 Number of reads
CEP007 (Bathypolypus sponsalis ) 94.055 5.772 0.167 0.004 22,105
CEP016 (Ancistroteuthis lichtensteini ) 94.169 5.607 0.222 0 14,408
CEP023 (Todaropsis eblanae ) 93.409 6.37 0.198 0.022 22,637
SEP006 (Sepietta oweniana ) 95.019 4.839 0.14 0 10,642
% of reads with n mutations
Page 1
Page 26 of 26
https://mc06.manuscriptcentral.com/genome-pubs
Genome