draft - university of toronto t-spacedraft vierna et al. 1 1 pcr cycles above routine numbers do not...

Draft

PCR cycles above routine numbers do not compromise high-

throughput DNA barcoding results

Journal: Genome

Manuscript ID gen-2017-0081.R2

Manuscript Type: Note

Date Submitted by the Author: 30-May-2017

Complete List of Authors: Vierna, J.; AllGenetics & Biology SL Doña, J.; Estacion Biologica de Donana CSIC Vizcaíno, A.; AllGenetics & Biology SL Serrano, D.; Estacion Biologica de Donana CSIC Jovani, R.; Estación Biológica de Doñana (CSIC)

Is the invited manuscript for consideration in a Special

Issue? : This submission is not invited

Keyword: COI, DNA barcoding, Illumina, library, mutations

https://mc06.manuscriptcentral.com/genome-pubs

Genome

Draft

Vierna et al. 1

PCR cycles above routine numbers do not compromise high-throughput 1

DNA barcoding results 2

3

J. Vierna1+, J. Doña2+, A. Vizcaíno1, D. Serrano3, R. Jovani2* 4

5

1AllGenetics & Biology SL. Edificio CICA. Campus de Elviña s/n. E-15008 A Coruña. Spain. 6

Tel: +34881015551. 7

2 Department of Evolutionary Ecology, Estación Biológica de Doñana (CSIC). Avenida Américo 8

Vespucio s/n. E-41092 Sevilla. Spain. Tel: +34954466700. 9

3 Department of Conservation Biology, Estación Biológica de Doñana (CSIC). Avenida Américo 10

Vespucio s/n. E-41092 Sevilla. Spain. Tel: +34954466700. 11

12

+Joint first authors. 13

*Corresponding author. 14

15

Key words: COI, DNA barcoding, Illumina, library, mutations, non-proofreading polymerase. 16

17

Running title: PCR cycles in high-throughput DNA barcoding 18

19

20

21

Page 1 of 26


Genome

Draft

Vierna et al. 2

Abstract 22

High-throughput DNA barcoding has become essential in ecology and evolution but some 23

technical questions still remain. Increasing the number of PCR cycles above routine 20-30 cycles 24

is a common practice when working with old-type specimens, with little amounts of DNA, or 25

when facing annealing issues with the primers. However, increasing the number of cycles can 26

raise the number of artificial mutations due to polymerase errors. In this work we sequenced 20 27

COI libraries in the Illumina MiSeq platform. Libraries were prepared with 40, 45, 50, 55, and 60 28

PCR cycles from four individuals belonging to four species of four genera of cephalopods. We 29

found no relationship between the number of PCR cycles and the number of mutations despite 30

using a nonproofreading polymerase. Moreover, even when using a high number of PCR cycles 31

the resulting number of mutations was low enough not to be an issue in the context of high-32

throughput DNA barcoding (but may still remain an issue in DNA metabarcoding due to chimera 33

formation). We conclude that the common practice of increasing the number of PCR cycles 34

should not negatively impact the outcome of a high-throughput DNA barcoding study in terms of 35

the occurrence of point mutations. 36

Page 2 of 26


Genome

Draft

Vierna et al. 3

Introduction 37

High-throughput DNA barcoding (for single specimens; Shokralla et al. 2014, 2015; Toju 2015), 38

as well as similar methods such as DNA metabarcoding (for mixed species samples; Taberlet et 39

al. 2012) or amplicon metagenomics, combine DNA-based species identification using 40

standardised markers (DNA barcoding, Hebert et al. 2003) with the power of high-throughput 41

sequencing (HTS). These methods are powerful tools in life sciences research (Taberlet et al. 42

2012; Kress et al. 2015; Toju 2015), from studying century-old type specimens (Prosser et al. 43

2015), to assessing species composition of gut microbiota (Abdelrhman et al. 2016) from mixed 44

samples. 45

Here we focus on high-throughput DNA barcoding. This methodology overcomes some 46

of the problems which currently limit DNA barcoding, such as the high DNA template 47

concentration required for Sanger sequencing and the co-amplification of other DNA templates 48

due to intrasample contamination, Wolbachia infection, gut contents, heteroplasmy, and 49

pseudogenes. Moreover, high-throughput DNA barcoding reduces both per specimen costs and 50

labour time by nearly 80 %, thus allowing to be scaled up to deal with large-scale biodiversity 51

monitoring projects (Shokralla et al. 2015, Cruaud et al. 2017). 52

However, even though high-throughput DNA barcoding is a promising method, some 53

technical issues require further study. For example, some authors have explored the impact of the 54

sequencing platform (Smith & Peay 2014), the polymerase used (Oliver et al. 2015; Brandariz-55

Fontes et al. 2015), the DNA barcode length (Hajibabaei et al. 2006; Doña et al. 2015), the 56

library preparation method (Schirmer et al. 2015), the primers (Schirmer et al. 2015), the 57

annealing temperature (Schmidt et al. 2013), or the phenomenon known as mistagging (Schnell et 58

Page 3 of 26


Genome

Draft

Vierna et al. 4

al. 2015; Esling et al. 2015) in DNA metabarcoding or amplicon sequencing. Recently, Geisen et 59

al. (2015), and Díaz-Real et al. (2015) studied to what extent DNA metabarcoding produced 60

quantitative (and not only qualitative), and reliable results in two groups of symbionts. Finally, 61

several other papers have dealt with some of these issues through bioinformatic analysis of the 62

HTS reads (Caporaso et al. 2010; Coissac et al. 2012; Edgar 2013; Bokulich et al. 2013; Boyer et 63

al. 2016). 64

Here we focused on the number of PCR cycles used for library preparation. This is a 65

technical issue that can potentially impact the biological conclusions of high-throughput DNA 66

barcoding projects, but that has not yet been studied in detail. Increasing the number of PCR 67

cycles above the normal 20-35 cycles (e.g. Shokralla et al. 2014, 2015; Carew et al. 2016) is a 68

common practice: for example, when working with old type specimens (Prosser et al. 2015), with 69

small amounts of input DNA or when the PCR is inefficient (e.g. Blaalid et al. 2013; Ellis et al. 70

2013; Carew et al. 2016). However, a large number of PCR cycles may entail the risk of 71

increasing the number of artificial mutations on the output sequencing reads because of DNA 72

polymerase errors and the amplification of these errors in subsequent PCR cycles (Cha et al. 73

1993; Hengen 1995; Brandariz-Fontes et al. 2015). This is a potential major problem for high-74

throughput DNA barcoding because it can eventually distort, among others, genetic threshold-75

based species delimitation. Yet, to our knowledge, how these extra cycles affect DNA barcoding 76

results has never been investigated. 77

To explore the consequences of the number of PCR cycles upon the number of artificial 78

mutations we extracted DNA from four different individuals belonging to four cephalopod 79

species. From each of the four DNA samples, we prepared five high-throughput DNA barcoding 80

Page 4 of 26


Genome

Draft

Vierna et al. 5

libraries with different number of PCR cycles: from 40, i.e. roughly 20 cycles higher than regular 81

numbers, to 60, as done commonly when dealing with problematic samples. After sequencing the 82

20 libraries using the Illumina MiSeq platform, we studied the relationship between the number 83

of PCR cycles and the number of mutations present in the MiSeq reads. Our results show that, for 84

a number of cycles between 40 and 60, there is no relationship between the number of PCR 85

cycles and the number of mutations, the number of reads with mutations being very low. 86

Therefore, we conclude that a number of PCR cycles as high as 60 does not compromise the 87

success of a high-throughput DNA barcoding project in terms of the occurrence of point 88

mutations. 89

90

Materials and methods 91

Four ethanol-preserved tissues obtained from different cephalopod species belonging to the 92

orders Octopoda, Oegopsida, and Sepiida were analysed (see sample IDs and cephalopod species 93

in Table 1). Species were identified according to morphology and DNA barcoding (Fernando 94

Fernández-Álvarez, personal communication). The genetic p-distances between the selected 95

individuals were between 80.1 and 85.7 for the cytochrome c oxidase subunit I gene (COI) used 96

in this study. 97

Total DNA was extracted from each individual using the NZYTissue gDNA Isolation Kit 98

(NZYTech). DNAs were quantified with the Qubit dsDNA HS Assay Kit (ThermoFisher 99

Scientific) and used as input for the preparation of the libraries. 100

We followed a standard Illumina library preparation protocol. In brief, we amplified the 101

COI region (i.e. the standard animal barcode, Hebert et al. 2003) and included the Illumina 102

Page 5 of 26


Genome

Draft

Vierna et al. 6

specific adapters and indices by following a two-step PCR approach, slightly modified from 103

Lange et al. (2014). For the sake of clarity, we refer to these PCRs as PCR1 and PCR2. 104

PCR1 primers were LCO1490 and HCO2198 (Folmer et al. 1994), which proved 105

successful in a previous study in which the same specimens were DNA barcoded (Fernando 106

Fernández-Álvarez et al., personal communication). Oligonucleotide tails bearing the Illumina 107

sequencing primers were attached to the 5’ ends of primers LCO1490 and HCO2198. PCR2 was 108

carried out with tailed primers which bear the indices and adapters and anneal to the Illumina 109

sequencing primers (see Fig. 1 for a schematic representation of the binding process). 110

PCR1 was carried out using 25 ng of total DNA in a final volume of 25 µL containing 111

6.50 µL of Supreme NZYTaq Green PCR Master Mix (NZYTech) (nonproofreading polymerase; 112

error rate of 1x10-5 according to the manufacturer), 0.5 µM of each primer, and PCR-grade water 113

up to 25 µL. The thermal cycling conditions were as follows: an initial denaturation step at 95 ºC 114

for 5 min, followed by 35, 40, 45, 50, or 55 cycles (see Fig. 2) of denaturation at 95 ºC for 30 s; 115

annealing at 53 ºC for 30 s; extension at 72 ºC for 45 s; and a final extension step at 72 ºC for 10 116

min. The products of PCR1 were purified using the SPRI method (DeAngelis et al. 1995), with 117

Mag-Bind RXNPure Plus magnetic beads (Omega Biotek). The purified products were loaded in 118

a 1 % agarose gel stained with GreenSafe (NZYTech), and visualised under UV light. 119

PCR2 was carried out using 2.5 µL of the purified PCR1 products, and the same 120

conditions as for PCR1 except for the number of cycles, which was set to five (Fig. 2) and the 121

annealing temperature (60 ºC). The products obtained were purified following the SPRI method 122

as indicated above. Then, the purified products were loaded in a 1 % agarose gel stained with 123

Page 6 of 26


Genome

Draft

Vierna et al. 7

GreenSafe (NZYTech), and visualised under UV light. All samples yielded libraries of the 124

expected size. 125

Libraries were quantified using the Qubit dsDNA HS Assay Kit (Life Technologies), and 126

pooled in equimolar amounts. The pool was sequenced in a fraction of a 600-cycle run (MiSeq 127

Reagent Kit v3; PE300) of an Illumina MiSeq sequencer along with a PhiX library used to 128

increase sequence diversity of the overall library, in Macrogen (Seoul, Korea). 129

FASTQ files were demultiplexed using RTA 1.18.54 (Illumina) and checked with FastQC 130

0.11.3 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then, they were quality-131

trimmed using very conservative parameters in Trimmomatic 0.36 (Bolger et al. 2014) with the 132

option SLIDINGWINDOW:1:30. SLIDINGWINDOW starts scanning at the 5' end and clips the 133

read once the average quality within the window falls below a threshold (Trimmomatic Manual 134

0.32). We set the size of the window to 1 and the quality threshold to 30 (Phred Quality Score). 135

Therefore, when the quality of a single nucleotide fell below a Phred Quality Score of 30, the 136

read was clipped from this position to the 3' end. We used these very conservative parameters in 137

order to make sure that the mutations observed in the sequencing results were due to PCR errors 138

and not to sequencing errors. The quality of the resulting files was checked again with FastQC. 139

Quality-trimmed FASTQ files were imported into Geneious 8.1.6 140

(http://www.geneious.com, Kearse et al. 2012). Each pair of R1 and R2 files were set as paired 141

reads in order to improve the mapping. A map-to-reference analysis was carried out with the 142

Geneious mapper using relaxed parameters (maximum number of mismatches per read, 25 %; 143

minimum overlap identity, 80 %) to allow potentially mutated reads to map. The DNA barcode 144

sequences from the four cephalopod specimens were set as references (DDBJ/EMBL/GenBank 145

Page 7 of 26


Genome

Draft

Vierna et al. 8

accession numbers KX078469 - KX078472). The results of the map-to-reference analysis were 146

inspected manually to verify that the reads of each library mapped to the correct reference 147

sequence. We obtained 20 assembly files corresponding to the four species by the five PCR 148

treatments. 149

Regions including the first 50 nucleotides of the mapped R1 and R2 reads (starting 150

immediately after the primer annealing region) were aligned in each assembly with Muscle 151

(Edgar 2004) as implemented in Geneious 8.1.6. We selected these two 50-nucleotide regions 152

because such read length accumulated the maximum number of reads after passing the quality 153

threshold (see above); using larger regions would have reduced the sample size and, therefore, the 154

statistical power of the analysis. Reads were trimmed to the same length to simplify later 155

bioinformatic analyses. 156

For each alignment file, we calculated the number of mutations per read by comparing 157

every read against the consensus sequence. The consensus sequences obtained from the FASTA 158

files of the same species were identical between them (regardless of the number of PCR cycles) 159

and they were also identical to the corresponding COI sequences available in 160

DDBJ/EMBL/GenBank. For this, we used a custom developed R function (R Development Core 161

Team 2016) to calculate the number of mutations by multiplying the pairwise genetic p-distance 162

by the total length of our reads. The function treated insertions and deletions (indels) as single 163

mutational steps and the genetic p-distance was calculated with the dist.dna function (raw model) 164

from the ape 3.4 R package (Paradis et al. 2004). Then, we ran a Poisson generalised linear 165

mixed model (GLMM) on the entire resulting data set (glmer function from package lme4 1.1-12; 166

Bates et al. 2015). We considered the number of mutations as the response variable, the number 167

Page 8 of 26


Genome

Draft

Vierna et al. 9

of cycles as the predictor variable, and the species as a random factor. We confirmed assumptions 168

underlying GLMMs by exploring regression residuals for normality against a Q-Q plot. 169

Finally, to make sure that the PCR1 reaction was still functioning after 55 cycles (i.e. that 170

the emergence of new artificial mutations was still possible), qPCRs were performed in all four 171

samples with the same parameters as in PCR1, but with 60 cycles in order to cover the whole 172

range of our experiment. The resulting fluorescence vs number of cycles plots were visually 173

analysed, confirming that the reaction was still taking place after 55 cycles. 174

175

Results 176

Due to the stringent quality-filtering, only 2.26 % of the raw reads were used for the statistical 177

analyses (see Table S1). The average quality of both the raw and quality-trimmed reads, as 178

measured with FastQC, is available in Fig. S1. 179

We detected mutations in 4,176 out of the 69,792 reads analysed (i.e. 5.98 %) which passed the 180

quality-filtering step, mapped to the correct reference sequence, and were located within the 181

50-nucleotide stretches after the primer annealing regions. 182

The number of mutations was consistent across species and the maximum number of mutations 183

per read was three along different treatments (Fig. 3 and Table 1). Accordingly, we found no 184

effect of the number of cycles on the number of mutations (Fig. 3; Slope ± SE = 0.0002 ± 0.0024, 185

Z = 0.096, P = 0.923). 186

187

188

189

Page 9 of 26


Genome

Draft

Vierna et al. 10

Discussion 190

In this work we investigated whether increasing the number of PCR cycles during library 191

preparation produces a higher number of mutations which could eventually impact the outcome 192

of a high-throughput DNA barcoding study. We demonstrated that even for a high number of 193

cycles (60, i.e. up to 55 cycles for PCR1 and five additional cycles for PCR2) the number of 194

reads with mutations remained very low despite using a non-proofreading enzyme and despite the 195

potential occurrence of heteroplasmy (which would increase the number of mutated positions 196

when compared to the reference sequence). However, we only analysed two regions of 50 197

nucleotides each from the COI animal DNA barcode, whereas different genomic regions may 198

impose different error rates to DNA polymerase (e.g. Arezi et al. 2003). Nevertheless, the lack of 199

effect we found in these regions with high sequence quality by experimentally increasing the 200

number of PCR cycles indicates that PCR cycles might have negligeable impacts on point 201

mutations and subsequent taxonomic assignment. 202

Some DNA metabarcoding-specific technical issues can arise by an increase in the 203

number of PCR cycles, and thus require further study. For instance, chimeras are hybrid 204

amplicons which can be formed during a PCR when an aborted extension product from an earlier 205

cycle functions as a primer in a subsequent PCR cycle (Haas et al. 2011). Chimeras inflate 206

diversity in an artificial manner and should be carefully taken into account. In this work, 207

chimeras were not an issue because we prepared our libraries using DNA from individual 208

specimens (i.e. high-throughput DNA barcoding libraries). However, the formation of chimeras 209

has been found to be correlated with the number of PCR cycles and to the consumption of the 210

primers (Wang & Wang 1996; Qiu et al. 2001; Thompson et al. 2002). Fortunately, several 211

Page 10 of 26


Genome

Draft

Vierna et al. 11

bioinformatic tools have been developed to deal with chimeras and thus their impact can be 212

greatly reduced (Edgar et al. 2011, Haas et al. 2011, Coissac et al. 2012, Boyer et al. 2016). 213

Thus, even though our results hold for DNA metabarcoding studies in terms of point mutations, 214

the formation of chimeras at high PCR cycles is a separated problem that should be considered in 215

DNA metabarcoding studies. 216

Overall, our results show that increasing the number of PCR cycles above routine levels 217

during library preparation is not risky for high-throughput DNA barcoding studies, in terms of the 218

amount of point mutations produced by polymerase errors even when a non-proofreading enzyme 219

is used. Therefore, this strategy can be safely followed with little amounts of input DNA or when 220

there are mismatches in the primer annealing regions which make the PCRs inefficient. 221

222

Acknowledgements 223

We thank Fernando Fernández-Álvarez for letting us analyse the cephalopod samples, which 224

belong to the research project CALOCEAN-2 (AGL2012-39077), funded by the Ministerio de 225

Economía y Competitividad (Spain). This work was supported by the Ministerio de Economía y 226

Competitividad (Spain) with a Ramón y Cajal research contract RYC-2009-03967 to RJ, and two 227

research projects (CGL2011-24466, CGL2015-69650-P) to DS and RJ. JD was also supported by 228

the Ministerio de Economía y Competitividad (Spain) (SVP-2013-067939). 229

230

231

232

233

Page 11 of 26


Genome

Draft

Vierna et al. 12

Data Accessibility 234

The MiSeq raw data and the sequences files have been deposited in Figshare (provisional private 235

link for review: https://figshare.com/s/3b6e57b2e3e821a57a91). The R code used for the analyses 236

is available on the GitHub repository (https://github.com/Jorge-Dona/Barcoding-tools). 237

238

References 239

Abdelrhman K.F.A., Bacci G., Mancusi C., Mengoni A., Serena F., and Ugolini A. 2016. A 240

first insight into the gut microbiota of the sea turtle Caretta caretta. Front. Microbiol., 241

7: 1060. 242

Arezi B., Xing,W., Sorge J. A., and Hogrefe H.H. 2003. Amplification efficiency of 243

thermostable DNA polymerases. Anal. Biochem., 321: 226-235. 244

Blaalid R., Kumar S., Nilsson R.H., Abarenkov K., Kirk P.M., and Kauserud H. 2013. ITS1 245

versus ITS2 as DNA metabarcodes for fungi. Mol. Ecol. Resour., 13: 218–224. 246

Bokulich N.A., Subramanian S., Faith J.J., Gevers D., Gordon J.I., Knight R., Mills D.A., 247

and Caporaso J.G. 2013. Quality-filtering vastly improves diversity estimates from 248

Illumina amplicon sequencing. Nat. Methods, 10: 57–59. 249

Bolger A.M., Lohse M., and Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina 250

Sequence Data. Bioinformatics: btu170. 251

Boyer F., Mercier C., Bonin A., Le Bras Y., Taberlet P., and Coissac E. 2016. obitools: a 252

unix-inspired software package for DNA metabarcoding. Mol. Ecol. Resour., 16: 253

176–182. 254

Page 12 of 26


Genome

Draft

Vierna et al. 13

Brandariz-Fontes C., Camacho-Sánchez M., Vilà C., Vega-Pla J.L., Rico C., and Leonard 255

J.A. 2015. Effect of the enzyme and PCR conditions on the quality of high-256

throughput DNA sequencing results. Sci. Rep., 5: 8056. 257

Caporaso J.G., Kuczynski J., Stombaugh J., Bittinger K., Bushman F.D., Costello E.K., 258

Fierer N., Peña A.G., Goodrich J.K., Gordon J.I., Huttley G.A., Kelley S.T., Knights 259

D., Koenig J.E., Ley R.E., Lozupone C.A., McDonald D., Muegge B.D., Pirrung M., 260

Reeder J., Sevinsky J.R., Turnbaugh P.J., Walters W.A., Widmann J., Yatsunenko T., 261

Zaneveld J., and Knight R. 2010. QIIME allows analysis of high-throughput 262

community sequencing data. Nat. Methods, 7: 335–336. 263

Carew M.E., Metzeling L., StClair R., and Hoffmann A.A. 2016. Detecting invertebrate 264

species in archived collections using next generation sequencing. Mol. Ecol. Resour. 265

DOI: 10.1111/1755-0998.12644 266

Casbon J.A., Osborne R.J., Brenner S., and Lichtenstein C.P. 2011. A method for counting 267

PCR template molecules with application to next-generation sequencing. Nucleic 268

Acids. Res., 39: e81. 269

Coissac E., Riaz T., and Puillandre N. 2012. Bioinformatic challenges for DNA 270

metabarcoding of plants and animals. Mol. Ecol., 21: 1834–1847. 271

Cha R.S., and Thilly W.G. 1993. Specificity, efficiency, and fidelity of PCR. Genome Res., 272

3: 18–29. 273

Cruaud P., Rasplus J.Y., Rodriguez L.J., and Cruaud, A. 2017. High-throughput sequencing 274

of multiple amplicons for barcoding and integrative taxonomy. Sci. Rep., 7: 41948. 275

Page 13 of 26


Genome

Draft

Vierna et al. 14

DeAngelis M.M., Wang D.G., and Hawkins T.L. 1995. Solid-phase reversible 276

immobilization for the isolation of PCR products. Nucleic Acids. Res., 23: 4742–277

4743. 278

Diaz-Real J., Serrano D., Píriz A., and Jovani R. 2015. NGS metabarcoding proves 279

successful for quantitative assessment of symbiont abundance: the case of feather 280

mites on birds. Exp. Appl. Acarol., 67: 209–218. 281

Doña J., Diaz-Real J., Mironov S., Bazaga P., Serrano D., and Jovani R. 2015. DNA 282

barcoding and minibarcoding as a powerful tool for feather mite studies. Mol. Ecol. 283

Resour., 15: 1216–1225. 284

Edgar R.C. 2013. UPARSE: highly accurate OTU sequences from microbial amplicon 285

reads. Nat. Methods, 10: 996–998. 286

Edgar R.C., Haas B.J., Clemente J.C., Quince C., and Knight R. 2011. UCHIME improves 287

sensitivity and speed of chimera detection. Bioinfomatics, 27: 2194–2200. 288

Edgar R.C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high 289

throughput. Nucleic Acids. Res., 32: 1792–1797. 290

Ellis R.J., Bruce K.D., Jenkins C., Stothard J.R., Ajarova L., Mugisha L., and Viney M.E. 291

2013. Comparison of the distal gut microbiota from people and animals in Africa. 292

PLOS ONE, 8: e54783. 293

Epp L.S., Boessenkool S., Bellemain E.P,. Haile J., Esposito A., Riaz T., Erséus C., 294

Gusarov V.I., Edwards M.E., Johnsen A., Stenøien H.K., Hassel K., Kauserud H., 295

Yoccoz N.G., Bråthen K.A., Willerslev E., Taberlet P., Coissac E., and Brochmann C. 296

Page 14 of 26


Genome

Draft

Vierna et al. 15

2012. New environmental metabarcodes for analysing soil DNA: potential for 297

studying past and present ecosystems. Mol. Ecol., 21: 1821–1833. 298

Esling P., Lejzerowicz F., and Pawlowski J. 2015. Accurate multiplexing and filtering for 299

high-throughput amplicon-sequencing. Nucleic Acids. Res., 43: 2513–2524. 300

Folmer O., Black M., Hoeh W., Lutz R., and Vrijenhoek R. 1994. DNA primers for 301

amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan 302

invertebrates. Mol. Mar. Biol. Biotechnol., 3: 294–299. 303

Geisen S., Laros I., Vizcaíno A., Bonkowski M., and de Groot G.A. 2015. Not all are free-304

living: high-throughput DNA metabarcoding reveals a diverse community of protists 305

parasitizing soil metazoa. Mol. Ecol., 24: 4556–4569. 306

Haas B.J., Gevers D., Earl A.M., Feldgarden M., Ward D.V., Giannoukos G., Ciulla D., 307

Tabbaa D., Highlander S.K., Sodergren E., Methé B., DeSantis T.Z., The Human 308

Microbiome Consortium, Petrosino J.F., Knight R., and Birren B.W. 2011. Chimeric 309

16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR 310

amplicons. Genome Res., 21: 494–504. 311

Hajibabaei M., Smith M., Janzen D.H., Rodriguez J.J., Whitfield J.B., and Hebert P.D. 312

2006. A minimalist barcode can identify a specimen whose DNA is degraded. Mol. 313

Ecol. Notes, 6: 959–964. 314

Hebert P.D.N., Cywinska A., Ball S.L., and deWaard J.R. 2003. Biological identifications 315

through DNA barcodes. P. Roy. Soc. B-Biol. Sci., 270: 313–321. 316

Hengen P.N. 1995. Methods and reagents: fidelity of DNA polymerases for PCR. Trends 317

Biochem. Sci., 20: 324–325. 318

Page 15 of 26


Genome

Draft

Vierna et al. 16

Huber J.A., Mark Welch D.B., Morrison H.G., Huse S.M., Neal P.R., Butterfield D.A., and 319

Sogin M.L. 2007. Microbial population structures in the deep marine biosphere. 320

Science, 318: 97–100. 321

Kearse M., Moir R., Wilson A., Stones-Havas S., Cheung M., Sturrock S., Buxton S., 322

Cooper A., Markowitz S., Duran C., Thierer T., Ashton B., Meintjes P., and 323

Drummond D. 2012. Geneious Basic: an integrated and extendable desktop software 324

platform for the organization and analysis of sequence data. Bioinformatics, 28: 325

1647–1649. 326

Kress W.J., García-Robledo C., Uriarte M., and Erickson D.L. 2015. DNA barcodes for 327

ecology, evolution, and conservation. Trends Ecol. Evol., 30, 25–35. 328

Lange V., Böhme I., Hofmann J., Lang K., Sauter J., Schöne B., Paul P., Albrecht V., 329

Andreas J.M., Baier D.M., Nething J., Ehninger U., Schwarzelt C., Pingel J., 330

Ehninger G., and Schmidt A.H. 2014. Cost-efficient high-throughput HLA typing by 331

MiSeq amplicon sequencing. BMC Genomics, 15: 63. 332

Oliver A.K., Brown S.P., Callaham Jr. M.A., and Jumpponen A. 2015. Polymerase matters: 333

non-proofreading enzymes inflate fungal community richness estimates by up to 15 334

%. Fungal Ecol., 15: 86–89. 335

Paradis E., Claude J., and Strimmer K. 2004. APE: analyses of phylogenetics and evolution 336

in R language. Bioinformatics, 20: 289–290. 337

Prosser, S.W., deWaard, J.R., Miller, S.E., and Hebert, P.D. 2016. DNA barcodes from 338

century]old type specimens using next-generation sequencing. Mol. Ecol. Resour., 339

16: 487–497. 340

Page 16 of 26


Genome

Draft

Vierna et al. 17

Qiu X., Wu L., Huang H., McDonel P.E., Palumbo A.V., Tiedje J.M., and Zhou J. 2001. 341

Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16S 342

rRNA gene-based cloning. Appl. Environ. Microbiol., 67: 880–887. 343

R Core Team 2016. R: A language and environment for statistical computing. R Foundation 344

for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. 345

Schirmer M., Ijaz U.Z., D'Amore R., Hall N., Sloan W.T., and Quince C 2015. Insight into 346

biases and sequencing errors for amplicon sequencing with the Illumina MiSeq 347

platform. Nucleic Acids. Res., 43: e37. 348

Schmidt P.-A., Bálint M., Greshake B., Bandow C., Römbke J., and Schmitt I 2013. 349

Illumina metabarcoding of a soil fungal community. Soil Biology and Biochemistry, 350

65: 128–132. 351

Schnell I.B., Bohmann K., and Gilbert M.T. 2015. Tag jumps illuminated - reducing 352

sequence-to-sample misidentifications in metabarcoding studies. Mol. Ecol. Resour., 353

15: 1289–1303. 354

Shokralla S., Gibson J.F., Nikbakht H., Janzen D.H., Hallwachs W., and Hajibabaei, M. 355

2014. Next-generation DNA barcoding: using next]generation sequencing to 356

enhance and accelerate DNA barcode capture from single specimens. Mol. Ecol. 357

Resour., 14: 892–901. 358

Shokralla S., Porter T.M., Gibson J.F., Dobosz R., Janzen D.H., Hallwachs W., Golding 359

G.B., and Hajibabaei M. 2015. Massively parallel multiplex DNA sequencing for 360

specimen identification using an Illumina MiSeq platform. Sci. Rep., 5: 9687. 361

Page 17 of 26


Genome

Draft

Vierna et al. 18

Smith D.P., and Peay K.G. 2014. Sequence depth, not PCR replication, improves ecological 362

inference from next generation DNA sequencing. PLOS ONE, 9: e90234. 363

Taberlet P., Coissac E., Pompanon F., Brochmann C., and Willerslev A. 2012. Towards 364

next-generation biodiversity assessment using DNA metabarcoding. Mol. Ecol., 21: 365

2045–2050. 366

Thompson J.R., Marcelino L.A., and Polz M.F. 2002. Heteroduplexes in mixed-template 367

amplifications: formation, consequence and elimination by ‘reconditioning PCR’. 368

Nucleic Acids. Res., 30: 2083–2088. 369

Toju, H. 2015. High-throughput DNA barcoding for ecological network studies. Popul. 370

Ecol., 57: 37–51. 371

Truett G.E., Heeger P., Mynatt R.L., Truett A.A., Walker J.A., and Warman M.L. 2000. 372

Preparation of PCR-quality mouse genomic DNA with hot sodium hydroxide and tris 373

(HotSHOT). BioTechniques, 29: 52–54. 374

Wang G.C., and Wang Y. 1996. The frequency of chimeric molecules as a consequence of 375

PCR co-amplification of 16S rRNA genes from different bacterial species. 376

Microbiology, 142: 1107–1114. 377

378

Page 18 of 26


Genome

Draft

Vierna et al. 19

Table 1. Percentage of reads with 0, 1, 2, or 3 mutations relative to the reference sequence. 379

No sequence with four or more mutations was found. 380

381 Library ID 0 1 2 3 Number of reads

94.055 5.772 0.167 0.004 22,10594.169 5.607 0.222 0 14,40893.409 6.37 0.198 0.022 22,63795.019 4.839 0.14 0 10,642

CEP007 (Bathypolypus sponsalis)CEP016 (Ancistroteuthis lichtensteini)

CEP023 (Todaropsis eblanae)SEP006 (Sepietta oweniana)

Page 19 of 26


Genome

Draft

Vierna et al. 20

Fig. 1. Schematic representation of the primers used for PCR1 and PCR2 (see main text). The 382

positions of the Illumina adapters, indices, and sequencing primers are also shown. Note that 383

primers are not drawn to scale. 384

385

Page 20 of 26


Genome

Draft

Vierna et al. 21

Fig. 2. From each cephalopod sample, five different high-throughput DNA barcoding libraries 386

were constructed and sequenced in the Illumina MiSeq platform. In each of these five libraries 387

the number of PCR cycles during PCR1 was different (35, 40, 45, 50, and 55 cycles). 388

Page 21 of 26


Genome

Draft

Vierna et al. 22

Fig. 3. Number of mutations relative to the reference sequence observed in each PCR treatment. 389

(a) Bathypolypus sponsalis. (b) Ancistroteuthis lichtensteini. (c) Todaropsis eblanae. (d) Sepietta 390

oweniana. For interpretation of the references to color in this figure legend, the reader is referred 391

to the web version of this article. 392

393

394

395

Page 22 of 26


Genome

Draft

Page 23 of 26


Genome

Draft

Page 24 of 26


Genome

Draft

0

1

2

3

4

0 1 2 3Number of mutations

Pro

port

ion

of re

ads

c)

0 1 2 3Number of mutations

d)

0

1

2

3

4

Pro

port

ion

of re

ads

a) b)

Number of PCR cycles

35

40

45

50

55

Page 25 of 26


Genome

Draft

Table 1

Library ID 0 1 2 3 Number of reads

CEP007 (Bathypolypus sponsalis ) 94.055 5.772 0.167 0.004 22,105

CEP016 (Ancistroteuthis lichtensteini ) 94.169 5.607 0.222 0 14,408

CEP023 (Todaropsis eblanae ) 93.409 6.37 0.198 0.022 22,637

SEP006 (Sepietta oweniana ) 95.019 4.839 0.14 0 10,642

% of reads with n mutations

Page 1

Page 26 of 26


Genome

draft - university of toronto t-spacedraft vierna et al. 1 1 pcr cycles above routine numbers do not...

Documents