DNA metabarcoding: technical aspects
Pierre Taberlet
Laboratoire d'Ecologie Alpine, CNRS UMR 5553 Université Grenoble Alpes, Grenoble, France
Porto, 1-5 May 2017
Need for high throughput collection of biodiversity data
• For research
• For management
NASA Earth Observing System: Terra Satellite Platform
Difficult to use satellites for identifying taxa and collecting biodiversity data
Why not use environmental DNA and the metabarcoding approach?
Technical objectives
• Set up a DNA metabarcoding method that is:
– high throughput
– robust
– simple
– cheap
DNA metabarcoding: technical aspects
• DNA extraction from soil or water
• Sampling design
• Which polymerase to use?
• Quantitative aspects
• How to process hundreds of samples at once?
• How to reduce the impact of chimeras among samples
DNA extraction from soil
• Up to 10 g of soil
• About 25€ per extraction
• Not compatible with large-scale studies
– Because of the cost
– Because of the complexity of the protocol
DNA extraction from soil
• Up to 8 kg of soil (usually 15 g)
• Add the same weight of saturated phosphate buffer
– for 1 liter:
• Sodium Phosphate Dibasic, 14.7 g
• Sodium Phosphate Monobasic, 1.97 g
• Mix 15 minutes
• Finish the extraction with the Macherey-Nagel Nucleospin® Soil kit
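The buffer arithmetic above can be sketched as follows. This is a minimal illustration, assuming the salts scale linearly with volume and that the buffer density is close to 1 g/ml (the density is an assumption, not stated on the slide); the function name is hypothetical.

```python
# Per-litre recipe as given on the slide.
DIBASIC_G_PER_L = 14.7    # sodium phosphate dibasic
MONOBASIC_G_PER_L = 1.97  # sodium phosphate monobasic


def phosphate_buffer_for_soil(soil_g, density_g_per_ml=1.0):
    """Return (buffer volume in litres, dibasic g, monobasic g) for a soil sample.

    The slide asks for the same weight of buffer as soil; the density used to
    convert that weight into a volume is an assumption.
    """
    if soil_g <= 0:
        raise ValueError("soil mass must be positive")
    volume_l = soil_g / density_g_per_ml / 1000.0
    return volume_l, DIBASIC_G_PER_L * volume_l, MONOBASIC_G_PER_L * volume_l


# The usual 15 g sample and the 8 kg upper bound mentioned on the slide.
for soil_g in (15, 8000):
    vol, dibasic, monobasic = phosphate_buffer_for_soil(soil_g)
    print(f"{soil_g} g soil: {vol:.3f} l buffer, "
          f"{dibasic:.2f} g dibasic, {monobasic:.3f} g monobasic")
```

In practice the buffer would probably be prepared by the litre and dispensed by weight; the calculation only shows how much stock a given sampling campaign consumes.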
DNA extraction from soil
Sampling in the field, followed within a few hours by 15 min in phosphate buffer
DNA extraction from soil
Short communication
Extracellular DNA extraction is a fast, cheap and reliable alternative for multi-taxa surveys based on soil DNA
Lucie Zinger a,*, Jérôme Chave a, Eric Coissac b,c, Amaia Iribar a, Eliane Louisanna d, Sophie Manzi a, Vincent Schilling a, Heidy Schimann d, Guilhem Sommeria-Klein a, Pierre Taberlet b,c
a Université Toulouse 3 Paul Sabatier, CNRS, ENFA, UMR 5174 EDB, F-31062 Toulouse, France
b CNRS, Laboratoire d'Ecologie Alpine (LECA), 38000 Grenoble, France
c Univ. Grenoble Alpes, Laboratoire d'Ecologie Alpine (LECA), 38000 Grenoble, France
d INRA UMR ECOFOG, F-97310 Kourou, French Guiana
Article info
Article history: Received 22 October 2015; received in revised form 23 December 2015; accepted 15 January 2016; available online 2 February 2016
Keywords: DNA metabarcoding; DNA extraction protocol; Tropical forest; Multi-taxa biodiversity
Abstract
DNA metabarcoding on soil samples is increasingly used for large-scale and multi-taxa biodiversity studies. However, DNA extraction may be a major bottleneck for such wide uses. It should be cost/time effective and allow dealing with large sample volumes so as to maximise the representativeness of both micro- and macro-organisms diversity. Here, we compared the performances of a fast and cheap extracellular DNA extraction protocol with a total DNA extraction method in retrieving bacterial, eukaryotic and plant diversity from tropical soil samples of ca. 10 g. The total DNA extraction protocol yielded more high-quality DNA. Yet, the extracellular DNA protocol provided similar diversity assessments although it presented some differences in clades relative abundance and undersampling biases. We argue that extracellular DNA is a good compromise between cost, labor, and accuracy for high-throughput DNA metabarcoding studies of soil biodiversity.
© 2016 Elsevier Ltd. All rights reserved.
Implementing efficiently soil diversity surveys across taxa and ecosystem types is a formidable challenge. DNA metabarcoding is a most promising monitoring technique to meet this challenge (Bik et al., 2012; Taberlet et al., 2012b; Orgiazzi et al., 2015; Thomsen and Willerslev, 2015), as it provides a high-throughput, standardized and cost-effective assessment of soil diversity (Bik et al., 2012; Taberlet et al., 2012b). The approach is useful for uncovering the diversity of a large array of microorganisms, such as bacteria, protists or fungi (Lauber et al., 2009; Bates et al., 2013; Tedersoo et al., 2014) as well as macroorganisms such as arthropods, plants, or mammals (Andersen et al., 2012; Hiiesalu et al., 2012; Yoccoz et al., 2012; Yang et al., 2014). It also has limitations that are the focus of active research, such as a limited taxonomic resolution in some taxonomic groups (Tang et al., 2012; Grattepanche et al., 2014) or PCR or sequencing biases (Wintzingerode et al., 1997; Huse et al., 2007; Schloss et al., 2011; Thomsen and Willerslev, 2015).
One major problem in implementing DNA metabarcoding for large-scale applications remains the extraction of DNA from soil samples. Soil DNA is encapsulated within complex cell walls or adsorbed onto soil particles, and can be coextracted with variable amounts of humic substances that may inhibit PCR amplification (Wintzingerode et al., 1997). Many laboratory protocols or commercial kits have been developed to maximize extracted DNA purity/yield and downstream PCR success, and some of them now comply with the ISO standard (Martin-Laurent et al., 2001; Philippot et al., 2012). However, these protocols were optimized for assessing microbial diversity. This might hamper PCR amplification of other components of the soil biota due to the overwhelming biomass of bacteria (Taberlet et al., 2012a,b). Also, commercial kits rely on sample sizes typically ranging from 0.25 to 1 g of wet soil, which provide less consistent and representative pictures of local microbial communities than larger sample volumes (Ranjard et al., 2003). This sampling bias would be even worse for targeting larger organisms, unless many replicates are taken (Andersen et al., 2012). Finally, these kits are expensive, and the time and facilities needed for the DNA extraction process is
* Corresponding author. Laboratoire Evolution et Diversité Biologique, UMR CNRS-UPS 5174, 118 route de Narbonne, 31062 Toulouse Cedex 9, France.
E-mail address: [email protected] (L. Zinger).
Soil Biology & Biochemistry 96 (2016) 16–19
http://dx.doi.org/10.1016/j.soilbio.2016.01.008
© 2016 Elsevier Ltd. All rights reserved.
DNA extraction from water
• Two possibilities:
– DNA precipitation/Centrifugation (e.g. Ficetola et al. 2008)
– Filtration
DNA precipitation/Centrifugation
• 15 ml of water
• 30 ml of ethanol
• 4.5 ml of Sodium Acetate (3M, pH 8)
• Centrifugation (max speed) to precipitate DNA and pellet cell remains
Filtration
• 0.45 µm glass fiber membranes
• Filter a large volume when possible (the larger the better)
Turner et al. (2014) Methods in Ecology and Evolution, 5, 676-684.
Filtration
Civade R, Dejean T, Valentini A et al. (2016) Spatial representativeness of environmental DNA metabarcoding signal for fish biodiversity assessment in a natural freshwater system. PLoS One, 11, e0157366.
Valentini A, Taberlet P, Miaud C et al. (2016) Next-generation monitoring of aquatic biodiversity using environmental DNA metabarcoding. Molecular Ecology, 25, 929–942.
© Roland Douzet/SAJF
The Roche Noire experiment
Sampling strategy in the field
[Figure: nested sampling design within a plot (scale bar: 10 m), showing plot, samples, extractions and PCRs; 80 soil cores per sample]
● Sampling: 4 plant communities, 3 plots per community, 2 samples of 80 cores per plot
● Extraction of extracellular DNA from kilograms of soil using a phosphate buffer
● DNA amplification of the P6 loop of the chloroplast trnL (UAA) intron
● Sequencing on the 454
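The size of this design follows directly from the numbers in the first bullet. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def design_totals(communities, plots_per_community, samples_per_plot, cores_per_sample):
    """Return (plots, samples, soil cores) for a fully nested sampling design."""
    plots = communities * plots_per_community
    samples = plots * samples_per_plot
    cores = samples * cores_per_sample
    return plots, samples, cores


# Values from the slide: 4 communities, 3 plots each, 2 samples of 80 cores per plot.
plots, samples, cores = design_totals(4, 3, 2, 80)
print(f"{plots} plots, {samples} samples, {cores} soil cores")
```

Pooling 80 cores into one sample keeps the number of extractions at the number of samples, not the number of cores.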
Dry high alpine meadows dominated by Kobresia myosuroides
Low alpine meadows dominated by Carex sempervirens
Subalpine heath dominated by Vaccinium sp.
Subalpine grasslands dominated by Festuca paniculata
"Roche Noire" experiment: projections of a between class analysis
[Figure: panel A, axis 1 (18.9%) and axis 2 (15.4%); panel B, axis 3 (13.2%) and axis 2 (15.4%); legend: Carex, Festuca, Kobresia, Vaccinium]
Sampling strategy (sampling on a grid)
Plot H20 of the Nouragues Field Station (CNRS, French Guiana)
100 m
Two DNA extractions per core, using 15g of soil per extraction
Spatial distribution of Bagassa guyanensis (sampling on a grid)
Plot H20 of the Nouragues Field Station (CNRS, French Guiana)
A compromise between grid sampling, and sample pooling
Last experiment in French Guiana
Scale: 100 m
16 DNA extractions per plot, each using 15 g of soil. Each sample is a pool of five cores.
What is the sampling unit?
Different sampling strategies
Tests with different polymerases
• Non proof-reading polymerases
– Amplitaq Gold
– Platinum® PCR SuperMix
– TaqMan® Environmental Master Mix 2.0
– Taq DNA polymerase QIAGEN
• Proof-reading polymerases
– Q5
– AccuPrime Pfx
– Pfu Turbo polymerase
– PfuUltra II Fusion Hs DNA polymerase
PCR errors
• Results
– More artifacts with proof-reading polymerases when using complex templates
– "Phosphorothioate" primers are required when using a proof-reading polymerase
[Figure: a phosphodiester bond compared with a phosphorothioate bond]
PCR errors
Taq polymerase: GGGCAATCCTGAGCCAACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
Proof-reading polymerase: GGGCAATCCTGAGCCAACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
PCR errors
Taq polymerase: GGGCAATCCTGAGCCAACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
Proof-reading polymerase: GGGCAATCCTGAGCCCATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
PCR errors
Taq polymerase: GGGCAATCCTGAGCCAACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
Proof-reading polymerase: GGGCAATCCTGAGCCGGTAGGGGTTATCGATACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
PCR errors
Taq polymerase: GGGCAATCCTGAGCCAACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
Proof-reading polymerase: GGGCAATCCTGAGCCAACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT
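The differences between the sequences on these slides are hard to spot by eye. A minimal sketch for locating them, assuming a single contiguous change per sequence and using a simple shared-prefix/shared-suffix comparison (not a full alignment; the function name is hypothetical):

```python
REFERENCE = "GGGCAATCCTGAGCCAACATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT"
SHORTER = "GGGCAATCCTGAGCCCATTACCCGTTAGGACTCGGCCATCCCCAATAGCTAT"
LONGER = ("GGGCAATCCTGAGCCGGTAGGGGTTATCGATACATTACCCGTTAGGACTCGGCC"
          "ATCCCCAATAGCTAT")


def describe_difference(reference, variant):
    """Return (shared prefix length, reference segment, variant segment).

    The two segments are what remains once the shared prefix and the shared
    suffix are trimmed. When the change sits next to a repeated base, its exact
    placement is ambiguous and this function reports one possible placement.
    """
    limit = min(len(reference), len(variant))
    prefix = 0
    while prefix < limit and reference[prefix] == variant[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and reference[len(reference) - 1 - suffix] == variant[len(variant) - 1 - suffix]):
        suffix += 1
    return (prefix,
            reference[prefix:len(reference) - suffix],
            variant[prefix:len(variant) - suffix])


for name, seq in (("identical", REFERENCE), ("shorter", SHORTER), ("longer", LONGER)):
    prefix, ref_part, var_part = describe_difference(REFERENCE, seq)
    print(f"{name}: shared prefix {prefix} nt, "
          f"reference '{ref_part}' replaced by '{var_part}'")
```

In both altered sequences the change starts right after GGGCAATCCTGAGCC, that is, within the last bases of the primer sequence GGGCAATCCTGAGCCAA used in this deck.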
Results
• Each polymerase has its own characteristics and produces results relatively different from the others
• The use of the Taq polymerase (and not a proof-reading enzyme) seems to be a good compromise
Experiments with a known template
Species Dilution Sequence
Taxus baccata 1.000000 atccgtattataggaacaataattttattttctagaaaagg
Salvia pratensis 0.500000 atcctgttttctcaaaacaaaggttcaaaaaacgaaaaaaaaaag
Populus tremula 0.250000 atcctatttttcgaaaacaaacaaaaaaacaaacaaaggttcataaagacagaataagaatacaaaag
Rumex acetosa 0.125000 ctcctcctttccaaaaggaagaataaaaaag
Carpinus betulus 0.062500 atcctgttttcccaaaacaaataaaacaaatttaagggttcataaagcgagaataaaaaag
Fraxinus excelsior 0.031250 atcctgttttcccaaaacaaaggttcagaaagaaaaaag
Picea abies 0.015625 atccggttcatggagacaatagtttcttcttttattctcctaagataggaaggg
Lonicera xylosteum 0.007813 atccagttttccgaaaacaagggtttagaaagcaaaaatcaaaaag
Abies alba 0.003906 atccggttcatagagaaaagggtttctctccttctcctaaggaaagg
Acer campestre 0.001953 atcctgttttacgagaataaaacaaagcaaacaagggttcagaaagcgagaaaggg
Briza media 0.000977 atccgtgttttgagaaaacaagggggttctcgaactagaatacaaaggaaaag
Rosa canina 0.000488 atcccgttttatgaaaacaaacaaggtttcagaaagcgagaataaataaag
Capsella bursa-pastoris 0.000244 atcctggtttacgcgaacacaccggagtttacaaagcgagaaaaaagg
Geranium robertianum 0.000122 atccttttttacgaaaataaagaggggctcacaaagcgagaatagaaaaaaag
Rhododendron ferrugineum 0.000061 atccttttttcgcaaacaaacaaagattccgaaagctaaaaaaaag
Lotus corniculatus 0.000031 atcctgctttacgaaaacaagggaaagttcagttaagaaagcgacgagaaaaatg
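The dilution column is a two-fold series: each species is half as concentrated as the previous one. A short sketch that regenerates it and derives the share of reads each species would receive if reads were strictly proportional to template concentration (an idealisation made here for illustration, not a claim from the slides):

```python
SPECIES = [
    "Taxus baccata", "Salvia pratensis", "Populus tremula", "Rumex acetosa",
    "Carpinus betulus", "Fraxinus excelsior", "Picea abies", "Lonicera xylosteum",
    "Abies alba", "Acer campestre", "Briza media", "Rosa canina",
    "Capsella bursa-pastoris", "Geranium robertianum",
    "Rhododendron ferrugineum", "Lotus corniculatus",
]

# Two-fold serial dilution: 1, 1/2, 1/4, ... down to 1/32768.
dilutions = [0.5 ** i for i in range(len(SPECIES))]

# Idealised expectation: read share proportional to relative concentration.
total = sum(dilutions)
expected_share = {name: d / total for name, d in zip(SPECIES, dilutions)}

for name, d in zip(SPECIES, dilutions):
    print(f"{name:26s} {d:.6f} {expected_share[name]:.6%}")
```

The series spans more than four orders of magnitude, so the rarest species can only be detected if the sequencing depth per PCR is high enough.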
Quantitative aspects: results
[Figure: two panels, GWM-334 and GWM-337, each plotting log number of sequence reads against log concentration in template]
Amplitaq Gold
– without elongation time: r = 0.88
– with 1 min elongation time: r = 0.96
Quantitative aspects: results
[Figure: two panels, GWM-337 and GWM-343, each plotting log number of sequence reads against log concentration in template]
Amplitaq Gold with 1 min elongation time: r = 0.96
AccuPrime Pfx with phosphorothioate primers: r = 0.86
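The r values on these slides relate log read counts to log template concentration. A minimal sketch of such a calculation, assuming r is a Pearson correlation on log10-transformed values (the slides do not state the estimator). The read counts below are invented placeholders for illustration only, not data from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log_log_r(concentrations, read_counts):
    """Correlation between log10 concentration and log10 read count.

    Pairs with a zero count are dropped because log10(0) is undefined; this is
    a simplification chosen for the sketch.
    """
    pairs = [(c, n) for c, n in zip(concentrations, read_counts) if c > 0 and n > 0]
    return pearson_r([math.log10(c) for c, _ in pairs],
                     [math.log10(n) for _, n in pairs])


concentrations = [0.5 ** i for i in range(8)]  # first eight dilution steps
hypothetical_reads = [52000, 24000, 15000, 6100, 3900, 1500, 900, 310]  # INVENTED
print(f"r = {log_log_r(concentrations, hypothetical_reads):.2f}")
```

Dropping zero counts flatters the correlation when rare species are missed entirely, so the number of undetected species should be reported alongside r.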
Quantitative aspects: reproducibility
Amplitaq Gold with 1 min elongation time
[Figure: eight replicate panels, PCR_1 to PCR_8, each plotting log number of sequence reads against log concentration in template]
The Illumina/Solexa technology (1)
The Illumina/Solexa technology (2)
Library preparation for Illumina sequencing
• Making blunt ends (polishing)
• Adding an A on the 3' end
• Adaptor ligation
• A few PCR cycles (up to 15)
The Illumina/Solexa technology (3)
The Illumina/Solexa technology (4)
The Illumina/Solexa technology (5)
HiSeq 2500
• Company: Illumina®
• Website: www.illumina.com
• Fragment length: 125 bases (2x125 paired-ends)
• Number of reads per run: 8 × 10^9
• Total output per run: 1 Tb
• Time per run: 6 days
• Random distribution of the clusters
HiSeq 4000
• Company: Illumina®
• Website: www.illumina.com
• Fragment length: 150 bases (2x150 paired-ends)
• Number of reads per run: 8.6–10 billion
• Total output per run: 1.3–1.5 Tb
• Time per run: 3.5 days
• Regular distribution of the clusters
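The figures on the two HiSeq slides are linked by a simple product: total output is the number of reads times the read length. A small check, assuming the read counts on the slides count each read of a pair separately (an assumption that makes the numbers agree); the function name is hypothetical.

```python
def output_terabases(n_reads, read_length):
    """Total sequenced bases, in terabases, for n_reads reads of read_length bases."""
    return n_reads * read_length / 1e12


hiseq_2500 = output_terabases(8e9, 125)
hiseq_4000_low = output_terabases(8.6e9, 150)
hiseq_4000_high = output_terabases(10e9, 150)

print(f"HiSeq 2500: {hiseq_2500:.2f} Tb")
print(f"HiSeq 4000: {hiseq_4000_low:.2f} to {hiseq_4000_high:.2f} Tb")
```

The same product, divided by the number of PCRs pooled in a run, gives a rough upper bound on the sequencing depth available per PCR.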
Traditional versus next generation sequencing
[Figure: workflow comparing traditional sequencing and next generation sequencing. Steps shown: sampling and DNA extraction, DNA amplification, sequencing, bioinformatics, results. Example reads: ACGTTA, ACGTTG, ACGTTA, ACATTA, ACGCTA]
Two strategies
• Sequencing adaptors included on the 5'-end of the amplification primers
– but difficult to use with highly diluted template, and number of samples limited; expensive at the primer level
• Amplification without the sequencing adaptors and subsequent library preparation as for genomic DNA
– expensive at the library preparation level; some technical difficulties for avoiding "tag-jumping"
A simple tagging system to identify the different samples
• By adding a specific sequence tag on the 5' end of the primers
• 5'-NNacagcacaGGGCAATCCTGAGCCAA-3'
• 5'-NNNacagcacaGGGCAATCCTGAGCCAA-3'
• 5'-NNNNacagcacaGGGCAATCCTGAGCCAA-3'
• In order to limit the cost of the primers, we suggest using a different tag on each side of the PCR product
• But this raises the problem of chimeras
• Rare taxa are difficult to identify
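The tagged primers above can be assembled and read back with a few lines of code. This is a minimal sketch, assuming 8-nt tags, exact matching (no sequencing errors tolerated) and reads that still contain the primer; the function names are hypothetical, and real demultiplexing tools do considerably more.

```python
PRIMER = "GGGCAATCCTGAGCCAA"  # primer part of the tagged oligos shown on the slide


def tagged_primer(tag, n_random):
    """Build an oligo: n_random N's, then the sample tag, then the primer."""
    if not 2 <= n_random <= 4:
        raise ValueError("the slide shows 2 to 4 leading N's")
    return "N" * n_random + tag.lower() + PRIMER


def find_tag(read, known_tags, tag_length=8):
    """Return the known tag located just upstream of the primer, or None."""
    start = read.upper().find(PRIMER)
    if start < tag_length:  # primer absent (-1) or too few bases before it
        return None
    candidate = read[start - tag_length:start].lower()
    return candidate if candidate in known_tags else None


tags = {"acagcaca", "gtgtacat"}  # two of the tags that appear in this deck
for n in (2, 3, 4):
    print(tagged_primer("acagcaca", n))
print(find_tag("CTacagcacaGGGCAATCCTGAGCCAACATTACC", tags))
```

Locating the tag relative to the primer, and not at a fixed offset from the read start, is what lets the variable number of leading N's be ignored when assigning reads to samples.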
Leaking among samples during library preparation
Library with 5 PCR cycles
Column tags: acacacac, acagcaca, gtgtacat, tatgtcag, tagtcgca, tactatac, actagatc, gatcgcga
Row tag      Reads
acacacac    113482
acagcaca    126335
gtgtacat     93809
tatgtcag    160242
tagtcgca    184574
tactatac    132409
actagatc     86878
gatcgcga    105527
(positive control with 16 plant species, 8 different PCRs)
Leaking among samples during library preparation
Library with 5 PCR cycles (chimeras: 9.11%)
          acacacac  acagcaca  gtgtacat  tatgtcag  tagtcgca  tactatac  actagatc  gatcgcga
acacacac    113482      2018      1113      1713      2241      1874      1375      1662
acagcaca      1737    126335      1540      2014      2900      1689      1335      2030
gtgtacat      1174      1689     93809      1676      1890      1292      1001      1514
tatgtcag      1429      1724      1427    160242      2443      1451      1164      1698
tagtcgca      2071      2950      1859      2851    184574      2104      1669      3224
tactatac      2148      1868      1263      1927      2395    132409      1755      1924
actagatc      1694      1674      1201      1565      2035      1754     86878      1913
gatcgcga      1358      1928      1132      1809      2773      1545      1392    105527
(positive control with 16 plant species, 8 different PCRs)
Leaking among samples during library preparation
Library without PCR cycle (chimeras: 0.18%)
          acacacac  acagcaca  gtgtacat  tatgtcag  tagtcgca  tactatac  actagatc  gatcgcga
acacacac    105624        44        21        62        26        35        21        50
acagcaca        16     59338        11        28        16        20         6        19
gtgtacat        13         8     45835        19         7        17         7        11
tatgtcag        29        18         7    219741        57        30        10        31
tagtcgca        24        19         8        38     87859        20        17        27
tactatac        26        16        13        49        30     73452        51        29
actagatc        13         8         8        20        12        17     41547        25
gatcgcga        18        16         8        21        11        13         5     50994
(positive control with 16 plant species, 8 different PCRs)
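The chimera percentages quoted in the slide titles can be recomputed from the two tables. The sketch below assumes that rows and columns are the tags found at the two ends of each read, that the diagonal holds the expected combinations, and that the chimera rate is the off-diagonal share of all reads; under those assumptions it gives values that agree with the quoted 9.11% and 0.18%.

```python
# Read counts copied from the two slides (rows and columns in the same tag order).
WITH_5_PCR_CYCLES = [
    [113482, 2018, 1113, 1713, 2241, 1874, 1375, 1662],
    [1737, 126335, 1540, 2014, 2900, 1689, 1335, 2030],
    [1174, 1689, 93809, 1676, 1890, 1292, 1001, 1514],
    [1429, 1724, 1427, 160242, 2443, 1451, 1164, 1698],
    [2071, 2950, 1859, 2851, 184574, 2104, 1669, 3224],
    [2148, 1868, 1263, 1927, 2395, 132409, 1755, 1924],
    [1694, 1674, 1201, 1565, 2035, 1754, 86878, 1913],
    [1358, 1928, 1132, 1809, 2773, 1545, 1392, 105527],
]
WITHOUT_PCR_CYCLE = [
    [105624, 44, 21, 62, 26, 35, 21, 50],
    [16, 59338, 11, 28, 16, 20, 6, 19],
    [13, 8, 45835, 19, 7, 17, 7, 11],
    [29, 18, 7, 219741, 57, 30, 10, 31],
    [24, 19, 8, 38, 87859, 20, 17, 27],
    [26, 16, 13, 49, 30, 73452, 51, 29],
    [13, 8, 8, 20, 12, 17, 41547, 25],
    [18, 16, 8, 21, 11, 13, 5, 50994],
]


def chimera_rate(matrix):
    """Fraction of reads whose two tags do not form an expected (diagonal) pair."""
    total = sum(sum(row) for row in matrix)
    expected = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - expected) / total


print(f"5 PCR cycles: {chimera_rate(WITH_5_PCR_CYCLES):.2%}")
print(f"no PCR cycle: {chimera_rate(WITHOUT_PCR_CYCLE):.2%}")
```

With the same tag on both ends, every off-diagonal read can be recognised and discarded. With different tags on each end, only the combinations that were never used can be flagged this way.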
To reduce chimeras among samples
• No PCR cycle during library preparation
• Use the same tag on each side of the PCR products (but high cost for primers): very low level of chimeras
• Use a different tag on each side of the PCR products: low level of chimeras
Library without PCR (MetaFast protocol)
Library without PCR (standard protocol)
Thousands of PCR
Plate 1
Plate 2
Thank you for your attention