Standard land plant barcoding requires a multi loci approach?
Peter Gasson
Sujeevan Ratnasingham
Robyn Cowan
Mitochondrial DNA in land plants:
•undergoes rearrangements
•transfer of genes to nucleus
•incorporation of foreign genes
•substitution rates are VERY slow (with a few notable exceptions e.g. Plantago, Cho & al.)
COI Divergence in Eukaryotes
[Figure: COI amino acid divergence across Eubacteria, Ciliata, Apicomplexa, Dinoflagellata, Euglenophyta, Kinetoplastida, Chlorarachniophyta, Animalia, Chlorophyceae, Fungi, Xanthophyceae, Phaeophyceae, Eustigmatophyceae, Bacillariophyceae, Cryptophyta, Prymnesiophyta, Chlorophyceae, Rhodophyta and Land Plants; scale: 20 % aa]
Resolving Species Through DNA Barcoding
Partners
Instituto de Biologia UNAM, Mexico - Gerardo Salazar
Imperial College, UK - Timothy Barraclough
Natural History Museum, Denmark - Gitte Petersen
Natural History Museum (London), UK - Mark Carine
New York Botanical Garden, USA - Kenneth Cameron
Royal Botanic Garden Edinburgh, UK - Peter Hollingsworth
Royal Botanic Gardens, Kew, UK - Mark Chase
South African National Biodiversity Institute - Ferozah Conrad
University of Cape Town, South Africa - Terry Hedderson
U. Estadual de Feira de Santana, Brazil - Cássio van den Berg
Universidad de los Andes - Santiago Madriñán
U. of Wales Aberystwyth, UK (previously University of Reading, UK) - Mike Wilkinson
Alfred P. Sloan Foundation
Gordon and Betty Moore Foundation
To develop a universal approach to barcoding of all land plants
• Phase 1: primer development (protein motifs); complete genome sequences; problems: ferns; 46 pairs of sister taxa from mosses, liverworts, hornworts, lycopods, ferns/fern allies, gymnosperms and angiosperms, scored for percent PCR success and percent polymorphisms
• Phase 2: in-depth trials of six markers identified in Phase 1 on a range of well-sampled taxa from across land plants
So what are the characteristics of a good barcode?
•High inter-specific, low intra-specific sequence divergence
•Universal amplification/sequencing with standard primers
•Technically simple to sequence
•Short enough to sequence in one reaction
•Easily alignable (few insertions/deletions)
•Readily recoverable from museum or herbarium samples and other degraded samples
**Universal + Variable**
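The first criterion (high inter-specific, low intra-specific divergence) can be checked on an alignment by comparing pairwise distances within and between species. The sketch below is only an illustration: it uses uncorrected p-distance, and the function names, species labels and sequences are invented for the example rather than taken from the project.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def divergence_summary(samples):
    """samples: list of (species, aligned_sequence) tuples.

    Returns (largest intra-specific distance, smallest inter-specific distance).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=0.0)


# Invented toy alignment (10 sites), not real data.
toy = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTGC"),
    ("species_B", "ACGAACCTGC"),
]

max_intra, min_inter = divergence_summary(toy)
print(f"max intra-specific: {max_intra:.2f}, min inter-specific: {min_inter:.2f}")
```

A marker behaves well as a barcode for these samples when the smallest inter-specific distance is clearly larger than the largest intra-specific distance.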
What sort of marker should we use?
•Mitochondrial DNA
•Plastid
•Ribosomal DNA (ITS)
•Low-copy nuclear DNA (protein coding)
•Length variable ?
•Single locus
•Multiple loci (one genomic compartment) ?
•Multiple loci (two genomic compartments) ?
Advantages of plastid DNA (hence its use in phylogenetics)
•Monomorphic (separation of different copies not required in hybrids)
•High copy number (can even be amplified from highly degraded DNA)
•Potentially highly diagnostic (in spite of its reputation to the contrary)
However, will not detect hybrids, introgression, paralogy
Coding or non-coding?
Non-coding regions:
sometimes more variable
microsatellites difficult to sequence through
numerous indels (impossible to align, length variable)
cannot translate to check for pseudoproteins and to aid alignment
sometimes contain rearrangements and coding insertions
(character based identification)
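By contrast, a coding region can be translated in frame, which gives a quick sanity check on a sequence. The sketch below is a minimal illustration using the standard genetic code; the helper names are hypothetical, the test sequences are invented, and no account is taken of RNA editing or alternative reading frames.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate from the first base; trailing partial codons are ignored.

    Codons containing ambiguous bases are reported as 'X'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def has_stop_codon(dna: str) -> bool:
    """True if the in-frame translation contains a stop codon ('*')."""
    return "*" in translate(dna)


# Invented fragments for illustration only.
print(translate("ATGGCTAAAGGT"))       # in-frame fragment
print(has_stop_codon("ATGTAAGCTGGT"))  # fragment interrupted by TAA
```

An unexpected stop codon in a barcode fragment suggests a sequencing error, a frame-shifting indel or a non-functional copy, and the protein translation can also be used to guide the nucleotide alignment.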
Criteria for locus selection
1. Species-level sequence divergence
2. Appropriate length (200-800 bp)
3. Presence of conserved primer target sites
4. At least 200 bp of exon sequence
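The four criteria can be applied as a simple filter over candidate loci. This is a sketch only: the `Locus` record, its field names and the example loci are hypothetical, and the two yes/no fields stand in for judgements that would in practice come from sequence comparisons.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str
    length_bp: int                    # length of the candidate region
    exon_bp: int                      # amount of exon sequence it contains
    species_level_divergence: bool    # criterion 1, judged elsewhere
    conserved_primer_sites: bool      # criterion 3, judged elsewhere


def passes_criteria(locus: Locus, min_len=200, max_len=800, min_exon=200) -> bool:
    """Apply the four selection criteria listed above."""
    return (
        locus.species_level_divergence
        and min_len <= locus.length_bp <= max_len
        and locus.conserved_primer_sites
        and locus.exon_bp >= min_exon
    )


# Hypothetical candidates, not loci evaluated in the project.
candidates = [
    Locus("locus_a", 650, 650, True, True),
    Locus("locus_b", 1200, 1200, True, True),   # too long
    Locus("locus_c", 400, 150, True, True),     # too little exon sequence
]
selected = [locus.name for locus in candidates if passes_criteria(locus)]
print(selected)
```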
Our Strategy
1. Identify suitable loci on the basis of in silico screens using Nicotiana cp sequence
2. Design universal primers (sets of 4 primers/locus) using amino acid and nucleic acid sequence data
3. Perform initial screen for universality (1 primer pair)
4. Screen for sequence variation using diverse species pairs
5. Improve universality (e.g. use all primer combinations)
6. Use statistical modelling approaches to identify optimal primer sets
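Steps 2, 5 and 6 can be sketched in code. The example assumes that each set of four primers consists of two forward primers (1, 2) and two reverse primers (3, 4), which is how pair labels such as "1+3" and "2+4" in the trial table read. The selection step uses a plain greedy set cover as a stand-in; it is not the statistical modelling referred to in step 6, and the amplification results are invented.

```python
from itertools import product


def primer_pairs(forward, reverse):
    """All forward+reverse combinations, labelled like '1+3'."""
    return [f"{f}+{r}" for f, r in product(forward, reverse)]


def greedy_primer_set(amplifies):
    """Pick primer pairs until every taxon that any pair amplifies is covered.

    amplifies: dict mapping pair label -> set of taxa amplified by that pair.
    Greedy set cover: at each step take the pair covering most remaining taxa.
    """
    remaining = set().union(*amplifies.values())
    chosen = []
    while remaining:
        best = max(sorted(amplifies), key=lambda p: len(amplifies[p] & remaining))
        chosen.append(best)
        remaining -= amplifies[best]
    return chosen


pairs = primer_pairs(["1", "2"], ["3", "4"])
print(pairs)

# Invented PCR outcomes for illustration only.
toy_results = {
    "1+3": {"angiosperm_a", "angiosperm_b"},
    "1+4": {"angiosperm_a", "gymnosperm_a"},
    "2+3": {"moss_a"},
    "2+4": {"angiosperm_a", "angiosperm_b", "gymnosperm_a"},
}
chosen = greedy_primer_set(toy_results)
print(chosen)
```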
Standard PCR Recipe
• NH4 x1
• Mg2+ 1.5 mM
• dNTPs 0.2 mM
• FW test primer 1 µM
• RE test primer 1 µM
• Taq DNA polymerase 2 units
• BSA 0.1 mg/ml
• Template 40 ng
• Water to 20 µl
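Turning the recipe into pipetting volumes is a dilution calculation (stock concentration x volume = final concentration x final volume). The sketch below reads the primer concentration as 1 µM and the final volume as 20 µl, since the micro sign appears to have been lost from the transcript, and treats "NH4 x1" as a 1x final concentration of an NH4-based buffer. All stock concentrations are assumptions made for the example; the recipe does not state them.

```python
FINAL_VOLUME_UL = 20.0  # "Water to 20 µl", as read from the recipe

# name: (final concentration from the recipe, ASSUMED stock concentration),
# both in the unit given in the name.
BY_CONCENTRATION = {
    "NH4 buffer (x)": (1.0, 10.0),
    "Mg2+ (mM)": (1.5, 50.0),
    "dNTPs (mM)": (0.2, 10.0),
    "FW primer (uM)": (1.0, 10.0),
    "RE primer (uM)": (1.0, 10.0),
    "BSA (mg/ml)": (0.1, 10.0),
}

# name: (amount per reaction from the recipe, ASSUMED stock amount per µl).
BY_AMOUNT = {
    "Taq (units)": (2.0, 5.0),
    "Template (ng)": (40.0, 20.0),
}


def reaction_volumes(final_volume_ul: float = FINAL_VOLUME_UL) -> dict:
    """Volume in µl of each component for one reaction, with water to volume."""
    volumes = {}
    for name, (final, stock) in BY_CONCENTRATION.items():
        volumes[name] = final_volume_ul * final / stock
    for name, (amount, per_ul) in BY_AMOUNT.items():
        volumes[name] = amount / per_ul
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the final volume; use stronger stocks")
    volumes["Water"] = water
    return volumes


volumes = reaction_volumes()
for name, vol in volumes.items():
    print(f"{name:16s} {vol:5.2f} ul")
```

With different stock concentrations only the two dictionaries need to change; the water volume is recomputed from whatever remains.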
Results of First PCR
Gene    Total success   % success
ndhK    99              90
ndhJ    88              80
rpoC1   88              80
rpoB    84              76
YCF2    75              68
accD    73              66
rpoC2   71              65
ndhA    70              64
YCF9    67              61
YCF5    57              52
matK    45              41
rpl22   23              28
Number of Variable Sites
Gene (rank in first PCR)  Variable sites   Length   % sites variable
matK (11)                 514              814      63
YCF5 (10)                 256              414      62
accD (6)                  210              394      53
rpoC2 (7)                 300              700      43
rpoB (4)                  188              475      40
rpoC1 (3)                 226              578      39
YCF9 (9)                  61               163      37
ndhJ (2)                  125              366      34
ndhA (8)                  102              328      31
ndhK (1)                  51               185      28
YCF2 (5)                  95               423      22
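The percentage row follows directly from the other two: variable sites divided by length, times 100. A short sketch using the figures from the slide (the dictionary and function names are just for illustration):

```python
# gene: (variable sites, length in bp), as listed on the slide
VARIABLE_SITES = {
    "matK": (514, 814),
    "YCF5": (256, 414),
    "accD": (210, 394),
    "rpoC2": (300, 700),
    "rpoB": (188, 475),
    "rpoC1": (226, 578),
    "YCF9": (61, 163),
    "ndhJ": (125, 366),
    "ndhA": (102, 328),
    "ndhK": (51, 185),
    "YCF2": (95, 423),
}


def percent_variable(variable: int, length: int) -> float:
    """Percentage of aligned sites that are variable."""
    return 100.0 * variable / length


percentages = {
    gene: round(percent_variable(v, n)) for gene, (v, n) in VARIABLE_SITES.items()
}
for gene, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{gene:6s} {pct:3d} %")
```

Ranked this way, matK is the most variable region and YCF2 the least, the reverse of their relative positions in the PCR success table.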
Trial regions
Selected seven genes that represent the different levels of universality and variability. Blue = high, green = medium, yellow = low.
[Table: Variability and Universality ratings, colour-coded, for the genes ndhA, YCF9, rpoC2, accD, ndhK, YCF2, rpoB, ndhJ, rpoC1, YCF5 and matK]
Trial groups
Asterella, Anastrophyllum-Barbilophozia, Tortella, Bryum, Triquetrella, Homalothecium, Tortella, Elaphoglossum, Asplenium, Equisetum, Cupressus, Pinus, Araucaria, Labordia, Conostylis, Dactylorhiza maculata/incarnata, Mimetes, Inga, Hordeum, Scalesia, Crocus, Laelia, Cattleya, Mormodes, Deiregyne, Lauraceae
Group             Family           Primary genera  accD     matK     ndhJ     rpoB         rpoC1
Angio asterids    Asteraceae       Scalesia        1+3      2.1+5    1+3      1+3          1+3
Angio asterids    Loganiaceae      Labordia        2+4      X+5      1+4      1+3          2+4
Angio eudicots    Proteaceae       Mimetes         1+4      *        *        *            1+4
Angio magnoliids  Lauraceae        Lauraceae       2+4      X+5      2+4      2+3          2+4
Angio monocot     Agavaceae        Agave           1+4      2.1+3.2  1+4      1+4          1+4
Angio monocot     Haemodoraceae    Conostylis      2+4      X+5      1+3      2+3          2+4
Angio monocot     Iridaceae        Crocus          2+4      2.1a+5   1+3      2+3          1+3
Angio monocot     Orchidaceae      Aulosepalum     1+4      2.1+3.2  1+4      1+4          1+4
Angio monocot     Orchidaceae      Cattleya        2+4      2.1a+5   *        2+3          2+4
Angio monocot     Orchidaceae      Dactylorhiza    2+4      X+5      1+3      2+3          2+4
Angio monocot     Orchidaceae      Sophronitis     2+4      2.1a+5   1+3      1+3          2+4
Angio monocot     Poaceae          Hordeum         Missing  2.1a+5   1+3      2+3          2+4
Angio rosids      Fabaceae         Inga            2+4      X+3.2    1+3      1+3          2+4
Fern              Aspleniaceae     Asplenium       *        *        LP1+LP5  *            *
Fern              Dryopteridaceae  Elaphoglossum   LP1+LP4  *        *        *            LP1+LP5
Fern ally         Equisetaceae     Equisetum       1+LP3    FE+RE    LP1+LP4  LP1.1+LP4.3  LP1+LP5
Gymnosperm        Araucariaceae    Araucaria       2+4      FE+RE?   1+3      2+LP3        2+4
Gymnosperm        Cupressaceae     Cupressus       1+4      *        1+4      *            2+4
Gymnosperm        Pinaceae         Pinus           2+4      FE+RE    Missing  2+LP3        2+4
Gymnosperm        Zamiaceae        Encephalartos   1+4      FE+RE    *        2+3          1+4
Liverwort         Aytoniaceae      Asterella       2+4      *        1+3      *            2+4
Liverwort         Lophoziaceae     Anastrophyllum  *        *        LP1+LP4  *            2+4
Moss              Bryaceae         Bryum           *        *        LP1+LP4  LP1.1+LP5.2  2+4
Moss              Pottiaceae       Tortella        *        *        *        LP1.1+LP3.2  LP1+4
Moss              Pottiaceae       Triquetrella    *        *        LP1+LP4  LP1.1+LP5.2  1+3
Moss Ptychomniaceae * 1+4 *LP1.1+LP5.3 LP1.1+LP5.3 1+4
Agavaceae X 22 sp.
Crocus X 9 sp.
Aulosepalum X 8 sp. (?all)
Cattleya X 30 sp. (2 clades approx. 43 sp.)
Dactylorhiza 15 sp. (species complex)
Sophronitis 27 sp. (approx. 37 sp.)
Scalesia X 4 (species complex)
Conostylis X 42 (?all)
Equisetum X 14
Pinus X 66
Hordeum X 10
Lauraceae
Samples with unique ‘barcode’
                  Gaps as a 5th state    Gaps = missing data    With duplicates removed
Locus             Haplotypes  %          Haplotypes  %          Haplotypes  %
matK              201         69.55%     200         69.20%     186         64.36%
rpoB              129         44.64%     129         44.64%     122         42.21%
rpoC1             124         42.91%     124         42.91%     120         41.52%
rpoB+matK         234         80.97%     234         80.97%     214         74.05%
rpoB+rpoC1        184         63.67%     184         63.67%     175         60.55%
rpoC1+matK        235         81.31%     234         80.97%     214         74.05%
rpoC1+rpoB+matK   251         86.85%     251         86.85%     229         79.24%
Individuals 366 366 366 289
Species 289 289 289
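The percentages in this table are consistent with dividing each haplotype count by the 289 species sampled. The sketch below reproduces the "gaps as a 5th state" column on that assumption; the denominator is inferred from the numbers rather than stated on the slide, and the names are illustrative.

```python
N_SPECIES = 289  # assumed denominator, inferred from the table

# Haplotype counts with gaps treated as a 5th state, from the slide.
HAPLOTYPES = {
    "matK": 201,
    "rpoB": 129,
    "rpoC1": 124,
    "rpoB+matK": 234,
    "rpoB+rpoC1": 184,
    "rpoC1+matK": 235,
    "rpoC1+rpoB+matK": 251,
}


def percent_unique(haplotypes: int, n_species: int = N_SPECIES) -> float:
    """Haplotypes as a percentage of the species sampled, to 2 decimals."""
    return round(100.0 * haplotypes / n_species, 2)


percent = {locus: percent_unique(h) for locus, h in HAPLOTYPES.items()}
for locus, pct in percent.items():
    print(f"{locus:16s} {pct:6.2f} %")

# Gain in percentage points from combining all three loci over matK alone.
gain = round(percent["rpoC1+rpoB+matK"] - percent["matK"], 2)
print(f"three loci vs matK alone: +{gain} points")
```

On these figures no single region separates more than about 70 % of the species, while the three-locus combination reaches about 87 %, which is the case for a multi-locus barcode posed in the title.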