Standard land plant barcoding requires a multi loci approach?

Peter Gasson, Sujeevan Ratnasingham, Robyn Cowan

Post on 28-Dec-2015

Mitochondrial DNA in land plants:

•undergoes rearrangements

•transfers genes to the nucleus

•incorporates foreign genes

•has VERY slow substitution rates (with a few notable exceptions, e.g. Plantago; Cho & al.)

COI Divergence in Eukaryotes

[Figure: COI divergence across groups. Taxon labels: Eubacteria, Ciliata, Apicomplexa, Dinoflagellata, Euglenophyta, Kinetoplastida, Chlorarachniophyta, Animalia, Chlorophyceae, Fungi, Xanthophyceae, Phaeophyceae, Eustigmatophyceae, Bacillariophyceae, Cryptophyta, Prymnesiophyta, Chlorophyceae, Rhodophyta, Land Plants. Scale: 20 % aa]

Resolving Species Through DNA Barcoding

Partners

Instituto de Biologia UNAM, Mexico - Gerardo Salazar

Imperial College, UK - Timothy Barraclough

Natural History Museum, Denmark - Gitte Petersen

Natural History Museum (London), UK - Mark Carine

New York Botanical Garden, USA - Kenneth Cameron

Royal Botanic Garden Edinburgh, UK - Peter Hollingsworth

Royal Botanic Gardens, Kew, UK - Mark Chase

South African National Biodiversity Institute - Ferozah Conrad

University of Cape Town, South Africa - Terry Hedderson

U. Estadual de Feira de Santana, Brazil - Cássio van den Berg

Universidad de los Andes - Santiago Madriñán

U. of Wales Aberystwyth, UK (previously University of Reading, UK) - Mike Wilkinson

Alfred P. Sloan Foundation

Gordon and Betty Moore Foundation

To develop a universal approach to barcoding of all land plants

• Phase 1: primer development (protein motifs); complete genome sequences; problems: ferns; 46 pairs of sister taxa from mosses, liverworts, hornworts, lycopods, ferns/fern allies, gymnosperms and angiosperms, scored for percent PCR success and percent polymorphisms

• Phase 2: in-depth trials of six markers identified in Phase 1 on a range of well-sampled taxa from across land plants

So what are the characteristics of a good barcode?

•High inter-specific, low intra-specific sequence divergence

•Universal amplification/sequencing with standard primers

•Technically simple to sequence

•Short enough to sequence in one reaction

•Easily alignable (few insertions/deletions)

•Readily recoverable from museum or herbarium samples and other degraded samples

**Universal + Variable**
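The first criterion (high inter-specific, low intra-specific divergence) can be illustrated with a small sketch. This is only one way to quantify it, using uncorrected p-distances on aligned sequences; the function names and the toy sequences are made up for illustration and are not from the slides.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping positions with a gap or ambiguity."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(seqs: dict[str, list[str]]) -> tuple[float, float]:
    """Return (largest intra-specific, smallest inter-specific) p-distance."""
    labelled = [(species, s) for species, group in seqs.items() for s in group]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter)


# Hypothetical aligned sequences, two individuals per species.
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGAACCTAC", "ACGAACCTAC"],
}
max_intra, min_inter = barcode_gap(toy)
print(f"max intra = {max_intra:.2f}, min inter = {min_inter:.2f}")
```

A marker behaves well on a data set when the smallest inter-specific distance exceeds the largest intra-specific distance, as it does for these toy sequences.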

What sort of marker should we use?

•Mitochondrial DNA

•Plastid

•Ribosomal DNA (ITS)

•Low-copy nuclear DNA (protein coding)

•Length variable ?

•Single locus

•Multiple loci (one genomic compartment) ?

•Multiple loci (two genomic compartments) ?

Advantages of plastid DNA (hence its use in phylogenetics)

•Monomorphic (separation of different copies not required in hybrids)

•High copy number (can even be amplified from highly degraded DNA)

•Potentially highly diagnostic (in spite of its reputation to the contrary)

However, will not detect hybrids, introgression, paralogy

Coding or non-coding?

Non-coding regions:

sometimes more variable

microsatellites difficult to sequence through

numerous indels: impossible to align, length variable

cannot translate to check for pseudoproteins and to aid alignment

sometimes contain rearrangements and coding insertions

(character based identification)

trnH-psbA spacer region

Criteria for locus selection

1. Species level sequence divergence

2. Appropriate length (200-800 bp)

3. Presence of conserved primer target sites

4. At least 200 bp exon sequence

Our Strategy

1. Identify suitable loci on the basis of in silico screens using Nicotiana cp sequence

2. Design universal primers (sets of 4 primers/locus) using amino acid and nucleic acid sequence data

3. Perform initial screen for universality (1 primer pair)

4. Screen for sequence variation using diverse species pairs

5. Improve universality (e.g. use all primer combinations)

6. Use statistical modelling approaches to identify optimal primer sets
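Step 2 mentions designing primers from amino acid data. One common consideration when doing this, not necessarily the procedure used here, is the degeneracy of a back-translated protein motif: the number of distinct nucleotide sequences that could encode it. The sketch below counts this using the standard genetic code; the example protein string is hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(motif: str) -> int:
    """Number of nucleotide sequences that encode the given peptide motif."""
    return prod(CODONS_PER_AA[aa] for aa in motif.upper())


def least_degenerate_window(protein: str, width: int) -> tuple[int, int, str]:
    """Return (degeneracy, start index, motif) of the best window of `width` residues."""
    windows = [
        (degeneracy(protein[i:i + width]), i, protein[i:i + width])
        for i in range(len(protein) - width + 1)
    ]
    return min(windows)


# Hypothetical conserved protein stretch, for illustration only.
best = least_degenerate_window("LRSWMDHQYKL", width=5)
print(best)
```

Motifs rich in Trp and Met (one codon each) give the least degenerate primers, while Leu, Arg and Ser (six codons each) inflate degeneracy quickly.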

Standard PCR Recipe

• NH4 buffer 1x

• Mg2+ 1.5 mM

• dNTPs 0.2 mM

• FW test primer 1 µM

• RE test primer 1 µM

• Taq DNA polymerase 2 units

• BSA 0.1 mg/ml

• Template 40 ng

• Water to 20 µl
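As a worked example, the recipe can be turned into pipetting volumes with the usual dilution relation (final concentration / stock concentration × final volume). The sketch reads the primer concentration as 1 µM and the final volume as 20 µl. All stock concentrations below are assumptions chosen for illustration; the slides do not state them.

```python
FINAL_VOLUME_UL = 20.0

# component: (final concentration from the recipe, ASSUMED stock concentration)
BY_CONCENTRATION = {
    "NH4 buffer (x)": (1.0, 10.0),
    "Mg2+ (mM)": (1.5, 50.0),
    "dNTPs (mM)": (0.2, 10.0),
    "FW primer (uM)": (1.0, 10.0),
    "RE primer (uM)": (1.0, 10.0),
    "BSA (mg/ml)": (0.1, 10.0),
}

# component: (amount per reaction from the recipe, ASSUMED stock amount per µl)
BY_AMOUNT = {
    "Taq (units)": (2.0, 5.0),
    "Template (ng)": (40.0, 20.0),
}


def reaction_volumes(final_volume_ul: float = FINAL_VOLUME_UL) -> dict[str, float]:
    """Volume in µl of each component, with water making up the remainder."""
    volumes = {
        name: final / stock * final_volume_ul
        for name, (final, stock) in BY_CONCENTRATION.items()
    }
    volumes.update({name: amount / per_ul for name, (amount, per_ul) in BY_AMOUNT.items()})
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the final volume; use stronger stocks")
    volumes["Water"] = water
    return volumes


volumes = reaction_volumes()
for name, vol in volumes.items():
    print(f"{name:16s}{vol:6.2f} ul")
```

With different stock concentrations only the two dictionaries need to change; the water volume adjusts automatically.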

Results of First PCR

Gene  Total success  % success

ndhK  99  90

ndhJ  88  80

rpoC1  88  80

rpoB  84  76

YCF2  75  68

accD  73  66

rpoC2  71  65

ndhA  70  64

YCF9  67  61

YCF5  57  52

matK  45  41

rpl22  23  28

Number of Variable Sites

Gene  Variable sites  Length  % sites variable

matK (11)  514  814  63

YCF5 (10)  256  414  62

accD (6)  210  394  53

rpoC2 (7)  300  700  43

rpoB (4)  188  475  40

rpoC1 (3)  226  578  39

YCF9 (9)  61  163  37

ndhJ (2)  125  366  34

ndhA (8)  102  328  31

ndhK (1)  51  185  28

YCF2 (5)  95  423  22
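The percentage row follows directly from the other two rows (variable sites divided by length). A short sketch that recomputes it from the values on this slide and ranks the genes by variability:

```python
# gene: (variable sites, length), copied from the slide
VARIABLE_SITES = {
    "matK": (514, 814),
    "YCF5": (256, 414),
    "accD": (210, 394),
    "rpoC2": (300, 700),
    "rpoB": (188, 475),
    "rpoC1": (226, 578),
    "YCF9": (61, 163),
    "ndhJ": (125, 366),
    "ndhA": (102, 328),
    "ndhK": (51, 185),
    "YCF2": (95, 423),
}


def percent_variable(variable: int, length: int) -> int:
    """Percentage of sites that are variable, rounded to a whole number."""
    return round(100 * variable / length)


percentages = {gene: percent_variable(v, n) for gene, (v, n) in VARIABLE_SITES.items()}
ranked = sorted(percentages, key=percentages.get, reverse=True)

for gene in ranked:
    print(f"{gene:6s}{percentages[gene]:3d} %")
```

The recomputed values match the percentages shown on the slide, with matK the most variable region and YCF2 the least.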

Trial regions

Selected seven genes that represent the different levels of universality and variability. Blue = high, green = medium, yellow = low.

Genes (each colour-coded on the slide for Variability and Universality): ndhA, YCF9, rpoC2, accD, ndhK, YCF2, rpoB, ndhJ, rpoC1, YCF5, matK

Trial groups

Asterella, Anastrophyllum-Barbilophozia, Tortella, Bryum, Triquetrella, Homalothecium, Tortella, Elaphoglossum, Asplenium, Equisetum, Cupressus, Pinus, Araucaria, Labordia, Conostylis, Dactylorhiza maculata/incarnata, Mimetes, Inga, Hordeum, Scalesia, Crocus, Laelia, Cattleya, Mormodes, Deiregyne, Lauraceae

Group Family Primary genera accD matK ndhJ rpoB rpoC1

Angio asterids Asteraceae Scalesia 1+3 2.1+5 1+3 1+3 1+3

Angio asterids Loganiaceae Labordia 2+4 X+5 1+4 1+3 2+4

Angio eudicots Proteaceae Mimetes 1+4 * * * 1+4

Angio magnoliids Lauraceae Lauraceae 2+4 X+5 2+4 2+3 2+4

Angio monocot Agavaceae Agave 1+4 2.1+3.2 1+4 1+4 1+4

Angio monocot Haemodoraceae Conostylis 2+4 X+5 1+3 2+3 2+4

Angio monocot Iridaceae Crocus 2+4 2.1a+5 1+3 2+3 1+3

Angio monocot Orchidaceae Aulosepalum 1+4 2.1+3.2 1+4 1+4 1+4

Angio monocot Orchidaceae Cattleya 2+4 2.1a+5 * 2+3 2+4

Angio monocot Orchidaceae Dactylorhiza 2+4 X+5 1+3 2+3 2+4

Angio monocot Orchidaceae Sophronitis 2+4 2.1a+5 1+3 1+3 2+4

Angio monocot Poaceae Hordeum Missing 2.1a+5 1+3 2+3 2+4

Angio rosids Fabaceae Inga 2+4 X+3.2 1+3 1+3 2+4

Fern Aspleniaceae Asplenium * * LP1+LP5 * *

Fern Dryopteridaceae Elaphoglossum LP1+LP4 * * * LP1+LP5

Fern ally Equisetaceae Equisetum 1+LP3 FE+RE LP1+LP4 LP1.1+LP4.3 LP1+LP5

Gymnosperm Araucariaceae Araucaria 2+4 FE+RE? 1+3 2+LP3 2+4

Gymnosperm Cupressaceae Cupressus 1+4 * 1+4 * 2+4

Gymnosperm Pinaceae Pinus 2+4 FE+RE Missing 2+LP3 2+4

Gymnosperm Zamiaceae Encephalartos 1+4 FE+RE * 2+3 1+4

Liverwort Aytoniaceae Asterella 2+4 * 1+3 * 2+4

Liverwort Lophoziaceae Anastrophyllum * * LP1+LP4 * 2+4

Moss Bryaceae Bryum * * LP1+LP4 LP1.1+LP5.2 2+4

Moss Pottiaceae Tortella * * * LP1.1+LP3.2 LP1+4

Moss Pottiaceae Triquetrella * * LP1+LP4 LP1.1+LP5.2 1+3

Moss Ptychomniaceae * 1+4 *LP1.1+LP5.3 LP1.1+LP5.3 1+4

Summary

rpoC1 accD ndhJ rpoB matK

5/25 5/20 5/19 8/20 6/16
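The slide does not say how the Summary fractions are derived. One reading that reproduces the matK figure is "number of distinct primer combinations / number of taxa with a recorded primer pair". The sketch below tests that reading on the matK column only. It assumes that "*" and "Missing" mean no primer pair was recorded, that "FE+RE?" counts as FE+RE, and that the matK entry in the last (Ptychomniaceae) row is "*"; none of these is stated in the source.

```python
# matK column copied from the primer table, in row order (Scalesia ... Ptychomniaceae).
MATK = [
    "2.1+5", "X+5", "*", "X+5", "2.1+3.2", "X+5", "2.1a+5", "2.1+3.2",
    "2.1a+5", "X+5", "2.1a+5", "2.1a+5", "X+3.2", "*", "*", "FE+RE",
    "FE+RE?", "*", "FE+RE", "FE+RE", "*", "*", "*", "*", "*", "*",
]


def summarise(column: list[str]) -> tuple[int, int]:
    """Return (distinct primer combinations, taxa with a recorded combination)."""
    recorded = [cell.rstrip("?") for cell in column if cell not in ("*", "Missing")]
    return len(set(recorded)), len(recorded)


distinct, amplified = summarise(MATK)
print(f"matK: {distinct}/{amplified}")
```

Under these assumptions the result agrees with the 6/16 shown for matK.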

Agavaceae X 22 sp.

Crocus X 9 sp.

Aulosepalum X 8 sp. (?all)

Cattleya X 30 sp. (2 clades, approx. 43 sp.)

Dactylorhiza 15 sp. (species complex)

Sophronitis 27 sp. (approx. 37 sp.)

Scalesia X 4 (species complex)

Conostylis X 42 (?all)

Equisetum X 14

Pinus X 66

Hordeum X 10

Lauraceae

Samples with unique ‘barcode’

Locus  Gaps as a 5th state (haplotypes, %)  Gaps = missing data (haplotypes, %)  With duplicates removed (haplotypes, %)

matK  201  69.55%  200  69.20%  186  64.36%

rpoB  129  44.64%  129  44.64%  122  42.21%

rpoC1  124  42.91%  124  42.91%  120  41.52%

rpoB+matK  234  80.97%  234  80.97%  214  74.05%

rpoB+rpoC1  184  63.67%  184  63.67%  175  60.55%

rpoC1+matK  235  81.31%  234  80.97%  214  74.05%

rpoC1+rpoB+matK  251  86.85%  251  86.85%  229  79.24%

Individuals  366  366  366  289

Species  289  289  289  289
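The percentages in this table equal the haplotype count divided by the 289 species (for example 251 / 289 = 86.85%). The sketch below reproduces that arithmetic and illustrates, on made-up sequences, how the two gap treatments can give different counts. The matching rule for "gaps = missing data" (positions where either sequence has a gap are ignored) is an assumption about how such a comparison is usually made, not a description of the software used here.

```python
N_SPECIES = 289  # from the table


def percent_unique(haplotypes: int, n_species: int = N_SPECIES) -> float:
    """Haplotype count as a percentage of the species sampled."""
    return round(100 * haplotypes / n_species, 2)


def same_barcode(a: str, b: str, gaps_as_fifth_state: bool) -> bool:
    """Compare two aligned sequences under one of the two gap treatments."""
    if gaps_as_fifth_state:
        return a == b  # a gap is a character like any other
    return all(x == y or "-" in (x, y) for x, y in zip(a, b))  # gaps match anything


def count_unique(seqs: dict[str, str], gaps_as_fifth_state: bool) -> int:
    """Number of samples whose barcode matches no other sample."""
    return sum(
        not any(
            same_barcode(seq, other, gaps_as_fifth_state)
            for other_name, other in seqs.items()
            if other_name != name
        )
        for name, seq in seqs.items()
    )


# Hypothetical aligned sequences, for illustration only.
toy = {"sp1": "ACGT-A", "sp2": "ACGTTA", "sp3": "ACCTTA"}
print(percent_unique(251))                            # three-locus combination
print(count_unique(toy, gaps_as_fifth_state=True))    # gaps distinguish sp1 from sp2
print(count_unique(toy, gaps_as_fifth_state=False))   # sp1 and sp2 become indistinguishable
```

Treating gaps as missing data can only lower or preserve the count, which is consistent with the small drops seen for matK and rpoC1+matK.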

Users of DNA Barcoding:

‘The Traffic Light approach’

Green - non-problematic taxa (current markers appropriate, silver standard)

Orange - need for gold standard (polyploidy, introgression, paralogy)

Red - barcoding needs investigation (species complex, etc.)