Molecular markers used in plant phylogeny
Introduction
Well-resolved phylogenetic trees are essential for understanding evolutionary processes.…
Higher level – protein sequences / coding regions
Lower level – non-coding regions: introns and intergenic regions
Nuclear regions – paralogs, concerted evolution etc – more informative – species level – subspecies level
Plastid regions – single copy maternally inherited – family/generic level
Mitochondrial regions – low sequence variation
Different genomes – different markers
Internal transcribed spacer (ITS)
White et al., 1990
Universal primers are available – White et al., 1990
High copy number - herbarium material too
Small size ~700 bp
High variation to study at species level
Can detect hybridization, ploidy, etc.
External transcribed spacer (ETS)
Calonje et al., 2009
ncpGS - introns
Nuclear-encoded, chloroplast-expressed glutamine synthetase
Oxalidaceae – ncpGS / ITS (Emshwiller and Doyle, 1999)
Diverged much earlier from cytosolic GS – single copy in most taxa
Widely used, e.g., Gesneriaceae, Arecaceae, Apocynaceae, etc.
Universal primer
Can be used along with other cpDNA markers to test congruence of data
Other introns:
PEPC – fourth intron
adh – introns between exons 2 and 9
AP3/DEF
MADS box transcription factor – B function gene
Impatiens – interspecies level
Two copies – paralogs
Primers have to be developed per group
Other MADS-box genes used:
FLO/LFY – 2nd intron
PI – 1st, 4th and 5th introns, etc.
Microsatellite (SSR) flanking regions
SSRs have long been used in phylogeny and population studies.
Flanking regions of SSRs
Annona – LMCH9/10
Neutral evolution
Much more variable – short length
Chatrou et al., 2009
Transposable elements
Used to study populations along with AFLP
More widely used in DNA fingerprinting than in phylogeny reconstruction at the supraspecific level
Aid of Genomics
Next-generation sequencing – 454 sequencing – Illumina.
Genome assembly – provides ample information for marker development.
Rapid and cheaper sequencing of multiple regions across organisms (multiplexing)
0.5 x genome sequencing
complete chloroplast genome of 158,598 bp
complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp)
partial mitochondrial genome sequence (130,764 bp)
88% hit with Catharanthus roseus Unigene data
66% hit with orthologs of asterids
- Population level as well as species level microsatellite markers were developed
Microsatellite marker development using genomic information
Contigs > 150 bp were scanned for 2, 3, 4, 5 and 6 bp repeats with BatchPrimer3
inverted repeats and several flanking – 100 and 450 bp
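The repeat scan described above can be sketched in a few lines of Python. This is a minimal illustration, not BatchPrimer3's own implementation: the function name `find_ssrs` and the minimum repeat counts per motif length are assumptions chosen for the example, and only perfect (uninterrupted) repeats are reported.

```python
import re

def find_ssrs(contig, min_contig_len=150, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs with 2-6 bp motifs."""
    if min_repeats is None:
        # Hypothetical thresholds: minimum tandem copies for each motif length.
        min_repeats = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}
    seq = contig.upper()
    # Only contigs longer than the cutoff are scanned.
    if len(seq) <= min_contig_len:
        return []
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AGAG" = "AG" x 2).
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)

# Usage: a 180 bp toy contig with an (AG)10 repeat between two 80 bp flanks.
flank = "ACGTTGCA" * 10
print(find_ssrs(flank + "AG" * 10 + flank))
```

The start coordinate of each hit can then be used to extract flanking sequence for primer design.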
Genomic approaches
NGS methods – RAD sequences – Illumina
UCE – ultra-conserved elements
Targeted amplicon sequencing – Roche 454
Multiplexing using MIDs or barcodes – 3 cp and 2 nuclear
Assembly of reads > Quality check > Alignment – Phylogeny
Population level studies are possible – Haplotype networks
Presence of multiple alleles can be detected without cloning in polyploid species
Griffin et al., BMC Biology 2011, 9:19
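Multiplexing with MIDs or barcodes means that pooled reads must be sorted back to their samples before assembly. A minimal sketch of that step follows. The function name `demultiplex`, the tag sequences and the sample names are hypothetical, and only exact matches at the start of a read are accepted (real pipelines usually tolerate sequencing errors in the tag).

```python
from collections import defaultdict

def demultiplex(reads, mid_to_sample):
    """Assign each read to a sample by its leading MID tag and trim the tag."""
    bins = defaultdict(list)
    for read in reads:
        read = read.upper()
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                # Keep the read without its tag.
                bins[sample].append(read[len(mid):])
                break
        else:
            # No tag matched exactly.
            bins["unassigned"].append(read)
    return dict(bins)

# Usage with two hypothetical 10 bp tags.
mids = {"ACGAGTGCGT": "sample_A", "ACGCTCGACA": "sample_B"}
reads = ["ACGAGTGCGTTTGACC", "ACGCTCGACAGGATCC", "TTTTTTTTTTGGATCC"]
print(demultiplex(reads, mids))
```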
Target Enrichment is the first step in genome wide sequencing
Hybridization based enrichment methods (Sequence capture) - can nowadays replace PCR as a method of enrichment
One or more genomes or transcriptomes are necessary for designing hybridization probes for sequence capture
Synthesized 90 bp probes are used to capture a sheared DNA library prepared from the target species, using biotin-streptavidin bead-based binding.
The targeted regions will be harvested and sequenced as short reads by paired end sequencing using Illumina HiSeq NGS platform
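As an illustration of probe design from a reference sequence, the sketch below cuts a target region into 90 bp candidate probes. The overlapping (tiled) layout, the 45 bp step and the name `tile_probes` are assumptions for the example; the slides only state the probe length.

```python
def tile_probes(target, probe_len=90, step=45):
    """Return (start, probe) pairs covering the target with fixed-length probes."""
    target = target.upper()
    if len(target) < probe_len:
        return []
    last = len(target) - probe_len
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        # Add a final probe so the end of the target is covered.
        starts.append(last)
    return [(s, target[s:s + probe_len]) for s in starts]

# Usage: a 200 bp toy target gives probes starting at 0, 45, 90 and 110.
probes = tile_probes("ACGT" * 50)
print([start for start, _ in probes])
```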
Mitochondrial DNA
In higher plants, sequence evolution is 4 times slower than in chloroplast DNA (cpDNA) and 100 times slower than in animal mtDNA
Genome size varies among groups
The rate of rearrangements is far higher in plant mtDNA than in cpDNA and animal mtDNA
Used in phylogenies at lower groups of plants
Palmer and Herbon, 1988
Chloroplast DNA
Single copy number – no paralogy
Inherited from single parent
Lots of copies in cell
Widely used in plant phylogenetics
● Coding regions
● Introns
● Intergenic spacers
rbcL
Ribulose-1,5-bisphosphate carboxylase/oxygenase, large subunit (rbcL)
Ritland and Clegg (1987)
Widely used for family level or higher classification
APG II system: 18S rDNA, rbcL and atpB
matK
maturase K – about 1.5 kb – one of the fast evolving genes
Embedded within the trnK intron
Universal primers at positions 429 and 1313 of matK (~930 bp)
Potential candidate for plant DNA barcode along with other regions
trnT-trnL, trnL intron, trnL-trnF
The Tortoise and the Hare
• Screening for newer non-coding cpDNA regions
– 34 noncoding regions – from different groups of APGII
• Primers based on 3 complete genomes – Poaceae, Fabaceae and Solanaceae. (Shaw et al. 2005, 2007)
Combined regions
● trnT-trnL-trnL
● ndhJ-trnF-trnL
● trnS-trnG-trnG
● 2 trnK intron
● trnL-trnL-trnF
DNA barcoding
• www.barcodinglife.org – CBOL
• http://barcoding.si.edu/
• http://www.dnabarcoding.ca
• http://www.kew.org/barcoding
• http://www.ibolproject.org
• matK, rpoC1, rpoB, accD, ycf5 and ndhJ: CBOL (2008)
• rbcL + matK : CBOL (promising candidates)
Roy et al., 2011. PLoS ONE
psbA-trnH, matK, rbcL & ITS
Berberis, Ficus, Gossypium
Problems with plant barcodes
• ITS has multiple copies (paralogs) in several taxa
• psbA-trnH is rapidly evolving and has many indels
• psbA-trnH and ITS were not able to differentiate species of Scalesia (Asteraceae) (Seberg and Petersen, 2010)