genecfingerprinnggcp21.org/tanzania/moragferguson3.pdfcurrentsnp&genotyping&for&gene$c&...
Genetic Fingerprinting
Introduction
Unique identification
Purpose of Fingerprinting in Germplasm Curation
• Characterisation of ‘type-specimen’
• Confirmation of identity
• To identify variants within a ‘variety’
• Identification of duplicates
• Study diversity
Use the genetic code to fingerprint
How to read the genetic code?
• Gregor Mendel published in 1866
• Enzymes (isozymes)
• DNA:
  – Non-PCR based (RFLPs, RFLP-VNTR)
  – PCR based (1983), arbitrarily primed (RAPD, AFLP)
  – PCR based, site-targeted PCR (SSR, STS)
  – Sequencing (SNP)
[Figure panels: AFLP, RFLP, SSR]
Single Nucleotide Polymorphism
Detecting SNPs
• ‘Chip’ based: use of specific primers
  – Illumina GoldenGate (96, 384 and 1536 SNPs)
  – Affymetrix chips (over 100,000 SNPs)
  – KBioSciences (flexible)
• Sequencing
All DNA sequencing is based on the principles of DNA synthesis
• A DNA template
• A primer to initiate
• An enzyme to add new nucleotides
• A way to record which nucleotide is added
Advances in Sequencing Technology
• Sanger sequencing
  – The reaction occurs in the tube
  – The gel/machine simply reads out the results
  – Limited by the physical capabilities of electrophoresis
  – Requires physical space to separate by size
  – This limits the capacity and speed of a sequencing machine
• In Next Generation Sequencing (NGS):
  – The reaction occurs in the machine
  – The DNA sequence is read as it is generated
  – The growing length of the DNA strand is now represented in time, not in physical separation
  – Allows much higher capacity
  – Plus very high resolution imaging
Using sequencing for SNP genotyping
Options:
• Whole genome re-sequencing (too much sequencing and bioinformatics)
• Reduced Representation Genomic sequencing
  – Genotyping-by-sequencing (GBS)
  – RADSeq
Genotyping-by-sequencing (GBS)
A reduced-representation approach.
• A restriction enzyme is used to generate many fragments of genomic DNA, which are then sequenced. Only a subset of SNPs is sampled from each individual, so fewer reads are needed per individual, allowing for multiplexing.
• The restriction digest ensures that the same sites are sampled from each individual.
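The point that a restriction digest samples the same sites in every individual can be illustrated with a toy in-silico digest. This is only a minimal sketch, not the Cornell GBS pipeline: the two sequences are invented, `find_cut_sites` and `fragments` are hypothetical helper names, and the ApeKI recognition site (GCWGC, with W = A or T, cut after the first G) is assumed from the enzyme's published specification rather than taken from these slides.

```python
import re

# Assumed ApeKI recognition site: G^CWGC, where W is A or T.
# The lookahead lets overlapping sites be found as well.
SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # the cut falls after the first G of the site


def find_cut_sites(seq):
    """Return the 0-based positions at which the sequence is cut."""
    return [m.start() + CUT_OFFSET for m in SITE.finditer(seq)]


def fragments(seq):
    """Split a sequence into restriction fragments at every cut site."""
    cuts = [0] + find_cut_sites(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


# Two invented individuals that differ by a single SNP between the cut sites.
individual_1 = "TTAGCAGCAAGTCCGATGCTGCTTA"
individual_2 = "TTAGCAGCAAGTCTGATGCTGCTTA"

for name, seq in [("individual 1", individual_1), ("individual 2", individual_2)]:
    print(name, find_cut_sites(seq), fragments(seq))
```

Both toy individuals are cut at the same positions, so the fragment that carries the SNP is sequenced in each of them and the two alleles can be compared directly.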
[Figure: Reference Genome Sequence compared with Genotype 1 and Genotype 2]
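Comparing each genotype with the reference sequence, base by base, can be sketched as below. The sequences are invented for illustration (the bases in the figure are not recoverable), `call_snps` is a hypothetical helper name, and the sketch assumes the reads are already aligned to the reference, which in practice is the job of dedicated alignment and variant-calling software.

```python
def call_snps(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Invented example sequences.
reference = "ACGTTAGCCA"
genotype_1 = "ACGTTAGCCA"  # identical to the reference
genotype_2 = "ACGTCAGCGA"  # differs at two positions

print(call_snps(reference, genotype_1))
print(call_snps(reference, genotype_2))
```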
Relative Genotyping Costs
Technology       Number of SNPs                           Cost/genotype ($)
GoldenGate       1536                                     70
KBioSciences     500                                      64
KBioSciences     300                                      40
GBS (Cornell)    4500 upwards (now 15,000 using ApeKI)    53 (48 plex), 38 (96 plex)
Includes $8 for bioinformatics service. IGD Website: $20 for 96 plex and $10 for 386 plex.
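One way to compare the platforms is cost per SNP per genotype. The sketch below uses only the figures in the table; `cost_per_snp` is a hypothetical helper name. For GBS it assumes the lower bound of 4500 SNPs, so the GBS values are upper limits on the cost per SNP.

```python
# (technology, number of SNPs, cost per genotype in $), taken from the table.
platforms = [
    ("GoldenGate", 1536, 70),
    ("KBioSciences", 500, 64),
    ("KBioSciences", 300, 40),
    ("GBS (Cornell), 48 plex", 4500, 53),  # 4500 is a lower bound on SNPs
    ("GBS (Cornell), 96 plex", 4500, 38),  # 4500 is a lower bound on SNPs
]


def cost_per_snp(n_snps, cost_per_genotype):
    """Cost in dollars of one SNP call for one genotype."""
    return cost_per_genotype / n_snps


for name, n_snps, cost in platforms:
    print(f"{name:<24} {n_snps:>5} SNPs  {cost_per_snp(n_snps, cost):.4f} $/SNP")
```

On these figures GBS is several times cheaper per SNP than GoldenGate, and more than ten times cheaper than the KBioSciences sets.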
SNPs in Cassava
§ Estimated one SNP every 121 bp
§ Cassava genome estimated to be ~770 Mb
§ Approx. 6.3 million SNPs in cassava
§ Ideal for fingerprinting
§ How do we visualise?
§ Should we sub-sample and if so how?
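The SNP total follows from the two estimates above. A short check of the arithmetic, using only the figures on the slide (the variable names are illustrative):

```python
genome_size_bp = 770_000_000  # ~770 Mb, as estimated on the slide
bp_per_snp = 121              # one SNP every 121 bp, as estimated on the slide

estimated_snps = genome_size_bp / bp_per_snp
print(f"{estimated_snps / 1e6:.2f} million SNPs")
```

This gives roughly 6.36 million, in line with the "approx. 6.3 million" quoted.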
SNPs in Cassava cont.
Current SNP genotyping for diversity assessment in cassava
• Primer-specific SNPs
  – GoldenGate (960 genotypes)
  – KBioSciences (96 genotypes)
• GBS
  – 700 genotypes from breeding program and genebank (CRP)
  – 650 from genetic gain (Next/Gen)
• RADSeq
  – 577 (CIAT, CRP)
Current SNP genotyping for genetic linkage mapping in cassava
[Figure: linkage maps for linkage group 2 (panels LG2, 2_P1, 2, 2.1 and 2_P2), with SNP marker labels and map positions]
§ Linkage map: order of markers on chromosomes
§ Currently 3500 SNP markers
§ It is possible to select a sub-set of markers evenly distributed across the genome for primer-specific SNP genotyping
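Selecting an evenly distributed sub-set from a linkage map could be done along the following lines. This is one possible approach, not the method used for the cassava map: the marker names and positions are invented, and `select_evenly_spaced` is a hypothetical helper that works on one linkage group at a time.

```python
def select_evenly_spaced(markers, n):
    """Pick up to n markers closest to n evenly spaced map positions.

    markers: list of (name, map_position) tuples from one linkage group.
    Returns the chosen markers in map order of their targets.
    """
    ordered = sorted(markers, key=lambda m: m[1])
    if n <= 0 or not ordered:
        return []
    if n >= len(ordered):
        return ordered
    start, end = ordered[0][1], ordered[-1][1]
    # With n == 1 the single target is the start of the linkage group.
    step = (end - start) / (n - 1) if n > 1 else 0.0
    targets = [start + i * step for i in range(n)]

    remaining = list(ordered)
    chosen = []
    for target in targets:
        # Take the nearest marker that has not been used yet.
        best = min(remaining, key=lambda m: abs(m[1] - target))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Invented markers: clustered at both ends, sparse in the middle.
markers = [
    ("m1", 0.0), ("m2", 0.7), ("m3", 1.4), ("m4", 12.4),
    ("m5", 24.3), ("m6", 36.9), ("m7", 49.3), ("m8", 50.0),
]

print(select_evenly_spaced(markers, 3))
```

Targeting evenly spaced positions, rather than taking every k-th marker, avoids over-sampling regions where markers are tightly clustered.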
Factors to consider
• Long-term availability and applicability of technology
• Turn-around time
• Cost
• How much data do we need to:
  – Fingerprint a type-specimen
  – Determine identity
  – Identify variants within a ‘variety’
  – Study diversity
My conclusion
• Go with GBS
• We should have a standard set of approx. 300 SNPs, evenly spaced and in different regions (coding, non-coding) of the genome, for KBioSciences genotyping. These must also be captured by GBS.
• You will need DNA extraction and quantification facilities