Marker Assisted Selection in Tomato
Pathway approach for candidate gene identification and introduction to metabolic pathway databases.
Identification of polymorphisms in database sequences
MAS – forward selection, background selection, combining traits, relative efficiency of selection
Why (population) size matters
[Figure: genetic linkage maps of tomato chromosomes 1 to 5 (marker names and inter-marker distances in cM), showing the positions of LEOH markers (LEOH7, LEOH10, LEOH15*, LEOH16, LEOH17*, LEOH23, LEOH36, LEOH37) and introgression lines (IL1-1 to IL5-2)]
Example: QTL for color uniformity in elite crosses
QTL   Trait     Origin
2     L, YSD    S. lyc.
4     YSD       S. lyc.
6     L, Hue    ogc
7     L, Hue    S. hab.
11    L, Hue    S. lyc.
Audrey Darrigues, Eileen Kabelka
Carotenoid Biosynthesis: Candidate pathway for genes that affect color and color uniformity.
Disclaimer: this is not the only candidate pathway…
http://metacyc.org/
http://www.plantcyc.org/
http://sgn.cornell.edu/tools/solcyc/
http://www.arabidopsis.org/biocyc/index.jsp
http://www.arabidopsis.org/help/tutorials/aracyc_intro.jsp
External plant metabolic databases:
CapCyc (pepper, C. annuum)
CoffeaCyc (coffee, C. canephora)
SolCyc (tomato, S. lycopersicum)
NicotianaCyc (tobacco, N. tabacum)
PetuniaCyc (petunia, P. hybrida)
PotatoCyc (potato, S. tuberosum)
SolaCyc (eggplant, S. melongena)
Databases that link pathways to genes
Query CCACCACCATCCTCACTTTAACCCACAAATCCCACTTTCTTTGGCCTAATTAACAATTTT
      |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
Sbjct CCACCACCATCCTCACTTTAACCCACAAATCCCATTTTCTTTGGCCTAATTAACAATTTT
Zeaxanthin epoxidase
Probable location on Chromosome 2
Alignment of Z83835 and EF581828 reveals 5 SNPs over ~2000 bp
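Spotting SNPs in an alignment like this one is a column-by-column comparison. The sketch below is a minimal illustration, assuming the two sequences are already aligned to equal length (as in the BLAST excerpt) with "-" marking gaps; `find_snps` is a hypothetical helper, not an SGN or BLAST function. Gap columns are skipped because they represent indels, not SNPs.

```python
def find_snps(query: str, sbjct: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, query base, subject base) for each mismatch
    in a pairwise alignment of equal length, skipping gap columns ('-')."""
    if len(query) != len(sbjct):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, q, s)
        for i, (q, s) in enumerate(zip(query.upper(), sbjct.upper()))
        if q != s and "-" not in (q, s)
    ]


query = "CCACCACCATCCTCACTTTAACCCACAAATCCCACTTTCTTTGGCCTAATTAACAATTTT"
sbjct = "CCACCACCATCCTCACTTTAACCCACAAATCCCATTTTCTTTGGCCTAATTAACAATTTT"
print(find_snps(query, sbjct))  # [(35, 'C', 'T')]
```

Applied to the 60 bp excerpt, this recovers the single C > T difference at position 35; sequence errors in either database entry would show up the same way, which is why eSNPs still need validation.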
Comment on the databases:
Information is not always complete/up to date.
Display is not always optimal, and several steps may be needed to go from pathway > gene > potential marker.
Sequence data contain errors; eSNPs are not the same as validated markers.
There is a wealth of information organized and available.
We will be asking for feedback on how best to improve the SGN database and access via the Breeders Portal.
The previous example detailed how we might identify sequence-based markers for trait selection.
Markers for both forward and background selection can improve the efficiency of selection in terms of 1) relative efficiency of selection, 2) time, 3) gain under selection, and 4) cost.
The remainder of the presentation will focus on:
Where to apply markers in a program
Forward and background selection
Marker resources
Alternative population structures and size
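Relative efficiency of selection: $RE = r_G \times \frac{H_i}{H_d}$, where $r_G$ is the genetic correlation between the indirect selection criterion (e.g., a marker) and the target trait, and $H_i$ and $H_d$ are the square roots of the heritabilities under indirect and direct selection (equal selection intensity assumed).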
Line performance over locations > MAS > Single plant
Comparison of direct selection with indirect selection (MAS).
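The relative-efficiency formula is a one-line calculation. This is a minimal sketch with hypothetical input values; it takes heritabilities ($h^2$) as inputs and converts them to square roots, and it assumes equal selection intensity for direct and indirect selection. A value above 1 means indirect selection (e.g., MAS) is expected to outperform direct selection on the trait.

```python
import math


def relative_efficiency(r_g: float, h2_indirect: float, h2_direct: float) -> float:
    """Relative efficiency of indirect vs. direct selection: r_G * H_i / H_d,
    with H_i and H_d the square roots of the heritabilities.
    Assumes equal selection intensity for both strategies."""
    return r_g * math.sqrt(h2_indirect) / math.sqrt(h2_direct)


# Hypothetical values: a marker scored without environmental error
# (heritability ~1) and a trait with low single-plant heritability.
print(round(relative_efficiency(r_g=0.6, h2_indirect=1.0, h2_direct=0.25), 2))  # 1.2
```

With these made-up numbers the marker is 1.2 times as efficient as direct single-plant selection; the same arithmetic shows why replicated line trials (higher $h^2$ for direct selection) can beat MAS.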
Generation   RP : donor (%)
F1           50 : 50
BC1          75 : 25
BC2          87.5 : 12.5
BC3          93.75 : 6.25
BC4          96.875 : 3.125
Expected proportion of recurrent parent (RP) genome in BC progeny
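Each backcross halves the remaining donor contribution, so the expected RP fraction after $n$ backcrosses (with $n = 0$ for the F1) is $1 - (1/2)^{n+1}$. The sketch below reproduces the table from that rule; `expected_rp_proportion` is a hypothetical name. These are population expectations only: individual progeny vary around them, which is what background selection exploits.

```python
def expected_rp_proportion(n_backcrosses: int) -> float:
    """Expected recurrent-parent genome fraction after n backcrosses
    (n = 0 for the F1): 1 - (1/2) ** (n + 1)."""
    return 1 - 0.5 ** (n_backcrosses + 1)


for n, label in enumerate(["F1", "BC1", "BC2", "BC3", "BC4"]):
    rp = expected_rp_proportion(n)
    print(f"{label}: {100 * rp:g} : {100 * (1 - rp):g}")
```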
Accelerating Backcross Selection
Two-stage selection:
1) Select for target allele
2) Select for RP genome at unlinked markers
Three-stage selection:
1) Select for target allele
2) Select for RP recombinants at flanking markers
3) Select for RP genome at unlinked markers
Four-stage selection:
1) Select for target allele
2) Select for RP recombinants at flanking markers
3) Select for RP genome on carrier chromosome
4) Select for RP genome at unlinked markers
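The stages act as successive filters on the same set of progeny. The sketch below illustrates three-stage selection under simplifying assumptions: each plant is summarized by whether it carries the target allele, how many of its two flanking markers carry RP alleles, and which unlinked markers are RP homozygous. The `Candidate` type and `three_stage_select` function are hypothetical, not taken from Frisch et al. (1999); dropping stage 2 gives the two-stage scheme.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_target: bool          # carries the donor allele at the target locus
    rp_flanks: int            # flanking markers (0-2) with RP alleles, i.e. recombinants
    rp_unlinked: list[bool]   # True where an unlinked marker is RP homozygous


def rp_score(c: Candidate) -> float:
    """Fraction of unlinked background markers that are RP homozygous."""
    return sum(c.rp_unlinked) / len(c.rp_unlinked)


def three_stage_select(pop: list[Candidate], keep: int) -> list[Candidate]:
    # Stage 1: keep only carriers of the target allele.
    carriers = [c for c in pop if c.has_target]
    if not carriers:
        return []
    # Stage 2: among carriers, keep those recombinant at the most flanking markers.
    best_flank = max(c.rp_flanks for c in carriers)
    recombinants = [c for c in carriers if c.rp_flanks == best_flank]
    # Stage 3: rank the survivors by RP genome at unlinked markers.
    return sorted(recombinants, key=rp_score, reverse=True)[:keep]


pop = [
    Candidate("P1", True, 1, [True, False, True, True]),
    Candidate("P2", True, 2, [True, False, False, True]),
    Candidate("P3", False, 2, [True, True, True, True]),
    Candidate("P4", True, 2, [True, True, True, False]),
]
print([c.name for c in three_stage_select(pop, keep=1)])  # ['P4']
```

P3 has the best background but is discarded at stage 1 because it lost the target allele; P1 is dropped at stage 2; P4 wins on background at stage 3.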
References:
Frisch, M., M. Bohn, and A.E. Melchinger. 1999. Comparison of Selection Strategies for Marker-Assisted Backcrossing of a Gene. Crop Science 39: 1295-1301.
Progeny needed for Background Selection During MAS
Q10 of RP genome (%)
Population size 20    40    60    80    100   125   150   200
Two-stage
  BC1           76.7  78.7  79.7  80.3  80.7  81.3  81.7  82.2
  BC2           90.3  91.9  92.8  93.3  93.6  93.9  94.0  94.6
  BC3           95.8  96.2  97.1  97.3  97.4  97.5  97.6  97.8
Three-stage
  BC1           71.2  72.7  73.4  73.6  73.3  73.2  72.8  72.2
  BC2           86.1  87.2  88.5  89.3  90.2  90.7  91.3  91.8
  BC3           94.4  95.7  96.5  96.9  97.2  97.3  97.5  97.6
Q10 is the RP genome proportion that is reached or exceeded with 90% probability (i.e., a 90% probability of success).
From Frisch et al., 1999.
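Q10 is a quantile of the outcome distribution, so it can be estimated by simulation. The sketch below is a deliberately simplified Monte Carlo for BC1 background selection, assuming 48 markers (12 chromosomes x 4, as in the marker-cost table) that segregate independently, i.e. no linkage. Because linkage is ignored, its numbers will not reproduce the Frisch et al. (1999) values; it only shows how Q10 is defined and why larger populations raise it. Function names are hypothetical.

```python
import random


def bc1_rp_fraction(n_markers: int, rng: random.Random) -> float:
    """RP genome fraction of one BC1 plant under a simplified model in which
    every marker segregates independently (no linkage): each locus is RP/RP
    or RP/donor with probability 1/2."""
    homozygous = sum(rng.random() < 0.5 for _ in range(n_markers))
    return 0.5 + 0.5 * homozygous / n_markers


def q10_best_of(pop_size: int, n_markers: int = 48, reps: int = 200,
                seed: int = 1) -> float:
    """Q10 of the RP fraction of the best plant chosen from pop_size BC1
    progeny: the 10% quantile over reps simulated populations, i.e. the value
    exceeded in about 90% of replicates."""
    rng = random.Random(seed)
    best = sorted(
        max(bc1_rp_fraction(n_markers, rng) for _ in range(pop_size))
        for _ in range(reps)
    )
    return best[int(0.1 * reps)]


for n in (20, 100, 200):
    print(n, round(q10_best_of(n), 3))
```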
Two-stage selection
Population size       60      80      100     125
BC1                   2880    3840    4800    6000
BC2                   900     1164    1416    1716
BC3                   228     264     300     348
Total marker points   4008    5268    6516    8064
Cost at 0.15/point    601.2   790.2   977.4   1209.6
Cost at 0.20/point    801.6   1053.6  1303.2  1612.8
Cost at 0.25/point    1002.0  1317.0  1629.0  2016.0

Three-stage selection
Population size       60      80      100     125
BC1                   2880    3840    4800    6000
BC2                   492     708     960     1308
BC3                   250     444     504     576
Total marker points   3622    4992    6264    7884
Cost at 0.15/point    543.3   748.8   939.6   1182.6
Cost at 0.20/point    724.4   998.4   1252.8  1576.8
Cost at 0.25/point    905.5   1248.0  1566.0  1971.0

Marker data points required (modified from Frisch et al., 1999; based on the assumption of 12 chromosomes and initial selection with 4 markers per chromosome)
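The totals and costs are simple sums and products of the per-generation counts. This sketch re-derives them from the table and checks the BC1 column against the stated assumption of 48 markers per plant; the per-point costs are the table's, and the currency unit is unspecified. Variable and function names are illustrative.

```python
# Per-generation marker data points (BC1, BC2, BC3) by population size,
# copied from the table; BC1 assumes 12 chromosomes x 4 markers = 48 per plant.
MARKERS_PER_BC1_PLANT = 12 * 4

two_stage = {60: (2880, 900, 228), 80: (3840, 1164, 264),
             100: (4800, 1416, 300), 125: (6000, 1716, 348)}
three_stage = {60: (2880, 492, 250), 80: (3840, 708, 444),
               100: (4800, 960, 504), 125: (6000, 1308, 576)}


def cost(total_points: int, cost_per_point: float) -> float:
    """Genotyping cost, rounded to two decimals (currency unit unspecified)."""
    return round(total_points * cost_per_point, 2)


for name, table in (("two-stage", two_stage), ("three-stage", three_stage)):
    for pop_size, points in table.items():
        # Initial BC1 screen: every plant genotyped at all 48 markers.
        assert points[0] == pop_size * MARKERS_PER_BC1_PLANT
        total = sum(points)
        costs = [cost(total, c) for c in (0.15, 0.20, 0.25)]
        print(name, pop_size, total, costs)
```

Three-stage selection spends fewer data points in BC2 because only plants that pass the target and flanking-marker screens are genotyped at the background markers.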
For effective background selection we need:
Markers for our target locus (C > T SNP for Zep)
Markers on the target chromosome (Chrom. 2)
Markers unlinked to the target chromosome
Missing data in SGN
Limited ability to generate tables; PCR conditions sometimes incomplete; enzyme sometimes missing; SNP not described.
Missing data in Tomatomap.net
SNP and sequence context (requires the BMC Genomics supplemental table), ASPE primers, GoldenGate primers.
2007. BMC Genomics 8:465. www.biomedcentral.com/content/pdf/1471-2164-8-465.pdf
Where can we expect to be?
                      Total  n = 1       n = 2  n = 3  n = 4  n = 5-10  n > 10
TA496 ESTs with SNPs  806    596         106    34     22     38        10
H1706 BAC sequences   127    not tested  64     22     11     23        7
Proportion            0.16   -           0.60   0.65   0.50   0.61      0.70
TA496 ESTs with SNPs vs. H1706 BAC sequences
Where EST coverage = allele coverage
These data are based on an estimated ~42% of the sequence; we therefore expect as many as 300 markers for a cross such as E6203 x H1706.
analysis by Buell et al., unpublished
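The proportions and the ~300 marker estimate follow from the counts. This sketch recomputes them; reading the 300 as the 127 BAC-supported SNPs scaled up from ~42% of the sequence to the whole genome is one plausible interpretation, not a method stated in the source. Variable names are illustrative.

```python
# Counts by EST coverage class, from the table (None = not tested).
classes = ["n = 1", "n = 2", "n = 3", "n = 4", "n = 5-10", "n > 10"]
est_snps = [596, 106, 34, 22, 38, 10]       # TA496 ESTs with SNPs
bac_row = [None, 64, 22, 11, 23, 7]         # row compared with H1706 BAC sequences

for label, total, hit in zip(classes, est_snps, bac_row):
    if hit is not None:
        print(f"{label}: {hit / total:.2f}")

overall = sum(h for h in bac_row if h is not None)  # 127
print(f"Total: {overall / sum(est_snps):.2f}")      # 0.16

# Assumed extrapolation: scale from ~42% of the sequence to the whole genome.
sequence_fraction = 0.42
print(round(overall / sequence_fraction))  # about 300 markers
```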
QTLs mapped in a bi-parental cross may not be appropriate for MAS in all populations…
Marker allele and trait may not be linked in all populations.
Genetic background effects may be population specific.
Original association may be spurious.
QTL detection is dependent on magnitude of the difference between alleles and the variance within marker classes.
What about mapping and MAS in unstructured populations?
A brief introduction to “Association Mapping” follows.
$$Y = \mu + \mathrm{REP}\,y + Q\,w + \mathrm{Marker}\,\alpha + Z\,v + \mathrm{Error}$$
“Association Mapping” statistical model – designed to account for population structure (Q), correct for genetic background effects (Z), and identify marker-trait linkage (Marker)
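The reason for the Q term is easy to demonstrate. The sketch below fits only the fixed part of the model ($\mu$, $Q\,w$, $\mathrm{Marker}\,\alpha$) by ordinary least squares on hypothetical simulated data; the REP term and the random $Z\,v$ kinship term are omitted, since fitting them needs a mixed-model solver. Two subpopulations differ in trait mean and in marker allele frequency, but the marker has no effect on the trait: ignoring Q produces a spurious marker effect, and including Q removes it. All names and parameters are illustrative.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(x_rows: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(42)
rows_naive, rows_q, y = [], [], []
for _ in range(400):
    q = 1.0 if rng.random() < 0.5 else 0.0      # subpopulation membership (Q)
    p = 0.8 if q else 0.2                       # allele frequency differs by group
    marker = float((rng.random() < p) + (rng.random() < p))  # 0, 1 or 2 copies
    y.append(10.0 + 3.0 * q + rng.gauss(0.0, 1.0))           # trait depends on Q only
    rows_naive.append([1.0, marker])
    rows_q.append([1.0, q, marker])

alpha_naive = ols(rows_naive, y)[1]
alpha_with_q = ols(rows_q, y)[2]
print(f"marker effect ignoring Q: {alpha_naive:.2f}")
print(f"marker effect with Q:     {alpha_with_q:.2f}")
```

The naive estimate is roughly 1.3 trait units per allele even though the true effect is zero; this is the "original association may be spurious" problem in miniature.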
[Figure: LD decay in processing and fresh-market tomato; y axis: LD measure (R²); x axis: distance between loci (cM), 0 to 140. Fitted trends: processing, y = -0.054 ln(x) + 0.2583; fresh market, y = -0.037 ln(x) + 0.1713]
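The fitted curves can be inverted to ask how far LD extends before R² falls below a chosen threshold: solving $R^2 = a\ln(x) + b$ gives $x = e^{(R^2 - b)/a}$. The sketch below applies this to the two fitted trends; the 0.1 threshold is an illustrative choice, not one taken from the source.

```python
import math


def distance_at_r2(a: float, b: float, r2_threshold: float) -> float:
    """Solve r2 = a * ln(x) + b for the distance x (cM) at which the fitted
    LD curve drops to r2_threshold (a < 0 for decaying LD)."""
    return math.exp((r2_threshold - b) / a)


# Fitted curves from the figure; the 0.1 threshold is an assumption.
processing = distance_at_r2(a=-0.054, b=0.2583, r2_threshold=0.1)
fresh_market = distance_at_r2(a=-0.037, b=0.1713, r2_threshold=0.1)
print(f"processing: {processing:.1f} cM, fresh market: {fresh_market:.1f} cM")
```

At this threshold LD extends to roughly 19 cM in the processing set but only about 7 cM in the fresh-market set, so under this reading the fresh-market germplasm would need denser marker coverage to stay in LD with a QTL.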
[Figure: output from Pritchard’s “STRUCTURE”. K=4 clusters: 1) Fresh Market (FM); 2) Landrace; 3) Heirloom; 4) Processing. K=8 clusters: 1, 6, 7) Processing; 2) Landrace; 3, 5) FM; 4) FM & Processing; 8) Heirloom]
Tomato populations will have sub-structure
Association mapping
Incorporates population structure and coefficient of relatedness
The number of markers needed depends on the rate of LD decay (reflects recombination history)
Highly specific to “inference population”
wild species vs breeding program
Sensitive to marker coverage
LD decay and number of alleles (Nor, gf, and others all have multiple alleles within populations used by breeders)
Will not be able to “map” traits where trait variation overlaps with population structure.
Even without sequence or marker data, there are lessons for practical breeding:
Use pedigree data, knowledge of population structure, and objective data to increase precision of estimates of breeding value.
Take home messages:
Marker resources exist for forward and background selection in elite x elite crosses in tomato.
Marker resources are currently not sufficient for QTL discovery in bi-parental or AM populations, but soon will be.
The best time to use genetic markers: early-generation selection.
Restructuring of breeding program to integrate markers may include:
1) Increasing genotypic replication (population size) at the expense of replicating individual entries (consider augmented designs).
2) Collecting objective data.
Further discussion of the AM approach is in session VI, “Unstructured mapping of bacterial spot resistance”.