P. K. GUPTA
MOLECULAR BIOLOGY LABORATORY
DEPTT. OF AGRICULTURAL BOTANY
CCS UNIVERSITY, MEERUT
Quantitative Genetics: Markers for Conventional Breeding
Quantitative Genetics
• Pre-Mendelian Work
• Francis Galton & Karl Pearson's Work
• Era 1 (1910-1950)
- Multi-Factor Hypothesis: simple traits
• Era 2 (1960-1980)
- Biometrical Genetics: complex traits
• Era 3 (1990s and 2000 onwards)
- QTL (Quantitative Trait Loci) Analysis
- LD & Association Mapping: complex traits
Molecular Markers
• 1st Generation Markers (RFLP, RAPD)
• 2nd Generation Markers (SSR, AFLP)
• 3rd Generation Markers (SNPs)
• New Generation Microarray-based Markers (SFPs, DArT)
(Gupta et al. 2008: Heredity)
Statistical Tools
• Maximum Likelihood
• EM Algorithm
• Bayesian Approach & MCMC
• Population Structure and TASSEL for Association Mapping
QTL Analysis
Methods:
1. Linkage Analysis (single marker analysis and interval mapping)
2. Linkage Disequilibrium (LD) and Association Analysis
Requirements:
1. Normal distribution of the trait
2. No segregation distortion of marker genotypes
Methods of QTL Mapping
• Single Marker Methods
• Interval Mapping (SIM, CIM, MIM)
• Bayesian Approach & Markov Chain Monte Carlo (MCMC) Methods
Mapping Populations for QTL Mapping
• Backcross (BC1)
• F2
• Doubled haploids (DH lines)
• Recombinant inbred lines (RI lines)
• Advanced backcross (BCt)
• Advanced intercross (Ft)
Segregation Distortion
• Can be caused by selection
• It can bias the estimation of recombination
• It can reduce the power to identify QTL
• It can bias the estimation of QTL position and effects
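Since "no segregation distortion" is listed as a requirement, each marker is usually screened before mapping. A minimal sketch of one common check, a chi-square goodness-of-fit test against the expected ratio, is given here. It assumes a population in which the two marker classes are expected 1:1 (e.g. BC1, DH or RI lines); the function name and the counts are hypothetical.

```python
# Chi-square goodness-of-fit test for segregation distortion at one marker.
# Assumption: the two marker classes are expected in a 1:1 ratio.

def chi_square_1to1(n_class_a, n_class_b):
    """Return the chi-square statistic (1 d.f.) against an expected 1:1 ratio."""
    total = n_class_a + n_class_b
    expected = total / 2.0
    return ((n_class_a - expected) ** 2 + (n_class_b - expected) ** 2) / expected

CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05

# Hypothetical counts for two markers scored on 200 lines.
balanced = chi_square_1to1(104, 96)
distorted = chi_square_1to1(125, 75)

for name, chi2 in (("balanced", balanced), ("distorted", distorted)):
    verdict = "distorted" if chi2 > CRITICAL_5PCT_1DF else "fits 1:1"
    print(f"{name} marker: chi2 = {chi2:.2f} ({verdict})")
```

Markers that fail the test can be flagged rather than dropped outright, since distortion may itself point to a selected region.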
Statistical Issues Involved in QTL Mapping
• Hypotheses
• Distribution of test statistics (t, F, χ², LR/LOD)
• Single QTL
• Multiple QTL
• Significance level (Churchill & Doerge, 1994)
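Single Marker Analysis (SMA)
• Model: $y_i = \mu + \beta_1 x_i + \varepsilon_i$
- $i$ = individuals 1 to $n$
- $y_i$ = trait value for individual $i$
- $\mu$ = population mean
- $\beta_1$ = trait-marker regression coefficient
- $x_i$ = genotype code for individual $i$ (the marker genotype stands in for the unobserved QTL genotype)
• t-test for means of two marker allele classes
• Marker-trait association through regression of trait on marker genotypes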
t-Test for QTL Analysis
• Genotype the population using markers
• Classify the population into two groups for each marker (on the basis of marker alleles)
• Work out trait means for each group
• Apply t-test for significance of the difference between means for each marker
• Significant difference means marker-trait association
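The steps above can be sketched for a single marker as follows. This is a minimal illustration using a pooled-variance two-sample t statistic; the marker scores and trait values are hypothetical, and the critical value quoted in the comment is the standard two-sided 5% value for 8 d.f.

```python
import math
import statistics

def two_sample_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, n_a + n_b - 2

# Hypothetical data: marker allele class and trait value for ten lines.
marker = ["A", "A", "B", "A", "B", "B", "A", "B", "A", "B"]
trait = [12.1, 11.8, 10.2, 12.5, 10.9, 10.4, 11.9, 10.6, 12.3, 10.1]

# Step 2: classify lines by marker allele; step 3: compare the class means.
class_a = [y for g, y in zip(marker, trait) if g == "A"]
class_b = [y for g, y in zip(marker, trait) if g == "B"]
t_stat, df = two_sample_t(class_a, class_b)

# |t| above about 2.306 (two-sided, 5%, 8 d.f.) suggests a marker-trait association.
print(f"t = {t_stat:.2f} with {df} d.f.")
```

In a real scan the same test is repeated for every marker, so the per-marker significance level needs to be adjusted for multiple testing.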
QTL Analysis of Protein Content
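SMA (Continued)
• Likelihood Ratio Test (equivalent to the t-test):
$L(\mu, \beta_1, \sigma^2 \mid y, X) = \prod_i N\{y_i - (\mu + \beta_1 X_i),\ \sigma^2\}$
$y_i - (\mu + \beta_1 X_i) = \varepsilon_i$ (normally distributed)
$LR = L(\mu, \beta_1, \sigma^2) / L(\mu, 0, \sigma^2)$
$LOD = \log_{10} LR$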
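For a normal model the maximized likelihood depends on the data only through the residual sum of squares (RSS), so the LOD defined above reduces to $(n/2)\log_{10}(RSS_0/RSS_1)$, where $RSS_0$ is for the model with $\beta_1 = 0$ and $RSS_1$ for the full model. The sketch below assumes maximum likelihood estimates of the variance under each model; the function name and the data are hypothetical.

```python
import math

def lod_single_marker(x, y):
    """Return (beta1, LOD) for the regression y = mu + beta1 * x + error."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    mu = mean_y - beta1 * mean_x
    rss_full = sum((yi - (mu + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    rss_null = sum((yi - mean_y) ** 2 for yi in y)  # model with beta1 = 0
    # LR = (rss_null / rss_full) ** (n / 2), hence LOD = log10(LR):
    return beta1, (n / 2.0) * math.log10(rss_null / rss_full)

# Hypothetical data: marker coded 1/0 for the two allele classes.
x = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y = [12.1, 11.8, 10.2, 12.5, 10.9, 10.4, 11.9, 10.6, 12.3, 10.1]

beta1, lod = lod_single_marker(x, y)
print(f"beta1 = {beta1:.2f}, LOD = {lod:.2f}")
```

With a 0/1 marker code, `beta1` is simply the difference between the two class means, which is why this test and the t-test lead to the same conclusion.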
Limitations of SMA
• Means of QTL genotypes are confounded
• QTL positions cannot be precisely determined (a QTL with minor effect and close linkage cannot be distinguished from a QTL with major effect and loose linkage)
• Statistical power is low
Interval Mapping: Three Steps
• Recording of data on the mapping population: phenotype (trait) & genotype (markers)
• Construction of a molecular map: MAPMAKER
• Estimation of position and effects of QTL
Simple Interval Mapping (SIM)
SIM involves scanning each bin (1-2 cM) for the presence of a QTL. Two approaches:
• Maximum Likelihood Approach (Lander and Botstein, 1989)
• Regression Approach (Haley & Knott, 1992; Martinez & Curnow, 1992)
Maximum Likelihood Interval Mapping (MLIM) Approach
• EM (Expectation-Maximization) algorithm
• Maximize the likelihood under the hypotheses of presence and absence of a QTL
• Work out the likelihood ratio (LR)/LOD score
• Find the threshold LOD score
• Prepare a LOD plot and detect QTL
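One widely used way to find the threshold LOD score is the permutation test of Churchill & Doerge (1994): trait values are shuffled across individuals many times, the maximum LOD over the genome is recorded for each shuffle, and the upper 5% point of those maxima is used as the threshold. The sketch below is a simplified illustration that scans markers only (not positions within intervals) with a regression-based LOD; all names and the simulated data are hypothetical.

```python
import math
import random

def lod_score(x, y):
    """LOD for regressing trait y on marker codes x (0/1)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    rss_null = sum((yi - mean_y) ** 2 for yi in y)
    rss_full = rss_null - sxy * sxy / sxx
    return (n / 2.0) * math.log10(rss_null / rss_full)

def max_lod(markers, y):
    """Largest LOD over all markers (a stand-in for a full genome scan)."""
    return max(lod_score(x, y) for x in markers)

def permutation_threshold(markers, y, n_perm=500, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the maximum LOD under shuffled traits."""
    rng = random.Random(seed)
    shuffled = list(y)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-trait association
        null_max.append(max_lod(markers, shuffled))
    null_max.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return null_max[index]

# Simulated (hypothetical) data: 40 lines, 3 markers, QTL at the first marker.
rng = random.Random(42)
n_lines = 40
markers = [[i % 2 for i in range(n_lines)]]
markers += [[rng.randint(0, 1) for _ in range(n_lines)] for _ in range(2)]
trait = [10.0 + 2.0 * markers[0][i] + rng.gauss(0.0, 0.5) for i in range(n_lines)]

observed = max_lod(markers, trait)
threshold = permutation_threshold(markers, trait)
print(f"observed max LOD = {observed:.2f}, 5% threshold = {threshold:.2f}")
```

Taking the maximum over the whole scan inside each permutation is what makes the threshold genome-wide rather than per-marker.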
Interval Mapping by Regression
• Haley & Knott, 1992; Martinez & Curnow, 1992
• Phenotypes are regressed on a variable $l$, based on ($r$, $1-r$), for each position in a map interval and for the flanking marker genotypes
• $Y = m + \beta l$ ($m$ = mean, $\beta$ = QTL effect); $\beta$ can be calculated and its significance tested
Interval Mapping by Regression
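• Marker order: A - Q - B (QTL Q lies between flanking markers A and B)
Marker genotype   P(QQ)
AABB              0.99
AAbb              0.75
aaBB              0.25
aabb              0.01
[Figure: regression of trait Y on P(QQ), ranging from 0 to 1, with slope b]
• Substitute probabilities for missing genotypes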
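The table of P(QQ) values can be illustrated with a small calculation. The sketch below assumes homozygous lines (e.g. doubled haploids), no crossover interference, and hypothetical recombination fractions `r_left` (A to Q) and `r_right` (Q to B) chosen so that the results land near the values on the slide; the trait data and function names are also hypothetical.

```python
def qtl_probabilities(r_left, r_right):
    """P(QQ | flanking marker class), assuming no interference."""
    r = r_left + r_right - 2.0 * r_left * r_right  # A-B recombination fraction
    return {
        "AABB": (1 - r_left) * (1 - r_right) / (1 - r),
        "AAbb": (1 - r_left) * r_right / r,
        "aaBB": r_left * (1 - r_right) / r,
        "aabb": r_left * r_right / (1 - r),
    }

def least_squares(x, y):
    """Intercept and slope of the simple regression of y on x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

# Hypothetical position: Q is closer to A (r = 0.05) than to B (r = 0.15).
probs = qtl_probabilities(0.05, 0.15)

# Hypothetical lines: flanking marker class and trait value.
lines = [("AABB", 12.0), ("AABB", 12.4), ("AAbb", 11.6),
         ("aaBB", 10.9), ("aabb", 10.2), ("aabb", 10.0)]

# The unknown QTL genotype is replaced by its probability, then Y is regressed on it.
x = [probs[genotype] for genotype, _ in lines]
y = [value for _, value in lines]
intercept, slope = least_squares(x, y)

for genotype, p in probs.items():
    print(f"{genotype}: P(QQ) = {p:.2f}")
print(f"regression slope b = {slope:.2f}")
```

Repeating this at each position along the interval, and recording the fit at each one, gives the regression analogue of a LOD profile.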
MLIM vs RIM
• MLIM is computationally demanding
• RIM is simpler and less demanding
• In general, MLIM and RIM give similar results, and RIM is sometimes preferred
• Under certain conditions, MLIM may be more precise and powerful
Limitations of SIM
• Number of QTLs cannot be resolved (QTLs may be linked)
• Exact positions of QTLs cannot be determined (QTL interactions)
• Statistical power is low (limited information in the model)
Composite Interval Mapping (CIM)
• A combination of SIM & multiple linear regression (Zeng, 1993, 1994):
$y_j = \mu + \beta_i x_{ij} + \sum_k \beta_k x_{kj} + \varepsilon_j$
- $\beta_i$ = effect of QTL between markers $i$ and $i+1$
- $x_{ij}$, $x_{kj}$ = dummy variables for markers $i$ and $k$ in individual $j$
- $\beta_k$ = partial regression of trait on marker $k$
• CIM can be conducted using MLIM or RIM
• QTL Cartographer can be used (not all software packages offer CIM)
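The role of the cofactor terms $\sum_k \beta_k x_{kj}$ can be seen in a toy example with one test marker and one cofactor marker. The sketch below is not CIM as implemented in QTL Cartographer; it only illustrates partial regression, using the standard result that the coefficient of the test marker in a multiple regression equals the slope obtained after both the trait and the test marker have been regressed on the cofactor. All names and data are hypothetical, and the trait is constructed without error so the effects are recovered exactly.

```python
def fit(x, y):
    """Intercept and slope of the simple regression of y on x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    intercept, slope = fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def partial_effect(test_marker, cofactor, trait):
    """Effect of the test marker on the trait, adjusted for the cofactor."""
    adjusted_trait = residuals(cofactor, trait)
    adjusted_marker = residuals(cofactor, test_marker)
    return fit(adjusted_marker, adjusted_trait)[1]

# Hypothetical dummy variables for a test marker and a correlated cofactor.
test_marker = [1, 1, 1, 1, 0, 0, 0, 0]
cofactor = [1, 1, 1, 0, 1, 0, 0, 0]
# Trait built as 10 + 1 * test_marker + 2 * cofactor (no error term).
trait = [10 + 1 * t + 2 * c for t, c in zip(test_marker, cofactor)]

naive = fit(test_marker, trait)[1]  # absorbs part of the cofactor's effect
adjusted = partial_effect(test_marker, cofactor, trait)
print(f"naive effect = {naive:.2f}, cofactor-adjusted effect = {adjusted:.2f}")
```

The unadjusted estimate is inflated because the test marker is correlated with the second QTL; conditioning on the cofactor removes that contribution.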
Advantages of CIM
• The effects of other QTLs are removed
• QTL locations and effects are unbiased
• Resolution is much higher than with SMA/SIM
• More informative and efficient, since there are more variables in the model
Limitations of CIM
• The test statistics in marker-rich and marker-poor regions are not comparable
• Only QTL main effects can be determined
• Joint contribution of multiple linked QTL cannot be worked out (epistatic interactions cannot be worked out)
• Use of tightly linked markers as cofactors can reduce the statistical power
Multiple Interval Mapping (MIM)
• Model: $y_j = \mu + \sum_r \beta_r x_{jr} + \sum_{r<s} \beta_{rs} (x_{jr} x_{js}) + \varepsilon_j$
- $\sum_r \beta_r x_{jr}$ = main effects
- $\sum_{r<s} \beta_{rs} (x_{jr} x_{js})$ = epistasis
• May start with CIM results to save time
• Allows estimation of epistasis
• Allows estimation of Q x E interactions
QTLs with No Main Effects, But Only Epistatic Interactions
• Many QTLs with no main effects, and having only epistatic interactions, may be involved (Xing et al., 2002)
• At the molecular level also, protein × protein interactions may be involved in the regulation of gene expression
Software for QTL Mapping
• QTL Cartographer
• QTLmapper 1.0
• MapMaker/QTL
• PLABQTL
• QGENE
• Bmapqtl (Bayesian Interval Mapping)
Gene/QTL Tagging in Wheat
• Two Markers for GPC
• Two Markers for PHST
• One Marker for GW
• Validation of QTLs for GPC Using NILs
• QTL Interval Mapping for GPC Using RILs
• QTL Interval Mapping for 14 Traits Using ITMImap and ITMIpop
QTL Analysis of Protein Content
Composite Interval Mapping (CIM) for days to maturity involving chromosome 2D
What is Association Mapping?
• Population study for marker-trait associations
- Also called association analysis/LD mapping
- Relies on historic sources of disequilibrium
• Two main approaches in humans:
- Case-control studies
- Transmission disequilibrium tests
Association Mapping: Why?
- More opportunities for recombination, giving rare recombinants
- Map distances more precise with relatively small sample sizes
- If LD decays too fast in a region, many markers are needed to scan the target region
Association Mapping: Why? (Contd)
• Population survey: multiple alleles are represented, & inferences are broader than for a single mapping population
- QTLs identified are useful in diverse genetic backgrounds
Association Mapping: Where?
• Initially used for outbreds: controlled crosses are difficult or impossible
• In humans, association was used to identify & clone several genes for diseases like:
- Cystic fibrosis
- Diastrophic dysplasia
- One of the major Alzheimer's factors
Association Mapping: How?
• Two Main Approaches
- Candidate Gene Approach
- Genome-Wide Association Mapping
Genome-Wide vs Candidate-Gene Association Mapping
Steps for Association Studies:
(1) Selection of population samples
(2) Study of population structure
(3) Phenotyping for traits of interest
(4) Genotyping candidate genes/genome
(5) Testing marker-trait association
Types of Populations
• Classic Association Populations (diversity estimated and a population of diverse unrelated individuals developed)
• Family Based Association Populations
- Joint linkage & association analysis: TDT, family based association tests (FBATs); QTDT, Q Inbred Pedigree DT (QIPDT)
• Special Association Populations (e.g. NAM)
Population Structure (PS)
• PS creates genome-wide LD between unlinked loci; the effect of PS on association is controlled by:
- Genomic Control (GC)
- Structured Association (SA)
- Unified mixed model approach (Q+K), where K = familial relatedness between individuals and Q = relatedness due to population structure; both are included as independent variables
• The genetic makeup of the study population determines which association tests should be used.
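Of the three corrections, Genomic Control is the simplest to sketch. In its usual form, an inflation factor λ is estimated as the median of the association chi-square statistics (1 d.f.) across many markers divided by the expected median under no association (about 0.455), and every statistic is then divided by λ. The sketch below follows that form; the function name and the statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 d.f. (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549

def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics); lambda is not allowed below 1."""
    inflation = max(1.0, statistics.median(chi2_stats) / CHI2_1DF_MEDIAN)
    return inflation, [value / inflation for value in chi2_stats]

# Hypothetical chi-square statistics from markers spread over the genome.
raw = [0.2, 0.5, 0.9, 1.1, 1.4, 2.0, 12.0]
inflation, adjusted = genomic_control(raw)

print(f"lambda = {inflation:.2f}")
print("adjusted:", [round(value, 2) for value in adjusted])
```

A λ well above 1 indicates that many markers are associated with the trait purely through structure, so unadjusted tests would give excess false positives.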
TASSEL for Association Studies
• TASSEL = Trait Analysis by aSSociation, Evolution and Linkage (Yu et al. 2006)
- LD calculated as D' and r², with P-values
- Diversity analysis tools also available
- Sequence alignment, SNP/indel extraction
- Uses SNPs/SSRs/sequences/phenotypes
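The two LD measures named above, D' and r², have standard definitions for a pair of biallelic loci. The sketch below computes them from haplotype and allele frequencies; it is a plain illustration of the formulas, not TASSEL's own code, and the function name and frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared

# Hypothetical frequencies: both alleles at 0.5, A-B haplotype at 0.4.
d, d_prime, r_squared = ld_measures(0.4, 0.5, 0.5)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r_squared:.2f}")
```

D' reflects recombination history only, whereas r² also depends on allele frequencies, which is why r² is the more relevant measure for the power of association tests.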
Nested Association Mapping (NAM)
• Combines QTL & Association Mapping
- Select diverse founders & develop ~5000 RILs from 25 families (~200 RILs/family)
- Densely genotype the founders
• Proposed by Ed Buckler (Yu et al. 2008, Genetics 178: 539)
Genome reshuffling between 25 diverse founders
A pair of polymorphic CPS markers leading to fine mapping of NAM
CPS Markers = Common Parent Specific (B73) Markers
CAP markers
TDT & Admixture Mapping: 'Mapping by Admixture LD' (MALD)
• Transmission Disequilibrium Test (TDT) controls false positives by using two parents and a single progeny
• MALD detects linkage by testing ancestry of marker-trait association (when parent populations differ in allele frequencies)
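In its basic form for a biallelic marker, the TDT counts, among heterozygous parents of affected progeny, how often each allele is transmitted, and compares the two counts with a McNemar-type chi-square statistic, $(b - c)^2 / (b + c)$ with 1 d.f. The sketch below shows that calculation; the function name and the counts are hypothetical.

```python
def tdt_statistic(transmitted, not_transmitted):
    """TDT chi-square (1 d.f.) from heterozygous-parent transmission counts.

    transmitted: times the test allele was passed to affected progeny (b)
    not_transmitted: times the other allele was passed instead (c)
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - not_transmitted) ** 2 / total

CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05

# Hypothetical counts from 100 heterozygous parents.
chi2 = tdt_statistic(60, 40)
print(f"TDT chi2 = {chi2:.2f}, significant = {chi2 > CRITICAL_5PCT_1DF}")
```

Because the comparison is made within families, population structure affects both alleles equally and does not inflate the statistic.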
Multi-trait Association Mapping (MTAP)
• Stich et al. (2008), TAG 117 (6): 947-954 (Melchinger's group on sugar beet)
• Generally single traits are studied one at a time in association mapping; the same QTL may affect two or more correlated traits
• MTAP distinguishes between pleiotropy & close linkage
Association Mapping in Multiple Segregating Populations (AMMSP)
• Stich et al. (TAG 117, # 7, Nov. 2008) (Melchinger's group on sugar beet)
• NAM needs populations developed by crossing many inbreds with a common genotype
• AMMSP uses multiple related crosses from routine plant breeding programs
• AMMSP successfully used in sugar beet
Multiparent Advanced Generation Inter-Cross (MAGIC) for LD
• First proposed and applied in mice
• Population captures majority of variation
• Coarse mapping possible with low marker density in early generations
• Fine mapping in advanced generations using higher marker density
• Will exploit advances in genomics
Ideal Population for Association
• Formed from a few founding ancestors
• Maintained in relative isolation
• Random mating has occurred over a number of generations, but not too many:
- LD decayed over fairly small map distances
- Adequate LD still exists between marker & linked QTL
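The "enough generations, but not too many" condition follows from the standard decay rule for a random-mating population, $D_t = D_0 (1 - r)^t$, where $r$ is the recombination fraction between the two loci and $t$ the number of generations. The sketch below assumes that idealized model (no drift, selection or admixture); the function names and example values are hypothetical.

```python
import math

def ld_after(d0, r, generations):
    """Expected LD after the given number of random-mating generations."""
    return d0 * (1.0 - r) ** generations

def generations_to_fraction(r, fraction):
    """Generations needed for LD to fall to `fraction` of its starting value."""
    return math.log(fraction) / math.log(1.0 - r)

# Compare a tightly linked pair (r = 0.01) with a loosely linked pair (r = 0.2).
for r in (0.01, 0.2):
    remaining = ld_after(1.0, r, 20)
    half_life = generations_to_fraction(r, 0.5)
    print(f"r = {r}: {remaining:.2f} of initial LD left after 20 generations; "
          f"half-life = {half_life:.1f} generations")
```

After a moderate number of generations, LD with loosely linked loci has largely vanished while LD with tightly linked loci remains, which is the situation the slide describes as ideal.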
Linkage disequilibrium (LD) decay plot of shrunken 1 (sh1) in maize. Data from Remington et al., 2001
Association Analysis in Maize (Dwarf8 and sugary1)
• Dwarf8 gene in maize: Thornsberry et al., 2001
- Associated with early flowering & low plant height
- LD decayed rapidly: no association between flowering time and tb1, located 1 cM from d8
• Sugary1 (su1) in sweet corn: Whitt et al., 2002
- Little recombination observed within the locus
More Genes for LD in Maize
• Yellow1 (Y1)
• Teosinte Branched 1 (Tb1)
• Lycopene epsilon cyclase (LcyE)
- Recommended for biofortification of maize for vitamin A using MAS (Science, Jan 2008)
Extent of LD in Other Plants
• Arabidopsis
- LD extends much farther (cf. maize)
- Self-pollinating
- LD typically decays within ~250 kb (1 cM)
• Sugarcane
- LD extends ~10 cM
- Due to bottleneck in breeding history (interspecific cross)
- May also be an artifact due to polyploidy
Association Mapping of Kernel Size and Milling Quality in Wheat
• Mark E Sorrells (Cornell)
• Association Studies
- 2005 Crop Sci 45: 1685-1695
- 2006 Genetics 172: 1165-1177
• QTL Interval Mapping
- 2007 Field Crop Res 101: 172-179
Association Studies in Other Plants
• Oryza sativa (Olsen and Purugganan 2002)
• Pinus taeda (Ersoz 2006)
• Eucalyptus spp. (Thumma et al. 2005)
• Lolium perenne (Skøt 2005)
• Beta vulgaris (Hansen et al. 2001)
(For details, see review by Ersoz et al., 2007)
Integration of Association Analysis in a Breeding Program
[Flow chart. Box labels: Germplasm; Parental Selection; Hybridization; New Populations; Selection (Intermating); New Synthetics, Lines, Varieties; Evaluation Trials; Elite Synthetics, Lines, Varieties; Genotypic & Phenotypic Data; Novel & Validated QTL/Marker Associations; Marker Assisted Selection. Note on chart: elite germplasm feeds back into the hybridization nursery.]