QTL Mapping
Post on 22-Nov-2014
QTL Mapping
Violeta I. Bartolome
Senior Associate Scientist-Biometrics
Crop Research Informatics Laboratory
International Rice Research Institute
Quantitative Traits
• Vary continuously (e.g. yield, quality, stress tolerance)
• Usually governed by a number of genes
• Loci involved in the inheritance of quantitative traits are called QTL (quantitative trait loci)
QTL Mapping
The objective is to identify QTLs that affect the quantitative trait of interest.
Mapping Populations
Data Needed for QTL Mapping
• A trait value for each member of the mapping population.
• Allele scores for a set of marker loci distributed throughout the genome.
Methods to Detect QTLs
• Single marker analysis
• Interval mapping
• Composite interval mapping
• Multiple QTL mapping
Single Marker Analysis (SMA)
Model for SMA
$y = \mu + MG_i + e$
where y = the phenotype
µ = the overall mean
MG_i = the effect of marker genotype i
e = the residual error
Single Marker Analysis
• A significant difference between the phenotypic means of the groups indicates that the marker locus used to partition the mapping population is linked to a QTL controlling the trait.
• Because a QTL and a tightly linked marker are usually inherited together, the mean of the group carrying the marker will differ significantly from the mean of the group without it.
Single Marker Analysis
• Advantages
o Simple
o Easily incorporates covariates
o Does not require a complete genetic map
• Disadvantages
o Must exclude individuals with missing genotype data
o Less precise about the location of the QTL.
o The farther a QTL is from a marker, the less likely it is to be detected, and the QTL effect may be underestimated.
o Only considers one QTL at a time.
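A minimal sketch of single marker analysis in base R, assuming a simulated backcross with a single marker. The data, the effect size and every variable name below are made up for illustration and do not come from the slides.

```r
# Simulated backcross data (hypothetical): one marker, 100 individuals
set.seed(1)
n      <- 100
marker <- factor(rep(c("AA", "AB"), each = n / 2))   # marker genotype (MG)
pheno  <- 10 + 2 * (marker == "AB") + rnorm(n)       # assumed marker-linked effect of 2 units

# Fit y = mu + MG + e and test whether the genotype groups differ
fit         <- lm(pheno ~ marker)
group.means <- tapply(pheno, marker, mean)
p.value     <- anova(fit)[["Pr(>F)"]][1]

# LOD score for the marker from the residual sums of squares
rss0 <- sum((pheno - mean(pheno))^2)   # model without a marker effect
rss1 <- sum(residuals(fit)^2)          # model with the marker effect
lod  <- (n / 2) * log10(rss0 / rss1)

print(group.means)
print(c(p.value = p.value, LOD = lod))
```

A small p-value (or a large LOD) suggests that the marker is linked to a QTL; it says little about how far away the QTL is.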
R/qtl
Data Entry, Data Quality Check, and Single Marker Analysis
Data analyzable by R/qtl
• F2
• Backcross
• RILs
– class(mydata)[1] <- "riself" # if by selfing
– class(mydata)[1] <- "risib" # if by sibling mating
Data not analyzable by R/qtl
• Outcross data
• Half-sib families
• Advanced intercross lines
Input files
• Text file (comma delimited)
• Mapmaker format
• QTL cartographer format
Sample Data
• csv format
• Mapmaker format – genotype data
• Mapmaker format – phenotype data
• QTL Cartographer format – Rcross data
• QTL Cartographer format – Rmap data
Reading cross data
• csv file
read.cross("csv", file="csvfile.csv", genotypes=c("A","H","B","D","C"))
• Mapmaker file
read.cross("csvs", genfile="mapmaker_gen.csv", phefile="mapmaker_phe.csv")
• QTL Cartographer
read.cross("qtlcart", file="qtlcart.cro", mapfile="qtlcart.map")
Reading csv data
Data Quality Check
Drop markers deviating from the hypothesized segregation ratio using the following statement
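The original statement was shown as an image and is not recoverable. One possible way to do this in R/qtl is sketched below, assuming the `hyper` example data shipped with the package and a Bonferroni-corrected cutoff; both are assumptions made for illustration, not the author's choices.

```r
library(qtl)
data(hyper)                          # example backcross shipped with R/qtl

gt     <- geno.table(hyper)          # genotype counts and a chi-square P-value per marker
alpha  <- 0.05 / totmar(hyper)       # assumed cutoff: Bonferroni correction over all markers
todrop <- rownames(gt)[which(gt$P.value < alpha)]

hyper.clean <- hyper
if (length(todrop) > 0) {
  hyper.clean <- drop.markers(hyper, todrop)   # remove the distorted markers
}
totmar(hyper.clean)                  # number of markers kept
```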
plot.missing()
• Plot a grid showing which genotypes are missing
Note: Genotypes with missing data are denoted by black pixels.
plot.map()
• Plot genetic map of marker locations for all chromosomes
plot.pheno()
• Plots a histogram or barplot of the data for a phenotype from an experimental cross
Note: pheno.col indicates the column number of the data to be plotted.
plot()
• Plots all graphs together
est.rf()
• Estimate the sex-averaged recombination fraction between all pairs of genetic markers
• For a backcross, one can simply count recombination events. For an intercross or 4-way cross, recombination fractions must be estimated.
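For the backcross case, counting recombination events can be sketched in base R as follows. The genotype vectors are hypothetical, and the LOD formula is the usual binomial comparison of the estimated recombination fraction against 0.5 (no linkage).

```r
# Hypothetical backcross genotypes at two markers (1 = AA, 2 = AB), 10 individuals
m1 <- c(1, 1, 2, 2, 1, 2, 1, 2, 2, 1)
m2 <- c(1, 2, 2, 2, 1, 2, 1, 1, 2, 1)

# An individual is recombinant when its genotypes at the two markers differ
recombinant <- m1 != m2
n  <- length(m1)
k  <- sum(recombinant)
rf <- k / n                      # estimated recombination fraction (0.2 here)

# LOD score for linkage: likelihood at rf versus likelihood at 0.5
lod <- k * log10(rf) + (n - k) * log10(1 - rf) - n * log10(0.5)
print(c(rf = rf, LOD = lod))
```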
plot.rf()
• Plot a grid showing the recombination fractions for all pairs of markers, and/or the LOD scores for tests of linkage between pairs of markers
• If both are plotted, the recombination fractions are in the upper left triangle while the LOD scores are in the lower right triangle. Red corresponds to a large LOD or a small recombination fraction, while blue is the reverse. Missing values appear in light gray.
Plot both rf and lod
Plot rf and lod for Chr 1 only
Plot lod only for Chr 2 and 3
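The three plots named above might be produced along these lines. This is a sketch that assumes a recent R/qtl release, where the plotting function is called `plotRF()`, and uses the package's `hyper` example data rather than the data from the slides.

```r
library(qtl)
data(hyper)

hyper <- est.rf(hyper)                       # pairwise recombination fractions and LOD scores

plotRF(hyper)                                # both: rf in upper left, LOD in lower right
plotRF(hyper, chr = 1)                       # rf and LOD for chromosome 1 only
plotRF(hyper, chr = c(2, 3), what = "lod")   # LOD only, chromosomes 2 and 3
```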
scanone()
• Genome scan with a single-QTL model, with possible allowance for covariates, using any of several possible models for the phenotype and any of several possible numerical methods
scanone(cross, chr, pheno.col=1,
        model=c("normal","binary","2part","np"),
        method=c("em","imp","hk","ehk","mr","mr-imp","mr-argmax"),
        addcovar=NULL, n.perm)
cross – object to be analyzed
chr – optional vector indicating the chromosomes for which LOD scores should be calculated
pheno.col – column number of the phenotype data
addcovar – additive covariates, allowed only for the normal and binary models
n.perm – the number of permutation replicates (if specified, a permutation test is performed)
model=
• normal – the standard model for QTL mapping. The residual phenotypic variation is assumed to follow a normal distribution
• binary – for a binary phenotype, which must have values 0 and 1. Available for the em and mr methods only
• 2part – for when there is a spike in the phenotype distribution
• np (non-parametric) – an extension of the Kruskal-Wallis test is used
method=
• mr – single marker regression
o mr – deletes individuals with missing genotypes
o mr-imp – fills in missing data using single imputation
o mr-argmax – fills in missing data using the Viterbi algorithm
• em – maximum likelihood using the expectation-maximization (EM) algorithm
• hk – Haley-Knott regression
• imp – multiple imputation (Sen and Churchill, 2001). Uses a Monte Carlo algorithm instead of EM.
• ehk – extended Haley-Knott method (Feenstra et al., 2006). An improvement on hk, especially when epistasis exists between QTLs
Single marker ANOVA
• Threshold=3
• Using permutation test
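Both ways of judging significance can be sketched with `scanone()`. The example assumes the `hyper` data shipped with R/qtl and uses only 100 permutations to keep the run short; a real analysis would normally use many more.

```r
library(qtl)
data(hyper)

# Single marker regression (individuals with missing genotypes are dropped per marker)
out.mr <- scanone(hyper, pheno.col = 1, method = "mr")
summary(out.mr, threshold = 3)               # peaks with LOD above the fixed threshold of 3

# Permutation test for a genome-wide LOD threshold
set.seed(123)
operm <- scanone(hyper, pheno.col = 1, method = "mr", n.perm = 100, verbose = FALSE)
summary(operm, alpha = 0.05)                 # estimated 5% threshold
summary(out.mr, perms = operm, alpha = 0.05) # peaks exceeding that threshold
```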
Estimating heritability for each marker
Interval Mapping (IM)
• Used for estimating the position of a QTL between two markers
• Statistically more powerful than single marker analysis
Methods used in IM
• Maximum Likelihood (standard interval mapping)
• Haley-Knott Regression
• Extended Haley-Knott Regression
Note:
• All methods estimate three parameters: mean, genetic effects and residual variance.
• All methods compute the conditional probabilities for each QTL genotype at a position between markers.
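Probabilities of a putative QTL for a backcross
$\text{Prob}(Q \mid M_1 M_2) = \dfrac{(1 - r_1)(1 - r_2)}{1 - r_{12}}$
$\text{Prob}(Q \mid M_1 m_2) = \dfrac{(1 - r_1)\, r_2}{r_{12}}$
$\text{Prob}(Q \mid m_1 M_2) = \dfrac{r_1 (1 - r_2)}{r_{12}}$
$\text{Prob}(Q \mid m_1 m_2) = \dfrac{r_1 r_2}{1 - r_{12}}$
where r1 is the recombination fraction between marker 1 and the QTL, r2 between the QTL and marker 2, and r12 between the two markers.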
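The four conditional probabilities for a backcross can be checked numerically with a small base R function. This is a sketch under the assumption of no crossover interference, so that r12 = r1 + r2 − 2·r1·r2; the function name `qtl.prob` and the example values are hypothetical.

```r
# Conditional probability of QTL allele Q given the flanking marker genotypes (backcross)
# r1  = recombination fraction between marker 1 and the putative QTL
# r12 = recombination fraction between the two markers
# r2 (QTL to marker 2) follows from r12 = r1 + r2 - 2*r1*r2 (no interference assumed)
qtl.prob <- function(r1, r12) {
  r2 <- (r12 - r1) / (1 - 2 * r1)
  c(M1M2 = (1 - r1) * (1 - r2) / (1 - r12),
    M1m2 = (1 - r1) * r2 / r12,
    m1M2 = r1 * (1 - r2) / r12,
    m1m2 = r1 * r2 / (1 - r12))
}

p <- qtl.prob(r1 = 0.05, r12 = 0.20)
print(round(p, 4))
```

The closer the test position is to marker 1 (small r1), the more the probability is driven by the genotype at that marker.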
LOD Scores
• Logarithm of the odds – used to identify the most likely position for a QTL in relation to the linkage map
• Test of Significance
o LOD > 3 is the conventional significance threshold – odds of 1,000 to 1 that the loci are linked rather than not linked
o Permutation test
Odds
$\text{Odds} = \dfrac{\text{prob. of success}}{\text{prob. of failure}} = \dfrac{p}{1 - p}$
Odds = 1 → equal chance of success and failure
Odds < 1 → lower chance of success
Odds > 1 → higher chance of success
Maximum Likelihood
• The likelihood is computed for a given set of parameters (QTL position and QTL effect), given the observed data on phenotypes and marker genotypes
• The estimates for the parameters are those where the likelihood is highest
• The expectation-maximization (EM) method is used in the estimation procedure
Maximum Likelihood
• A test statistic for this method is:
$LR = -2 \ln \dfrac{\text{Max\_Likelihood(reduced model)}}{\text{Max\_Likelihood(full model)}}$
The reduced model refers to the null hypothesis of no QTL effect.
• The LOD score for a QTL at position c is:
$LOD(c) = \dfrac{LR(c)}{2 \ln 10} = \dfrac{LR(c)}{4.61}$
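As a worked example of the conversion from the likelihood ratio to the LOD score, using two hypothetical maximized log-likelihoods (the numbers are invented for illustration):

```r
# Hypothetical maximized log-likelihoods (natural log) at test position c
loglik.reduced <- -250.0    # reduced model: no QTL
loglik.full    <- -240.0    # full model: QTL at position c

LR  <- -2 * (loglik.reduced - loglik.full)   # -2 ln(L_reduced / L_full) = 20
LOD <- LR / (2 * log(10))                    # 20 / 4.61, about 4.34

print(c(LR = LR, LOD = LOD))
```

The same LOD is obtained directly as the base-10 log of the likelihood ratio, `(loglik.full - loglik.reduced) / log(10)`.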
Haley-Knott (HK) Regression
• For two markers, the model is:
$y = \mu + \alpha x + e$
where y is the observed phenotype
x is P(Q | mg1, mg2, r1, r12), the conditional probability of the QTL genotype given the flanking marker genotypes
HK Regression
• For each QTL position, the residual sum of squares (SSE) is determined.
• The estimate of the QTL position is where the SSE is at its minimum.
• Estimates an approximate likelihood ratio:
$LR = n \ln \left( \dfrac{SSE_{reduced}}{SSE_{full}} \right)$
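A minimal base R sketch of Haley-Knott regression at a single test position, assuming a simulated backcross with two flanking markers, no interference, and made-up values for the recombination fractions and the QTL effect:

```r
set.seed(42)
n   <- 200
r1  <- 0.05; r2 <- 0.10                  # assumed marker1-QTL and QTL-marker2 recombination fractions
r12 <- r1 + r2 - 2 * r1 * r2             # no interference

# Simulate a backcross: 1 = allele from the Q-carrying parent, 0 = other allele
m1 <- rbinom(n, 1, 0.5)
q  <- ifelse(runif(n) < r1, 1 - m1, m1)  # QTL genotype (unobserved in practice)
m2 <- ifelse(runif(n) < r2, 1 - q, q)
y  <- 10 + 1.5 * q + rnorm(n)            # assumed QTL effect of 1.5

# x = P(Q | flanking marker genotypes) at the test position
x <- ifelse(m1 == 1 & m2 == 1, (1 - r1) * (1 - r2) / (1 - r12),
     ifelse(m1 == 1 & m2 == 0, (1 - r1) * r2 / r12,
     ifelse(m1 == 0 & m2 == 1, r1 * (1 - r2) / r12,
                               r1 * r2 / (1 - r12))))

sse.full    <- sum(residuals(lm(y ~ x))^2)   # full model: y = mu + alpha * x + e
sse.reduced <- sum((y - mean(y))^2)          # reduced model: y = mu + e
LR  <- n * log(sse.reduced / sse.full)
LOD <- LR / (2 * log(10))
print(c(LR = LR, LOD = LOD))
```

In a genome scan the same regression is repeated at each test position, and the position with the smallest full-model SSE is taken as the QTL estimate.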
Extended HK Regression
• An improvement on HK regression
• Uses the correct variance for each genotype instead of the constant variance assumed in HK regression
Which IM method to use
• ML provides better estimates but the analysis is complex and computationally expensive
• HK regression is computationally faster, but the estimate of the residual variance is biased and the power of QTL detection may be affected (Kao et al., 1999)
• Extended HK regression is not as fast as HK but provides improved approximations and is still faster than ML
• Results are hardly different in practical mapping
Multiple Imputation Method
• Another method available for IM
• Fills in all missing genotype data, then uses single marker ANOVA to identify significant QTLs
• More robust than ML but has little advantage over the extended HK for single QTL mapping
• Intensive in both computation time and memory use
Interval Mapping
• Advantages
o Takes proper account of missing data
o Allows examination of positions between markers
o Gives improved estimates of QTL effects
• Disadvantages
o Increased computation time
o Requires specialized software
o Difficult to generalize
o Only considers one QTL at a time
IM sample output
Red – EM
Blue – EHK
R/qtl
Interval Mapping
EM, HK, and EHK
Interval mapping
• Maximum likelihood
A permutation test can also be used to obtain the threshold value for LOD scores.
calc.genoprob()
• Calculates QTL genotype probabilities conditional on the available marker data.
• Needed in most mapping functions
o step – indicates the step size in cM at which the probabilities are to be calculated
o error.prob – assumed genotyping error rate
Note: a genotyping error occurs when the observed genotype of an individual does not correspond to the true genotype.
Interval mapping
• Extended Haley-Knott Regression
A permutation test can also be used to obtain the threshold value for LOD scores.
Combining IM results
Plot of combined results
red – em
blue – ehk
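The interval mapping runs and the combined plot might look roughly like this in R/qtl. The sketch assumes the simulated backcross `fake.bc` that ships with the package, not the data used in the slides, and the step size and error rate are arbitrary choices.

```r
library(qtl)
data(fake.bc)

# QTL genotype probabilities on a 1 cM grid (needed by em, hk and ehk)
fake.bc <- calc.genoprob(fake.bc, step = 1, error.prob = 0.001)

out.em  <- scanone(fake.bc, method = "em")    # maximum likelihood (standard interval mapping)
out.hk  <- scanone(fake.bc, method = "hk")    # Haley-Knott regression
out.ehk <- scanone(fake.bc, method = "ehk")   # extended Haley-Knott regression

out.all <- c(out.em, out.hk, out.ehk)         # combine: one LOD column per method
summary(out.all, threshold = 3)

plot(out.em, out.ehk, col = c("red", "blue")) # red = em, blue = ehk
```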
Composite Interval Mapping
• Performs interval mapping using a subset of marker loci as covariates
• Markers serve as proxies for other QTLs to account for linked QTLs and reduce residuals
• Gives greater power in identifying key QTL.
• More statistically complicated and requires more computational power.
Steps in CIM
• Selects a set of markers to serve as covariates.
• Performs interval mapping with these markers as covariates.
• Excludes markers at a fixed distance from the test position.
• Calculates a LOD score comparing the model with the putative QTL in the presence of covariates to the model with just the covariates.
Sample CIM output
Blue – EM
Red – CIM
Problem with CIM
• The estimated position of the first QTL can be influenced by the second QTL and vice versa, especially for linked QTLs.
• The choice of covariates is critical: if too many or too few markers are chosen there will be a loss of power to detect QTL.
R/qtl
Composite Interval Mapping
cim()
• cim(cross, pheno.col=1, n.marcovar=3, method=c("em", "imp", "hk", "ehk"), imp.method=c("imp", "argmax"), error.prob=0.0001, n.perm, window=10)
o n.marcovar – number of marker covariates to use
o imp.method – method used to impute any missing marker genotype data
o window – marker covariates will be omitted within this distance of the test position
• add.cim.covar – adds dots at the locations of the selected marker covariates to a plot of composite interval mapping results
Composite interval mapping
CIM – Using permutation test
Composite interval mapping
blue – em
red – cim
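A sketch of how such a comparison plot might be produced, again assuming the `fake.bc` example data from R/qtl rather than the data in the slides. The settings for `n.marcovar` and `window` simply repeat the defaults listed above.

```r
library(qtl)
data(fake.bc)

fake.bc <- calc.genoprob(fake.bc, step = 1, error.prob = 0.001)

out.em <- scanone(fake.bc, method = "em")      # ordinary interval mapping

set.seed(1)                                    # cim() imputes missing marker genotypes at random
out.cim <- cim(fake.bc, pheno.col = 1, n.marcovar = 3, window = 10, method = "em")

plot(out.em, out.cim, col = c("blue", "red"))  # blue = em, red = cim
add.cim.covar(out.cim)                         # dots at the selected marker covariates
```

Passing `n.perm` to `cim()` runs a permutation test instead and returns LOD thresholds, at a much higher computational cost.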
Multiple QTL Mapping
• Extension of interval mapping to multiple QTLs
• Infers the location of QTLs at positions between markers
• Investigates interactions between QTLs (epistasis)
• More powerful and precise in detecting QTL (Kao et al., 1999)
Sample Multiple QTL Mapping output
Other Methods used in Interval Mapping
• Bayesian Method – estimates parameters using probability theory and prior knowledge about the data (R/qtlbim)
• Mixed model regression – available in R/ASReml
R/qtl
Multiple QTL Mapping
Multiple QTL Mapping
• sim.geno() is used to impute genotypes with missing data to minimize loss of information
• makeqtl() is used to create a qtl object. It pulls out the imputed genotypes at the selected positions
• n.gen is the number of possible genotypes at each QTL
Multiple QTL Mapping
Displays the QTL on the genetic map
Multiple QTL Mapping
Not significant and may be dropped from the model
Multiple QTL Mapping
refineqtl() – iteratively scans the positions for QTL in the context of a multiple QTL model, to try to identify the positions with maximum likelihood, for a fixed QTL model.
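The whole multiple-QTL workflow can be sketched as below. It assumes the `hyper` example data from R/qtl; the QTL positions and the model formula are illustrative choices for that data set, not results from the slides.

```r
library(qtl)
data(hyper)

set.seed(1)
hyper <- sim.geno(hyper, step = 2, n.draws = 16, error.prob = 0.001)  # impute genotypes

# Create a qtl object at assumed positions (chromosome, cM)
qtl <- makeqtl(hyper, chr = c(1, 1, 4, 6, 15), pos = c(50, 76, 30, 70, 20))
plot(qtl)                                   # displays the QTL on the genetic map

# Fit a multiple-QTL model with one interaction term
my.formula <- y ~ Q1 + Q2 + Q3 + Q4 + Q5 + Q4:Q5
out.fq <- fitqtl(hyper, pheno.col = 1, qtl = qtl, formula = my.formula)
summary(out.fq)                             # drop-one-term table: non-significant terms may be dropped

# Refine the QTL positions for this fixed model
rqtl <- refineqtl(hyper, pheno.col = 1, qtl = qtl, formula = my.formula, verbose = FALSE)
rqtl
```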