4 interval mapping for a single qtlpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4...

30
ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 1 4 Interval Mapping for a Single QTL basic idea of interval mapping interval mapping by maximum likelihood – maximum likelihood using EM, MCMC Bayesian interval mapping – "natural" Bayesian priors – multiple imputation, MCMC bootstrapped variance estimates advantages & shortcomings of IM Haley-Knott regression approximation ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 2 4.1 basic idea of interval mapping study properties of likelihood at each possible QTL treating QTL as missing data assuming only a single QTL (for now) recall likelihood as mixture over unknown QTL likelihood = product of sum of products complicated to evaluate--requires iteration ) , | ( pr ) , | ( pr sum prod ) , , | ( pr prod ) , , | ( pr ) , | , ( θ λ λ θ λ θ λ θ Q Y X Q X Y X Y X Y L i i Q i i i i = = =

Upload: others

Post on 05-Jun-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 1

4 Interval Mapping for a Single QTL• basic idea of interval mapping• interval mapping by maximum likelihood

– maximum likelihood using EM, MCMC• Bayesian interval mapping

– "natural" Bayesian priors– multiple imputation, MCMC

• bootstrapped variance estimates• advantages & shortcomings of IM• Haley-Knott regression approximation

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 2

4.1 basic idea of interval mapping• study properties of likelihood at each possible QTL

– treating QTL as missing data– assuming only a single QTL (for now)

• recall likelihood as mixture over unknown QTL– likelihood = product of sum of products– complicated to evaluate--requires iteration

),|(pr),|(pr sum prod),,|(pr prod),,|(pr),|,(

θλλθλθλθ

QYXQXYXYXYL

iiQi

iii

==

=

Page 2: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 3

uncertainty in QTL genotype Q• how to improve guess on Q with data, parameters?

– prior recombination: pr(Q | Xi, λ )– posterior recombination: pr(Q | Yi, Xi,θ, λ )

• main philosophies for assessing likelihood– maximum likelihood: study peak(s)– Bayesian analysis: study whole shape

• implementation methodologies– Expectation-Maximization (EM)– Markov chain Monte Carlo (MCMC)– multiple imputation– genetic algorithms, GEE, …

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 4

posterior on QTL genotypes• full conditional of Q given data, parameters

– proportional to prior pr(Q | Xi, λ )• weight toward Q that agrees with flanking markers

– proportional to likelihood pr(Yi|Q,θ)• weight toward Q so that group mean GQ ≈ Yi

• phenotype and flanking markers may conflict– posterior recombination balances these two weights

),,|(pr),|(pr),|(pr),,,|(pr

λθθλλθ

ii

iiii XY

QYXQXYQ =

Page 3: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 5

how does phenotype Y affect Q?

90

100

110

120

D4Mit41D4Mit214

Genotype

bp

AAAA

ABAA

AAAB

ABAB

what are probabilitiesfor genotype Qbetween markers?

recombinants AA:AB

all 1:1 if ignore Yand if we use Y?

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 6

maximum likelihood (ML) idea• pick QTL locus λ (usually scan whole genome)• find ML estimates of gene action θ given λ• maximum likelihood at peak of likelihood

– slope (derivative with respect to θ) is zero – sometimes maximum is at a boundary (non-zero slope)

• slope is weighted average using posteriors for Q– cannot write estimate in "closed form"– need to know θ to estimate it!– iterate toward the maximum in some clever way

θθλθ

θλθ

dQYdXYQ

dXYdL i

iiQi)),|(prlog(),,,|(pr sum),|,(

,=

Page 4: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 7

Bayesian model posterior• augment data (Y,X) with unknowns Q

– study unknowns (θ,λ,Q) given data (Y,X)– Q ~ pr(Q | Yi, Xi,θ, λ )

• no longer need weighted average over Q– instead we average over Q to study parameters– pr(θ,λ|Y,X) = sumQ pr(θ,λ,Q|Y,X)

• study properties of posterior– need to specify priors for (θ,λ) – denominator is very difficult to compute in practice– drawing samples from posterior in some clever way

)|(pr)(pr)|(pr),|(pr),|(pr),|,,(pr

XYXQYXQXYQ θλθλλθ =

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 8

4.2 interval mapping by ML• search whole genome for putative QTL• "profile" likelihood across all possible λ

– find ML estimate of θ given λ– ML estimate of (θ,λ) at maximum over genome

=

=

=

)|ˆ(),|,ˆ(log)(

)ˆ,ˆ|(),|(prsum prod),|,ˆ(

),ˆ|(prod)|ˆ(

0010

2

200

pool

YLXYLLOD

GYfXQXYL

sYfYL

QiQi

ii

θλθλ

σλλθ

µθ

Page 5: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 9

LOD for hyper dataset

0

2

4

6

8

lod

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 171819 X

19 + X chromosomeshighest LOD on 4other QTL?

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 10

LOD(λ) on chr 4 of hyper

0 20 40 60

0

2

4

6

8

Map position (cM)

lod

EMHKIMP

EM "exact"Haley-Knott regressionsingle imputation

all agree at the peakand mostly at markers

note marker spacing

Page 6: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 11

EM method for interval mapping• fix a possible QTL λ• iterate between expectation & maximization

– likelihood increases with each iteration– stop iterating when the change is "negligible"

• initial values– PQi = pr(Q | Xi,λ )

• recombination model in the absence of data– or use Haley-Knott regression estimates of θ

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 12

EM method for interval mapping• E-step: estimate posterior recombination

– PQi = pr(Q | Yi, Xi,θ, λ )– estimate for every individual i, genotype Q– depends on effects θ

• M-steps: maximize likelihood for θ– may be many parameters– technical point: caution on parallel updates– solve system of equation: derivatives set to zero – depends on PQi

θθ

dQYdP i

QiQi)),|(prlog( sum0 ,=

Page 7: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 13

4.2.2 M-step for normal phenotype• Y = GQ + e, e ~ N(0,σ2)• pr(Y | Q,θ ) = f(Y |GQ, σ2)• see notes in book for derivative details• E-step estimates:

( ) nPGY

PPYG

QiQiQi

QiiQiiiQ

/ˆ sumˆ

sum/ sumˆ2

,2 −=

=

σ

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 14

4.2.3 ML via MCMC• basic idea of simulated annealing

– start with non-informative priors on (θ,λ) – sample from posterior (somehow…)– gradually shrink priors toward ML estimate

• slight difficulty– need to know (θ,λ) to sample from posterior– iteration leads to Markov chain

• point of this section– MCMC does not imply a Bayesian perspective!

Page 8: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 15

4.3 Bayesian interval mapping• sample missing genotypes Q• decouple effects θ from QTL λ• but Q depends on (θ,λ) and vice versa• also need to specify priors

)|(pr)(pr),|(pr~

),,,|(pr~)|(pr

)|(pr),|(pr~

QYQY

XYQQXQ

XXQ

ii

θθθ

λθ

λλλ

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 16

4.3.1 Bayesian priors for QTL• locus λ may be uniform over genome

– pr(λ | X ) = 1 / length of genome• missing genotypes Q

– pr(Q | X, λ )– recombination model is formally a prior

• effects θ = (G,σ2), G = (GQQ,GQq,Gqq)– conjugate priors for normal phenotype– GQ ~ N(µ, κσ2)– σ2 ~ inverse-χ2(ν,τ2), or ντ2 / σ2 ~ χ2

Page 9: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 17

effect of prior variance on posterior

normal prior, posterior for n = 1, posterior for n = 5 , true mean(solid black) (dotted blue) (dashed red) (green arrow)

2 3 4 5 6 7 8 9 11 13 15 17

κ = 0.5

2 3 4 5 6 7 8 9 11 13 15 17

κ = 1

2 3 4 5 6 7 8 9 11 13 15 17

κ = 22.0 κ=0.5

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 18

details of phenotype priors

• priors depend on "hyper-parameters"• GQ ~ N(µ, κσ2)

– center around phenotype grand mean– κσ2 ≈ σG

2 = genetic variance – κ ≈ σG

2 /σ2 = h2 / (1–h2)– h2 = σG

2 /(σG2 +σ2 ) = heritability

• σ2 ~ inverse-χ2(ν,τ2), or ντ2 / σ2 ~ χ2

– τ2 ≈ s2 = total sample variance– ν = prior degrees of freedom = small integer

Page 10: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 19

Y = G + E posterior for single individualenviron E ~ N(0,σ2), σ2 known likelihood pr(Y | G,σ2) = N(Y | G,σ2)prior pr(G | σ2,µ,κ) = N(G | µ,κσ2)posterior N(G | µ+B1(Y-µ), B1σ2)Yi = G + Ei posterior for sample of n individuals

shrinkage weights Bn go to 1

Bayes for normal data

11

,sum with

),(N),,,|pr(2

2

→+

==

−+=

nnB

nYY

nBYBGYG

ni

nn

κκ

σµµκµσ

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 20

Y = GQ + E genetic Q = qq,Qq,QQenviron E ~ N(0,σ2), σ2 knownparameters θ = (G,σ2)

likelihood pr(Y | Q,G,σ2) = N(Y | GQ,σ2)prior pr(GQ | σ2,µ,κ) = N(GQ | µ,κσ2)posterior:

posterior by QT genetic value

11

,sum},{count

),(N),,,,|(pr

}:{

22

→+

====

−+=

=Q

QQ

Q

iQQiQiQ

QQQQQQ

nn

BnYYQQn

nBYBGQYG

i κκ

σµµκµσ

Page 11: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 21

How do we choose hyper-parameters µ,κ?Empirical Bayes: marginalize over prior

estimate µ,κ from marginal posterior

likelihood pr(Yi | Qi,G,σ2) = N(Yi | G(Qi),σ2)prior pr(GQ | σ2,µ,κ) = N(GQ | µ,κσ2)marginal pr(Yi | σ2,µ,κ) = N(Yi | µ, (κ +1)σ2)

estimates

EB posterior

Empirical Bayes: choosing hyper-parameters

−+=

−≈+=

−==

••

••

QQQQQQ

ii

nBYYBYGYG

hssnYYsY

2

2222

22

ˆ),(N)|(pr

)1/()1/(ˆ/)(sum,ˆ

σ

κσµ

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 22

What if variance σ2 is unknown?• recall that sample variance is proportional to chi-square

– pr(s2 | σ2) = χ2 (ns2/σ2 | n)– or equivalently, ns2/σ2 | σ2 ~ χn

2

• conjugate prior is inverse chi-square– pr(σ2 | ν,τ2) = inv-χ2 (σ2 | ν,τ2 )– or equivalently, ντ2/σ2 | ν,τ2 ~ χν

2

– empirical choice: τ2= s2/3, ν=6• E(σ2 | ν,τ2) = s2/2, Var(σ2 | ν,τ2) = s4/4

• posterior given data– pr(σ2 | Y,ν,τ2) = inv-χ2 (σ2|ν+n, (ντ2+ns2)/(ν+n) )– weighted average of prior and data

Page 12: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 23

Yi = G(Qi) + Ei genetic Qi = qq,Qq,QQenviron E ~ N(0,σ2)parameters θ = (G,σ2)

likelihood pr(Yi | Qi,G,σ2) = N(Yi | G(Qi),σ2)prior pr(GQ | σ2,µ,κ) = N(GQ | µ, σ2/κ)

pr(σ2 | ν,τ2) = inv-χ2 (σ2 | ν,τ2 )posterior:

joint effects posterior details

( ) nQGYsn

nB

nns

nGQY

nBYYBYGQYG

iiiQQ

QQ

QQ

QQQQQQ

/)(sum, with

,-inv),,,,|(pr

),(N),,,,|(pr

22

222222

22

−=+

=

++

+=

−+= ⋅⋅

κ

νντ

νσχτνσ

σκµσ

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 24

4.3.2 Bayesian multiple imputation• basic idea

– impute multiple copies of missing genotypes Q• sample Q ~ pr(Q | X, λ)• weighted to appear as draws from posterior

– average out gene effects θ– study posterior for putative QTL λ

• most effective for multiple QTL– use single QTL to introduce idea– consider all loci as possible QTL

• sample on grid Λ of `pseudomarkers' (every 2cM)• similar to interval map scan of whole genome

Page 13: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 25

importance sampling idea• draw samples from one distribution

– Q1, Q2, Q3,…, Qn ~ f(Q)• weight them appropriately by ω(Q)• sample summaries from distribution g(Q)

– g(Q)= f(Q)ω(Q) / constant– mean for f = sumi Qi / n– mean for g = sumi Qiω(Qi) / sumi ω(Qi)

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 26

example: mean copies of Qgenotype qq Qq QQ sumQ copies 0 1 2 true g 0.25 0.5 0.25 1.0draw f 1/3 1/3 1/3 1.0weight ω1 2 1f×ω 1/3 2/3 1/3 4/3importance sampling g = f×0.75ωsample 30 30 40 100mean f 0×30 1×30 2×40 110/100=1.1mean g 0×1×30 1×2×30 2×1×40 140/130=1.08

Page 14: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 27

what are appropriate weights?• ideally draw genotype from posterior

– want sample Q ~ g(Q) = sumθ pr(Q | Y,X,θ,λ)pr(θ )– but have sample Q ~ f(Q) = pr(Q | X,λ)

• appropriate weights – ω(Q,λ|Y,X) = pr(λ |X) sumθ pr(Y |Q,θ )pr(θ )

• estimate marginal posterior for QTL λ– draw N samples from prior at each QTL λ

Q1, Q2, Q3,…, QN ~ pr(Q | X,λ)pr(λ | Y,X) = sumQ ω(Q,λ|Y,X) pr(Q | X,λ) / constant

≈ sumj ω(Qj,λ|Y,X) / constant– constant is summed over all λ, but not actually needed

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 28

relating weights to posterior• posterior is simply averaged over θ• weights comprise terms except pr(Q | X,λ)• estimating weights: see Sen & Churchill

)|(pr/),|,(),|(pr)|(pr

)(pr),|(pr sum )|(pr),|(pr),|,,(pr sum),|,(pr

XYXYQXQXY

QYXXQXYQXYQ

λωλ

θθλλλθλ

θ

θ

=

=

=

Page 15: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 29

estimating effects via imputation• multiple imputation averages over effects• difficult to study posterior of effects directly• can estimate usual summaries

constant/),|,(),|( sum)|(pr/),|,(),|(pr ),|( sum

),|(pr ),|( sum),|(

,

,

XYQQYEXYXYQXQQYE

XYQQYEXYE

jjj

Q

Q

λωθλωλθ

θθ

λ

λ

=

=

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 30

4.3.3 Bayesian MCMC

• Markov chain Monte Carlo– Monte Carlo samples along a Markov chain

• What is a Markov chain?• What is MCMC?

– Sampling from full conditionals– Gibbs sampler, Metropolis-Hastings

Page 16: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 31

What is a Markov chain?• future given present is independent of past• update chain based on current value

– can make chain arbitrarily complicated– chain converges to stable pattern π() we wish to study

0 1 1-q

q

p

1-p

)/()1(pr qpp +=

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 32

Markov chain idea

0 1 1-qq

p

1-p

)/()1(pr qpp +=

0 5 10 15 20 25 30

01

stat

e

time

Page 17: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 33

Markov chain Monte Carlo• can study arbitrarily complex models

– need only specify how parameters affect each other– can reduce to specifying full conditionals

• construct Markov chain with “right” model– joint posterior of unknowns as limiting “stable” distribution– update unknowns given data and all other unknowns

• sample from full conditionals• cycle at random through all parameters

– next step depends only on current values• nice Markov chains have nice properties

– sample summaries make sense– consider almost as random sample from distribution– ergodic theorem and all that stuff

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 34

Markov chain Monte Carlo idea

0 2 4 6 8 10

0.0

0.1

0.2

0.3

0.4

θ

pr(θ

|Y)

have posterior pr(θ |Y)want to draw samples

propose θ ~ pr(θ |Y)(ideal: Gibbs sample)

propose new θ “nearby”accept if more probabletoss coin if less probablebased on relative heights(Metropolis-Hastings)

Page 18: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 35

MCMC realization

0 2 4 6 8 10

020

0040

0060

0080

0010

000

θ

mcm

c se

quen

ce

2 3 4 5 6 7 80.

00.

10.

20.

30.

40.

50.

pr(θ

|Y)

added twist: occasionally propose from whole domain

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 36

marginal posteriors

• joint posterior– pr(λ,Q,θ |Y,X) = pr(θ)pr(λ)pr(Q|X,λ)pr(Y|Q,θ) /constant

• genetic effects– pr(θ |Y,X) = sumQ pr(θ |Y,Q) pr(Q|Y,X)

• QTL locus– pr(λ |Y,X) = sumQ pr(λ |X,Q) pr(Q|Y,X)

• QTL genotypes more complicated– pr(Q|Y,X) = sumλ,θ pr(Q|Y,X,λ,θ ) pr(λ,θ |Y,X)– impossible to separate λ and θ in sum

observed X Y

missing Q

unknown λ θ

Page 19: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 37

Why not Ordinary Monte Carlo?• independent samples of joint distribution• chaining (or peeling) of effects

pr(θ|Y,Q)=pr(GQ | Y,Q,σ2) pr(σ2 | Y,Q) • possible analytically here given genotypes Q• Monte Carlo: draw N samples from posterior

– sample variance σ2

– sample genetic values GQ given variance σ2

• but we know markers X, not genotypes Q!– would have messy average over possible Q– pr(θ |Y,X) = sumQ pr(θ |Y,Q) pr(Q|Y,X)

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 38

MCMC Idea for QTLs• construct Markov chain around posterior

– want posterior as stable distribution of Markov chain– in practice, the chain tends toward stable distribution

• initial values may have low posterior probability• burn-in period to get chain mixing well

• update components from full conditionals– update effects θ given genotypes & traits– update locus λ given genotypes & marker map– update genotypes Q given traits, marker map, locus & effects

NQQQXYQQ

),,(),,(),,(),|,,(pr~),,(

21 θλθλθλθλθλ

→→→ L

Page 20: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 39

sample from full conditionals• hard to sample from joint posterior• update each unknown given all others• examine posterior: keep terms with unknown• normalizing denominator make a distribution

)|(pr)(pr),|(pr~

),,,|(pr~)|(pr

)|(pr),|(pr~

QYQY

XYQQXQ

XXQ

ii

θθθ

λθ

λλλ

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 40

sample from full conditionalsfor model with m QTL

• hard to sample from joint posterior– pr(λ,Q,θ |Y,X) = pr(θ)pr(λ)pr(Q|X,λ)pr(Y|Q,θ) /constant

• easy to sample parameters from full conditionals– full conditional for genetic effects

• pr(θ |Y,X,λ,Q) = pr(θ |Y,Q) = pr(θ) pr(Y|Q,θ) /constant

– full conditional for QTL locus• pr(λ |Y,X,θ,Q) = pr(λ |X,Q) = pr(λ) pr(Q|X,λ) /constant

– full conditional for QTL genotypes• pr(Q|Y,X,λ,θ ) = pr(Q|X,λ) pr(Y|Q,θ) /constant

observed X Y

missing Q

unknown λ θ

Page 21: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 41

Gibbs sampler idea• want to study two correlated normals• could sample directly from bivariate normal• Gibbs sampler:

– sample each from its full conditional– pick order of sampling at random– repeat N times

( )( )2

11212

222121

2

1

2

1

1),(~,,|

1),(~,,|

11

,~,|

ρµθρµρµθθρµθρµρµθθ

ρρ

µµ

ρµθθ

−−+

−−+

NN

N

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 42

Gibbs sampler samples: ρ = 0.6

0 10 20 30 40 50

-2-1

01

2

Markov chain index

Gib

bs: m

ean

1

0 10 20 30 40 50

-2-1

01

23

Markov chain index

Gib

bs: m

ean

2

-2 -1 0 1 2

-2-1

01

23

Gibbs: mean 1

Gib

bs: m

ean

2

-2 -1 0 1 2

-2-1

01

23

Gibbs: mean 1

Gib

bs: m

ean

2

0 50 100 150 200

-2-1

01

23

Markov chain index

Gib

bs: m

ean

1

0 50 100 150 200

-2-1

01

2

Markov chain index

Gib

bs: m

ean

2

-2 -1 0 1 2 3

-2-1

01

2

Gibbs: mean 1

Gib

bs: m

ean

2

-2 -1 0 1 2 3

-2-1

01

2

Gibbs: mean 1

Gib

bs: m

ean

2

N = 50 samples N = 200 samples

Page 22: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 43

Gibbs Sampler: effects & genotypes• for given locus λ, can sample effects θ and genotypes Q

– effects parameter vector θ = (G,σ2) with G=(Gqq,GQq,GQQ)– missing genotype vector Q = (Q1,Q2,…, Qn)

• Gibbs sampler: update one at a time via full conditionals– randomly select order of unknowns– update each given current values of all others, locus λ and data (Y,X)

• sample variance σ2 given Y,Q and genetic values G• sample genotype Qi given markers Xi and locus λ

– can do block updates if more efficient• sample all genetic values G given Y,Q and variance σ2

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 44

phenotype model: alternate form• genetic value GQ in “cell means” form easy• but often useful to model effects directly

– sort out additive and dominance effects– useful for reduced models with multiple QTL

• QTL main effects and interactions (pairwise, 3-way, etc.)

• we only consider additive effects here– Gqq = µ – a, GQq = µ, GQQ = µ + a

• recoding for regression model– Qi = –1 for genotype qq– Qi = 0 for genotype Qq– Qi = 1 for genotype QQ– G(Qi) = µ + aQi

Page 23: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 45

MCMC run of mean & additive

1

1

111

1

1

111

1

11

1

1

111

1

1

1

1111

1

1

1

11

1111

11

1

111

11

1

111

1

11

11

1

1

1

1

1

1

1

111

1

111111

1

111

1

11

1

1

1

1

1

11111

1

11111

111

1

1111

111111111

1

1

11

1

111111

11

1

1111

1

1

1

1

11

1

1111

1

11111

1111

1

11111111

11

1

1

11

111

1

11111111

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1111

1

111

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1111

1

11

11

1

1

1

1

1

11

1

11

1

1

1

1111111

111111

111

1

11111

1

1

1

1

111

11

111

1

111

11

1

1

111

1

11

1111

1

1111111

1

11

1

1111

11

1

1111

1

11

1111

1

1111111

111

11

1

1

1

1

1

1

1

111

1

11

1111

11

1

111

1

1

11

111

1

11

1

1

11

1

111

1

111111

11

111

11

111111

1

11111

111

11

11

1

1

1

11111

111

1

11

1

1

1

1

111

1

1

1

11

1

11

1

1

1

1

1

111

1

1

1

11

1

1

111

11

1111

1

1

1111

11

1

11

1

1

1

1

1

1

1

11

11

1

1

1

111

1

11

111111

1

1

1

111

1

111111111

1

11

1

11

1111

1

1111111111

1

1

11

1

1

111

1111

11

1

1

11

111

1

1

1

111

111

1

111

1

11

11

111

1

11

1

1

1111

11111

1

1

1

1

1

1

1

11

1

11

1

1

1

1

11

1

11

1

1

1

1

11

1

1

11

11

1

1

1

1

11

11

1

11

11

1

1

11

1

1

1

1

1

1111

111

11

1

1

1

11111111

1

111

11111

1

1

1

11

1

1

1

1

1

1

1

1

11

1

11

1

1

1111

11

1

11111

1

11

1

1

1

1

111

1

1

1

11

1

1

1

1

1

1

1

1

111

1

1111111

1

1

1

1111

1

1

1

1

111

111

11

1

1

1

1

1

1

1

1

1

111111

11

11

1

1

1

1

1111

1

1

1

11

1

1

1

1

1

1

111

1

11

1

11

1

1

111

1

11

1

11

1

1

11111

1

11

1

1

1

11

111

1

1

11

1

11

111

1

11

11

1

1

1

111

111

1

1111111

11

1

11

1

1111

1

111

1

1

1

1

11

1111

1

1

1

11

11

1

1

11

1

1

1

1

11

111

1

11

1

11

11

111

1111

1

11

1

1

11

1

1

1

1

1

1

11

11

1

1

1

1

1111

111

111

1111

1

111

111

1

1

1

1

111

1

11

111

1

11

1

111111

0 200 400 600 800 1000

9.6

9.8

10.0

10.2

10.4

MCMC run/100

9.6

9.8

10.0

10.2

10.4

0 20 40 60

mea

n

frequency

1

11

1111

1

11

111

1

1

111

111

1

1

111

11

1

1

1

1111

11

11

1

111

11

11

1

1

1

11111

11111

11

11

1

1

11

1

11

1

1

1

1

11

1

11

11

1

11

11

11

111

111

11

1

1

1

1

11111

11

1

1111

1

1

1111

1

11

1

11

1

1

111

1

111111

1

1

1

1

111

11

1

1

11

1

1

111

1

1

1

1

11

1

1

1

1

1

11

11

1

1

11

111

111

11

1

1

1

1

11

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

111

1

1

1

1

1

1

11

1

11

1

111

1

1

11

1

1

11

1

111

1

1

1

1

1

1

1111

1

1

1

11

1

1

1

1

11

1111

1

1

11

1

1

1

1111

1

1

1

1

1

11

11

1

1

1

1

11

1

1

11

1

11

1

1

111

11

1

1

1

1

1

1

1

1

1

11

11111

1

1

1

1

111

1

1

111

11

1

1

1

1

11

1

11

1

1111

1

11

1

1111

1111

1

1

1

1

1

1

111111

1

11

1

1

111

1

11

1

1

1

11

1

1111

11

1

1

1

1

1

1

1

1

1

11

1

11

111

1

1111

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

11

1

11

11

1

1

1

1

1

1

1

11

1

111

111

11111

1

1

1

1

1

111

1

1

1

1

1

1

1

1111111

1111

111

11

1

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

1

1

1

11

11111

1

11

11

11

1

1

111

11

11

1

1

11

1

111111

1

111

11111

1

111

1

1

1

1111

1

11

1

1

1

1

1

11111

1

1

1

1

1

1

1

11

1

1

1

11

1

1

11

11

1

1

1

1

11

1

1

1

11

11

11

1

11

1

1

11111111

1

1

11

1

1

1

11

1111

1

111

11

1

1

1

1111

1

1

1

1

1

1

1

1

1

11

1111

1

11

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1111

1

11

11

1

11

1

11

11111

1

111

1

1

1

1

11

11

1

11

1

111

1

11

1

111

111

1

111111

111111111

111

1

1

1

1

1

1

1

111

1

1

11

1111

1111

11

1111

1

1

1

1

1

1

1

1

111

11

11

1

1

1

1

1

11

11

111

1

1111

1

11

11111

11

1

11111

11

1

11

1

11

111

1

11

1

1

1

11

1

1

11

11111

1

1

11111

11

111

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

111

1

1

1

11111

11

11

1

1111

1

1

1

11

1

11

1

11

1

111

1

11

1

11111

111

1

111

1111

1

1

1

11

1

11

11

1

1

1

1

1

1

1

11

1

1

11

1

111

1

1

1

111

1

1

11

11

1

1

1

1

1

1

11

1

11

1

0 200 400 600 800 1000

1.8

2.0

2.2

MCMC run/1001.

82.

02.

20 20 40 60

addi

tive

frequency

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 46

MCMC run for variance

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

11

1

1

1

1

1

111

1

1

11

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

111

1

1

1

1

11

1

1

111

1

1

1

1

1

1

1

1

11

1

1

1

1

11

11

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

1

11

1

1

11

1

1

11

1

1

1

1

1

1

1

111

1

11

11

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1111

111

1

1

1

1

1

1

1111

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

11

1

11

11

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

11

11

1

11

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

11

11

11

1

1

1

1

1

1111

1

1

11

1

1

1

11

1

1

1

111

1

1

1

11

1

11

1

1

1

1

1

11

1

111

1

1

1

11

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

11

1

11

1

1

1

1

11

1

1

1

1

11

1

1

11

11

1

1

11

11

1

1

1

11

1

1

111

1

1

11

1111

1

11

11

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

111

1

11

11

1

1

1

1

11

1

11

11

1

11

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

11

11

1

1

1

1

1

1

11

1

11

1

1

11

1

11

1

11

1

1

1

11

1

111

1111

111

11

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

11

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

1

111

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

111

1

11

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

11

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

111

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

0 200 400 600 800 1000

1.0

1.2

1.4

1.6

1.8

2.0

MCMC run

1.0

1.2

1.4

1.6

1.8

2.0

0 10 20 30 40

varia

nce

frequency

Page 24: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 47

missing marker data• sample missing marker data a la QT genotypes• full conditional for missing markers depends on

– flanking markers– possible flanking QTL

• can explicitly decompose by individual i– binomial (or trinomial) probability

),,|(pr),,,,|(prAAor Aa aa,

λλθ iiikiiiik

ik

QXXQXYXX

==

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 48

Metropolis-Hastings idea• want to study distribution f(θ)• take Monte Carlo samples

– unless too complicated• Metropolis-Hastings samples:

– current sample value θ– propose new value θ*

• from some distribution g(θ,θ*)• Gibbs sampler: g(θ,θ*) = f(θ*)

– accept new value with prob A• Gibbs sampler: A = 1

=

),()(),()(,1min *

**

θθθθθθ

gfgfA

0 2 4 6 8 10

0.0

0.2

0.4

-4 -2 0 2 4

0.0

0.2

0.4

f(θ)

g(θ–θ*)

Page 25: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 49

Metropolis-Hastings samples

0 2 4 6 8

050

150

θ

mcm

c se

quen

ce

0 2 4 6 8

02

46

θ

pr(θ

|Y)

0 2 4 6 8

050

150

θ

mcm

c se

quen

ce

0 2 4 6 8

0.0

0.4

0.8

1.2

θ

pr(θ

|Y)

0 2 4 6 8

040

080

0

θ

mcm

c se

quen

ce0 2 4 6 8

0.0

1.0

2.0

θ

pr(θ

|Y)

0 2 4 6 8

040

080

0

θ

mcm

c se

quen

ce

0 2 4 6 8

0.0

0.2

0.4

0.6

θ

pr(θ

|Y)

N = 200 samples N = 1000 samplesnarrow g wide g narrow g wide g

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 50

full conditional for locus• cannot easily sample from locus full conditional

pr(λ |Y,X,θ,Q) = pr(λ |X,Q)= pr(λ) pr(Q|X,λ) /constant

• cannot explicitly determine full conditional– difficult to normalize– need to average over all possible genotypes over entire map

• Gibbs sampler will not work– but can use method based on ratios of probabilities…

Page 26: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 51

Metropolis-Hastings Step• pick new locus based upon current locus

– propose new locus from distribution q( )• pick value near current one?• pick uniformly across genome?

– accept new locus with probability a()• Gibbs sampler is special case of M-H

– always accept new proposal• acceptance insures right stable distribution

– accept new proposal with probability A– otherwise stick with current value

=

),()|(),()|(,1min),( *

*

newoldold

oldnewnewnewold q

qAλλλπλλλπλλ

xx

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 52

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

11

1

11

1

1

1

11

1

1

1

11

1

1

1

1

1

1

1

111

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

111

1

1

1

1

1

1

1

111

1

11

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

11

1

1

1

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

11

1

1

1

1

1

11

1

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

111

11

11

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

111

1

1

1

11

1

11

1

1

11

1

1

1

1

1

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

111

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

11

11

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

11

111

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

11

1

11

1

1

1

11

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

111

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1111

1

1

1

1

111

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

11

1

1

11

1

1

1

1

1

1

1

11

1

11

1

1

11

1

1

1

11

1111

1

1

1

1

1

1

1

111

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1

1111

1

1

11

1

1

111

11

1

11

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

11

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

111

1

1

1

1

1

111

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

11

1

1

111

1

1

1

1

1

1

1

1

1

111

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

11

1

11

1

1

11

1

1

111

1

11

1

1

11

11

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

11

0 200 400 600 800 1000

3638

4042

MCMC run/100

3638

4042

0 20 40 60

dist

ance

(cM

)

frequency

MCMC Run for 1 locus at 40cM

Page 27: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 53

Care & Use of MCMC• sample chain for long run (100,000-1,000,000)

– longer for more complicated likelihoods– use diagnostic plots to assess “mixing”

• standard error of estimates– use histogram of posterior– compute variance of posterior--just another summary

• studying the Markov chain– Monte Carlo error of series (Geyer 1992)

• time series estimate based on lagged auto-covariances– convergence diagnostics for “proper mixing”

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 54

4.4 bootstrapped variance estimates

• (re)sample (Yi,Xi) with replacement– create bootstrap sample "new" data of size n– estimate loci λ and effects θ

• repeat this N times• construct summaries of these

– mean, variance, median, percentile• construct 95% confidence intervals for λ and θ

– (2.5%ile, 97.5%ile)– order estimates, pick number .025N and .975N

Page 28: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 55

4.5 advantages & shortcomings of IM

• advantages over single marker analysis– can infer position and effect of QTL– estimated locations & effects almost unbiased

• if only one segregating QTL per chromosome

– requires fewer individuals for detection of QTL

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 56

shortcomings of IM• not an interval test

– cannot say whether or not QTL is in an interval– not independent of effects of QTL outside interval

• can give false positives due to linkage– high LOD score due to nearby QTL– less of a problem for unlinked QTL

• can detect "ghost QTL"– higher peak between two linked QTL– estimated position and effect are biased

Page 29: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 57

4.6 Haley-Knott Regression Approximation• likelihood mixes over missing genotypes

– normal data → mixture of normals• approximate mixture by one normal

– just estimate mean and variance• advantages

– works well for closely spaced markers– mean is correct– can exploit flanking markers for missing data– calculations are easy and fast (PLABQTL)

• disadvantages– variance depends on marker genotypes and spacings– approximation errors accumulate for multiple QTL

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 58

Haley-Knott regression idea• replace missing genotypes Q by expected values

– Pi = E(Q | Xi,λ) = sumQ pr(Q | Xi,λ) Q• fit regression model (e.g. additive gene action)

– Yi = µ + αPi + ei , i = 1,…,n• assume constant variance• correct mean

– E(Yi |Xi, θ, λ) = Pi

• wrong variance– V (Yi |Xi, θ, λ) = σ2 sumQ [pr(Q | Xi,λ)]2

Page 30: 4 Interval Mapping for a Single QTLpages.stat.wisc.edu/~yandell/statgen/course/notes/course4.pdf4 Interval Mapping for a Single QTL • basic idea of interval mapping • interval

ch. 4 © 2003 Broman, Churchill, Yandell, Zeng 59

Haley-Knott and EM

• both use expected value of genotypes Q– HK: Pi = E(Q | Xi,λ) = prior expectation– EM: Pi = E(Q | Yi, Xi, θ, λ) = prior expectation

• both solve regression problems for effects• difference is in iteration

– HK is first step iteration– EM iterates E-step and M-steps to convergence