lecture 23: quantitative traits iii

Lecture 23: Quantitative Traits III Date: 11/12/02 Single locus backcross regression Single locus backcross likelihood F2 – regression, likelihood, etc

Upload: knox-mccarthy

Post on 03-Jan-2016


Page 1: Lecture 23: Quantitative Traits III

Lecture 23: Quantitative Traits III

Date: 11/12/02

Single-locus backcross regression
Single-locus backcross likelihood
F2: regression, likelihood, etc.

Page 2: Lecture 23: Quantitative Traits III

Backcross Model

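Marker genotype   Count   Marg. freq.   P(QQ | marker)   P(Qq | marker)   Expected trait value
AA                n1      0.5           $1-\theta$       $\theta$         $(1-\theta)\mu_1 + \theta\mu_2$
Aa                n2      0.5           $\theta$         $1-\theta$       $\theta\mu_1 + (1-\theta)\mu_2$

$\mu_1$ is the genotypic value of QQ; $\mu_2$ is the genotypic value of Qq. $\theta$ is the recombination fraction between the marker and the QTL.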

Page 3: Lecture 23: Quantitative Traits III

Backcross – t-test

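Gen.   Freq.   Phen.
AA     n1      $X_{11}, X_{12}, \ldots, X_{1n_1}$
Aa     n2      $X_{21}, X_{22}, \ldots, X_{2n_2}$

$$t = \frac{\hat{\mu}_{AA} - \hat{\mu}_{Aa}}{\sqrt{\hat{s}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t(\mathrm{df} = N - 2)$$

where $\hat{\mu}_{AA}$ and $\hat{\mu}_{Aa}$ are the observed means in each marker class,

where $\hat{s}^2$ is the pooled variance,

where $N = n_1 + n_2$ is the total number of individuals sampled.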
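The backcross t-test can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual pooled variance with $N - 2$ degrees of freedom; the function name `backcross_t` and the trait values are hypothetical.

```python
import math

def backcross_t(aa_values, ab_values):
    """Pooled two-sample t statistic comparing marker classes AA and Aa."""
    n1, n2 = len(aa_values), len(ab_values)
    m1 = sum(aa_values) / n1                      # observed mean of class AA
    m2 = sum(ab_values) / n2                      # observed mean of class Aa
    ss1 = sum((x - m1) ** 2 for x in aa_values)   # within-class sums of squares
    ss2 = sum((x - m2) ** 2 for x in ab_values)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df                 # pooled variance
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical trait values, for illustration only
aa = [10.0, 12.0, 11.0, 13.0]
ab = [8.0, 9.0, 7.0, 10.0]
t, df = backcross_t(aa, ab)
```

The statistic is then compared with a t distribution on `df` degrees of freedom.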

Page 4: Lecture 23: Quantitative Traits III

Backcross – Linear Regression (BLG)

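One may also test the data using a simple linear regression model:

$$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$$

where $y_j$ is the trait value for the jth individual and $x_j$ is a dummy variable indicating marker genotype (AA or Aa).

The estimates of the coefficients ($\hat{a}$ for $\beta_0$, $\hat{b}$ for $\beta_1$) are given by:

$$\hat{b} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

We seek the expectation of these coefficients under a genetic model.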
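As a small sketch of the regression estimators, the following computes the slope as Cov(x, y)/Var(x) and the intercept as the mean of y minus slope times the mean of x. It assumes the dummy variable is coded +1 for AA and -1 for Aa; the data and the name `simple_regression` are hypothetical.

```python
def simple_regression(x, y):
    """Return (intercept, slope) using slope = Cov(x, y) / Var(x)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    var = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    slope = cov / var
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical data: four AA individuals (x = +1), four Aa individuals (x = -1)
x = [1, 1, 1, 1, -1, -1, -1, -1]
y = [10.0, 12.0, 11.0, 13.0, 8.0, 9.0, 7.0, 10.0]
a_hat, b_hat = simple_regression(x, y)

# With a +1/-1 coding, the slope is half the difference between class means.
mean_aa = sum(y[:4]) / 4
mean_ab = sum(y[4:]) / 4
half_diff = (mean_aa - mean_ab) / 2
```

Because the regressor takes only two values, the regression test and the two-sample t-test address the same contrast between marker classes.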

Page 5: Lecture 23: Quantitative Traits III

BLG – Expected Sample Statistics

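To find the expected values under the genetic model, we need the expectation of the sample means and variances (with $x_j = 1$ for AA and $x_j = -1$ for Aa):

$$E(\bar{x}) = \tfrac{1}{2}(1) + \tfrac{1}{2}(-1) = 0$$

$$E(\hat{s}_x^2) = \tfrac{1}{2}(1)^2 + \tfrac{1}{2}(-1)^2 = 1$$

$$E(\bar{y}) = \tfrac{1}{2}\left[(1-\theta)\mu_1 + \theta\mu_2\right] + \tfrac{1}{2}\left[\theta\mu_1 + (1-\theta)\mu_2\right] = \tfrac{1}{2}(\mu_1 + \mu_2)$$

$$E(\hat{s}_{xy}) = \tfrac{1}{2}(1)\left[(1-\theta)\mu_1 + \theta\mu_2\right] + \tfrac{1}{2}(-1)\left[\theta\mu_1 + (1-\theta)\mu_2\right] = \tfrac{1}{2}(1-2\theta)(\mu_1 - \mu_2)$$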

Page 6: Lecture 23: Quantitative Traits III

BLG – Expected Coefficients

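Recalling the coefficient estimators:

$$E(\hat{a}) = E(\bar{y}) - E(\hat{b})\,E(\bar{x}) = \tfrac{1}{2}(\mu_1 + \mu_2)$$

$$E(\hat{b}) = \frac{E(\hat{s}_{xy})}{E(\hat{s}_x^2)} = \tfrac{1}{2}(1-2\theta)(\mu_1 - \mu_2)$$

Finally, recalling our genetic models:

QTL genotype   Model 1   Model 2
QQ             a         2a
Qq             d         (1+k)a
qq             -a        0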

Page 7: Lecture 23: Quantitative Traits III

BLG – Hypothesis Testing

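We conclude that the expected regression coefficient is:

$$E(\hat{b}) = \tfrac{1}{2}(1-2\theta)(\mu_1 - \mu_2) = \tfrac{1}{2}(1-2\theta)(a - d) = \tfrac{1}{2}(1-2\theta)(1-k)a$$

So, again, failing to reject $H_0: \beta_1 = 0$ is consistent with any of the following, and rejecting it rules all of them out:
$\theta = 0.5$ (NO LINKAGE)
$a = 0$ or $a = d = 0$ (NO VARIATION)
$k = 1$ or $a = d$ (COMPLETE DOMINANCE)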
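A short numerical sketch makes the confounding concrete. It assumes the expected slope is one half of (1 - 2θ)(a - d), with QQ = a and Qq = d and marker classes coded +1 and -1; the function name `expected_slope` and the parameter values are hypothetical.

```python
def expected_slope(theta, a, d):
    """Expected backcross regression slope: 0.5 * (1 - 2*theta) * (a - d)."""
    return 0.5 * (1 - 2 * theta) * (a - d)

linked = expected_slope(theta=0.2, a=2.0, d=0.5)      # linked QTL with an effect
unlinked = expected_slope(theta=0.5, a=2.0, d=0.5)    # no linkage
dominant = expected_slope(theta=0.0, a=2.0, d=2.0)    # complete dominance, a = d
no_effect = expected_slope(theta=0.0, a=0.0, d=0.0)   # no variation
```

Only the first case gives a nonzero expected slope, and its size mixes the recombination fraction with the genetic effect, so the two cannot be separated from the slope alone.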

Page 8: Lecture 23: Quantitative Traits III

Backcross – Likelihood (BL)

One may also set up a likelihood function for backcross progeny.

Trait values are assumed approximately normal (lots of little effects added together).

The distribution of trait values for each marker class is assumed to be a mixture of two normals, one for each possible genotype at the QTL.

The mixing proportions are determined by the recombination fraction.

Page 9: Lecture 23: Quantitative Traits III

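BL – Distributions Class AA

[Figure: trait distribution for marker class AA, shown as a mixture of two normal curves over the genotypic-value axis, centered at $\mu_1$ (QQ, 80%) and $\mu_2$ (Qq, 20%), with the class mean $\mu_{AA}$ marked. Suppose $\theta = 0.2$.]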

Page 10: Lecture 23: Quantitative Traits III

BL – Distributions Class Aa

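[Figure: trait distribution for marker class Aa, shown as a mixture of two normal curves over the genotypic-value axis, centered at $\mu_1$ (QQ) and $\mu_2$ (Qq) in proportions 20% and 80%, with the class mean $\mu_{Aa}$ marked.]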

Page 11: Lecture 23: Quantitative Traits III

BL – Assumptions

Assume the trait variances for the two QTL genotypes in the backcross are equal.

Assume the traits are normally distributed.

Assume there is no marker / trait interaction, so the distributions remain unchanged in both marker classes (i.e. same variances).

Page 12: Lecture 23: Quantitative Traits III

BL – Likelihood

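The likelihood function for the backcross is then:

$$L = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{N} \prod_{i=1}^{N} \sum_{j=1}^{2} P(Q_j \mid M_i)\, \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]$$

where $Q_j$ is one of the two possible (unobserved) genotypes at the QTL and $M_i$ is the marker genotype of individual $i$.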

Page 13: Lecture 23: Quantitative Traits III

BL – Log Likelihood

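Take the log of the likelihood to obtain:

$$l = \sum_{i=1}^{N} \log \sum_{j=1}^{2} P(Q_j \mid M_i)\, \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right] - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$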
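The backcross log likelihood can be written directly as code. The sketch below assumes P(QQ | AA) = 1 - θ and P(QQ | Aa) = θ, as in the backcross model; the function name `bc_loglik` and its argument layout are hypothetical.

```python
import math

def bc_loglik(y, marker, mu1, mu2, sigma2, theta):
    """Backcross mixture log likelihood.

    y      : trait values
    marker : marker genotype per individual, "AA" or "Aa"
    mu1    : genotypic value of QQ; mu2: genotypic value of Qq
    sigma2 : common variance; theta: recombination fraction
    """
    total = 0.0
    for yi, mi in zip(y, marker):
        p_qq = 1 - theta if mi == "AA" else theta   # P(QQ | marker genotype)
        mix = (p_qq * math.exp(-(yi - mu1) ** 2 / (2 * sigma2))
               + (1 - p_qq) * math.exp(-(yi - mu2) ** 2 / (2 * sigma2)))
        total += math.log(mix)
    return total - 0.5 * len(y) * math.log(2 * math.pi * sigma2)

# One observation at the QQ mean with theta = 0 reduces to a normal log density.
single = bc_loglik([10.0], ["AA"], mu1=10.0, mu2=8.0, sigma2=1.0, theta=0.0)

# When mu1 == mu2 the likelihood no longer depends on theta.
y = [10.0, 12.0, 8.0, 9.0]
m = ["AA", "AA", "Aa", "Aa"]
same_a = bc_loglik(y, m, 10.0, 10.0, 1.5, 0.1)
same_b = bc_loglik(y, m, 10.0, 10.0, 1.5, 0.4)
```

The second check illustrates why linkage cannot be tested when the two genotypic values are equal.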

Page 14: Lecture 23: Quantitative Traits III

BL – Null Hypothesis A

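One null hypothesis of interest is that the mean genotypic values for the two distributions are not in fact different, so

$H_0: \mu_1 = \mu_2 = \mu$.

In this case, the log likelihood becomes:

$$l = \sum_{i=1}^{N} \log \exp\left[-\frac{(y_i - \mu)^2}{2\sigma^2}\right] - \frac{N}{2}\log\left(2\pi\sigma^2\right) = -\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu)^2 - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$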

Page 15: Lecture 23: Quantitative Traits III

BL – Null Hypothesis B

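Another, perhaps more interesting, null hypothesis is that there is no linkage, so

$H_0: \theta = 0.5$

Under this assumption, the log likelihood becomes

$$l = \sum_{i=1}^{N} \log \sum_{j=1}^{2} \frac{1}{2}\exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right] - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$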

Page 16: Lecture 23: Quantitative Traits III

BL – Statistical Test

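The G statistic that is commonly calculated to test for linkage is:

$$G = 2\left[\,l(\hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}^2, \hat{\theta}) - l(\hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}^2, \theta = 0.5)\right] \sim \chi^2_{\mathrm{df}=1}$$

However, this test is less powerful than the t test introduced earlier.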

Page 17: Lecture 23: Quantitative Traits III

BL – LOD Scores

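Again, LOD scores are commonly used for QTL detection:

$$\mathrm{lod} = \log_{10} L(\hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}^2, \hat{\theta}) - \log_{10} L(\hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}^2, \theta = 0.5)$$

where, as usual, a lod score of $l$ means the data are $10^l$ times as likely under the alternative hypothesis as under the null hypothesis.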
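The G statistic and the LOD score are rescalings of the same log-likelihood difference. The sketch below assumes both log likelihoods are natural logarithms; the function names and the two log-likelihood values are hypothetical.

```python
import math

def g_statistic(loglik_alt, loglik_null):
    """G = 2 * (l_alt - l_null), with natural-log likelihoods."""
    return 2.0 * (loglik_alt - loglik_null)

def lod_score(loglik_alt, loglik_null):
    """LOD = log10(L_alt) - log10(L_null) = (l_alt - l_null) / ln(10)."""
    return (loglik_alt - loglik_null) / math.log(10)

# Hypothetical maximized log likelihoods, for illustration only
l_alt, l_null = -100.0, -106.9
G = g_statistic(l_alt, l_null)
lod = lod_score(l_alt, l_null)
likelihood_ratio = 10 ** lod      # how many times more likely the alternative is
```

So LOD = G / (2 ln 10), and a LOD threshold can be translated into a G threshold and back.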

Page 18: Lecture 23: Quantitative Traits III

BL – Likelihood Maximization

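Analytic solutions are difficult to achieve.

Iterative approaches are generally used (EM, NR).

Combinations of methods are also used. For example, the variance is commonly estimated with the pooled variance:

$$\hat{\sigma}^2 = \frac{\sum_j (y_{1j} - \bar{y}_1)^2 + \sum_j (y_{2j} - \bar{y}_2)^2}{n_1 + n_2}$$

(this is the maximum likelihood form; dividing by $n_1 + n_2 - 2$ instead gives the unbiased pooled variance used in the t test).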

Page 19: Lecture 23: Quantitative Traits III

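BL – Likelihood Maximization

To facilitate calculations even more, a grid of $\theta$ values with maximization on $\mu_1$ and $\mu_2$ can be used.

So suppose you have multiple markers with known map positions. Then, evaluate a G statistic or lod score for 3 possible locations of the QTL:

Marker   0              0.25m                         0.5m
1        $\theta = 0$   $\theta = f(0.25\,m_{12})$    $\theta = f(0.5\,m_{12})$
2

(here $m_{12}$ denotes the map distance between markers 1 and 2, and $f$ converts map distance to recombination fraction)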
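The grid approach can be sketched as follows: fix θ at each grid value, maximize over the two genotypic means with EM, and record the LOD score against θ = 0.5. This is an illustrative sketch, not the lecture's implementation. It assumes the variance is held at the within-class pooled estimate, uses a plain grid of θ values rather than map positions, and all names and data are hypothetical.

```python
import math

def bc_loglik(y, is_aa, mu1, mu2, sigma2, theta):
    """Backcross mixture log likelihood; is_aa[i] is True for marker class AA."""
    total = -0.5 * len(y) * math.log(2 * math.pi * sigma2)
    for v, m in zip(y, is_aa):
        p = 1 - theta if m else theta            # P(QQ | marker class)
        total += math.log(p * math.exp(-(v - mu1) ** 2 / (2 * sigma2))
                          + (1 - p) * math.exp(-(v - mu2) ** 2 / (2 * sigma2)))
    return total

def em_means(y, is_aa, theta, sigma2, n_iter=200):
    """EM updates for mu1 and mu2 with theta and sigma2 held fixed."""
    aa = [v for v, m in zip(y, is_aa) if m]
    ab = [v for v, m in zip(y, is_aa) if not m]
    mu1, mu2 = sum(aa) / len(aa), sum(ab) / len(ab)    # start at class means
    for _ in range(n_iter):
        w = []                                   # E-step: P(QQ | y_i, marker_i)
        for v, m in zip(y, is_aa):
            p = 1 - theta if m else theta
            f1 = p * math.exp(-(v - mu1) ** 2 / (2 * sigma2))
            f2 = (1 - p) * math.exp(-(v - mu2) ** 2 / (2 * sigma2))
            w.append(f1 / (f1 + f2))
        # M-step: weighted means
        mu1 = sum(wi * v for wi, v in zip(w, y)) / sum(w)
        mu2 = sum((1 - wi) * v for wi, v in zip(w, y)) / sum(1 - wi for wi in w)
    return mu1, mu2

def profile_loglik(y, is_aa, theta, sigma2):
    mu1, mu2 = em_means(y, is_aa, theta, sigma2)
    return bc_loglik(y, is_aa, mu1, mu2, sigma2, theta)

# Hypothetical data: four AA and four Aa individuals
y = [10.0, 12.0, 11.0, 13.0, 8.0, 9.0, 7.0, 10.0]
is_aa = [True] * 4 + [False] * 4
mean_aa, mean_ab = sum(y[:4]) / 4, sum(y[4:]) / 4
ss = sum((v - mean_aa) ** 2 for v in y[:4]) + sum((v - mean_ab) ** 2 for v in y[4:])
sigma2 = ss / len(y)             # pooled within-class variance (ML form, an assumption)

grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
null = profile_loglik(y, is_aa, 0.5, sigma2)
lods = {th: (profile_loglik(y, is_aa, th, sigma2) - null) / math.log(10) for th in grid}
mu1_at_0, mu2_at_0 = em_means(y, is_aa, 0.0, sigma2)
```

At θ = 0 the EM weights are exactly 0 or 1, so the estimates collapse to the marker class means.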

Page 20: Lecture 23: Quantitative Traits III

BL – Sample Results

[Figure: LOD score (vertical axis, 0 to 7) plotted against chromosome location (horizontal axis, 0 to 2).]

Page 21: Lecture 23: Quantitative Traits III

BL – Caveats

When there is more than one QTL in the same vicinity, the peaks in the LOD score plot may not correspond to QTLs.

Recall that these results are still based on single-locus analysis for which we cannot separate genetic effect from linkage. Thus, there is little good information about QTL location in such a plot, even though it looks like there should be.

Page 22: Lecture 23: Quantitative Traits III

BL – Comments

Note that if marker density is high, then there is no need to evaluate at multiple levels of $\theta$ for each marker.

However, when marker density is low, information is gained when multiple QTL locations are considered.

When $\theta = 0$ is assumed, the estimates of $\mu_1$ and $\mu_2$ are simple means.

Page 23: Lecture 23: Quantitative Traits III

Single Marker F2 (F2)

There are now three possible genotypes to consider for both the marker and the QTL locus.

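P(Qj | Mi):

Marker   n_i   Marg. freq.   QQ                    Qq                              qq
AA       n1    0.25          $(1-\theta)^2$        $2\theta(1-\theta)$             $\theta^2$
Aa       n2    0.50          $\theta(1-\theta)$    $(1-\theta)^2 + \theta^2$       $\theta(1-\theta)$
aa       n3    0.25          $\theta^2$            $2\theta(1-\theta)$             $(1-\theta)^2$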
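The conditional probability table is easy to encode and sanity-check. The sketch below assumes the standard F2 probabilities of QTL genotype given a codominant marker genotype at recombination fraction θ; the function name `f2_qtl_given_marker` is hypothetical.

```python
def f2_qtl_given_marker(theta):
    """P(QQ), P(Qq), P(qq) conditional on each F2 marker genotype."""
    t, s = theta, 1 - theta
    return {
        "AA": (s * s, 2 * t * s, t * t),
        "Aa": (t * s, s * s + t * t, t * s),
        "aa": (t * t, 2 * t * s, s * s),
    }

table = f2_qtl_given_marker(0.2)
row_sums = {m: sum(p) for m, p in table.items()}     # each row should sum to 1
no_linkage = f2_qtl_given_marker(0.5)                # rows become 1/4, 1/2, 1/4
complete_linkage = f2_qtl_given_marker(0.0)          # marker determines the QTL genotype
```

At θ = 0.5 every row equals the unconditional F2 frequencies, so the marker carries no information about the QTL.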

Page 24: Lecture 23: Quantitative Traits III

F2 – Expected Trait Values

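Genotypic values: QQ = a, Qq = d, qq = -a.

Marker   n_i   Marg. freq.   Expected trait value
AA       n1    0.25          $\mu_{AA}$
Aa       n2    0.50          $\mu_{Aa}$
aa       n3    0.25          $\mu_{aa}$

$$\mu_{AA} = (1-\theta)^2 a + 2\theta(1-\theta)\,d - \theta^2 a = (1-2\theta)\,a + 2\theta(1-\theta)\,d$$

$$\mu_{Aa} = \theta(1-\theta)\,a + \left[(1-\theta)^2 + \theta^2\right] d - \theta(1-\theta)\,a = \left[(1-\theta)^2 + \theta^2\right] d$$

$$\mu_{aa} = \theta^2 a + 2\theta(1-\theta)\,d - (1-\theta)^2 a = -(1-2\theta)\,a + 2\theta(1-\theta)\,d$$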
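The expected trait values follow from weighting the genotypic values a, d, -a by the conditional QTL genotype probabilities. The sketch below checks the weighted sums against the simplified closed forms; it assumes the standard F2 conditional probabilities, and the function names and parameter values are hypothetical.

```python
def f2_expected_values(theta, a, d):
    """Expected trait value per marker class as a probability-weighted sum."""
    t, s = theta, 1 - theta
    probs = {
        "AA": (s * s, 2 * t * s, t * t),
        "Aa": (t * s, s * s + t * t, t * s),
        "aa": (t * t, 2 * t * s, s * s),
    }
    values = (a, d, -a)                       # genotypic values of QQ, Qq, qq
    return {m: sum(p * g for p, g in zip(ps, values)) for m, ps in probs.items()}

def f2_closed_forms(theta, a, d):
    """Simplified expressions for the same expectations."""
    t = theta
    return {
        "AA": (1 - 2 * t) * a + 2 * t * (1 - t) * d,
        "Aa": ((1 - t) ** 2 + t ** 2) * d,
        "aa": -(1 - 2 * t) * a + 2 * t * (1 - t) * d,
    }

weighted = f2_expected_values(0.1, a=2.0, d=1.0)
closed = f2_closed_forms(0.1, a=2.0, d=1.0)
```

The additive effect only enters the homozygous marker classes, and it does so through the factor (1 - 2θ).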

Page 25: Lecture 23: Quantitative Traits III

F2 – Dominant Marker

Similar tables can be derived for the case of a dominant marker.

In general, the procedure is as follows:

Derive the QTL genotype probabilities conditional on the marker phenotype.

Using the conditional probabilities, derive the expected trait value for each marker phenotype class.

Page 26: Lecture 23: Quantitative Traits III

F2 – Regression (F2R)

The regression model is

$$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \varepsilon_j$$

where $y_j$ is the trait value of the jth individual in the population,

where $x_{1j}$ is the dummy variable for the marker additive effect, taking on value 1 for AA, 0 for Aa, and –1 for aa,

where $x_{2j}$ is the dummy variable for the marker dominance effect, taking on value 1 for AA, –1 for Aa, and 1 for aa.

Page 27: Lecture 23: Quantitative Traits III

F2R – Matrix Notation

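The least-squares normal equations and their solution are:

$$X'X\,\hat{\beta} = X'Y, \qquad \hat{\beta} = (X'X)^{-1}X'Y$$

With marker-class frequencies 0.25, 0.5, 0.25, the expected cross-products per individual are:

$$\frac{1}{N}E(X'X) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \frac{1}{N}E(X'Y) = \begin{pmatrix} 0.25\,(\mu_{AA} + 2\mu_{Aa} + \mu_{aa}) \\ 0.25\,(\mu_{AA} - \mu_{aa}) \\ 0.25\,(\mu_{AA} - 2\mu_{Aa} + \mu_{aa}) \end{pmatrix}$$

where $\mu_{AA}$, $\mu_{Aa}$, $\mu_{aa}$ are the expected trait values of the three marker classes.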

Page 28: Lecture 23: Quantitative Traits III

F2R – Expected Coefficients

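The coefficient estimates have expectation:

$$E\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0.25\,(\mu_{AA} + 2\mu_{Aa} + \mu_{aa}) \\ 0.25\,(\mu_{AA} - \mu_{aa}) \\ 0.25\,(\mu_{AA} - 2\mu_{Aa} + \mu_{aa}) \end{pmatrix} = \begin{pmatrix} 0.5\,d \\ (1-2\theta)\,a \\ -0.5\,(1-2\theta)^2\,d \end{pmatrix}$$

where $\mu_{AA}$, $\mu_{Aa}$, $\mu_{aa}$ are the expected trait values of the three marker classes.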
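The expected coefficients can be checked numerically. The sketch below builds the expected marker class means from a, d, and θ, applies the inverse of the expected cross-product matrix (diagonal, with entries 1, 2, 1), and compares the result with the closed forms. It assumes the dummy coding x1 = (1, 0, -1) and x2 = (1, -1, 1) for AA, Aa, aa; the function name and parameter values are hypothetical.

```python
def f2_expected_coefficients(theta, a, d):
    """Expected (beta0, beta1, beta2) for the F2 marker regression."""
    t = theta
    mu_AA = (1 - 2 * t) * a + 2 * t * (1 - t) * d
    mu_Aa = ((1 - t) ** 2 + t ** 2) * d
    mu_aa = -(1 - 2 * t) * a + 2 * t * (1 - t) * d
    # Expected X'Y per individual, with class frequencies 1/4, 1/2, 1/4
    xty = (0.25 * (mu_AA + 2 * mu_Aa + mu_aa),
           0.25 * (mu_AA - mu_aa),
           0.25 * (mu_AA - 2 * mu_Aa + mu_aa))
    inv_diag = (1.0, 2.0, 1.0)     # inverse of diag(1, 0.5, 1)
    return tuple(k * v for k, v in zip(inv_diag, xty))

theta, a, d = 0.1, 2.0, 1.0
beta = f2_expected_coefficients(theta, a, d)
closed = (0.5 * d, (1 - 2 * theta) * a, -0.5 * (1 - 2 * theta) ** 2 * d)
```

Under this coding the additive coefficient shrinks by (1 - 2θ) and the dominance coefficient by (1 - 2θ) squared, so linkage and effect size remain confounded.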

Page 29: Lecture 23: Quantitative Traits III

F2R – F Statistics

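The F statistic compares the residual sum of squares of a reduced model with that of the full model, each scaled by its degrees of freedom.

The full model has residual sum of squares:

$$S^2_{\text{full}} = (Y - X\hat{\beta})'(Y - X\hat{\beta})$$

Page 30: Lecture 23: Quantitative Traits III

F2R – Reduced Models

Reduced models of interest are:

$$y_j = \beta_0 + \beta_2 x_{2j} + \varepsilon_j \;\Rightarrow\; S^2_{\beta_1 = 0}$$

$$y_j = \beta_0 + \beta_1 x_{1j} + \varepsilon_j \;\Rightarrow\; S^2_{\beta_2 = 0}$$

$$y_j = \beta_0 + \varepsilon_j \;\Rightarrow\; S^2_{\beta_1 = 0,\,\beta_2 = 0}$$

And the F statistics are:

$$F_{\beta_1 = 0} = \frac{S^2_{\beta_1 = 0} - S^2_{\text{full}}}{S^2_{\text{full}} / (N - 3)} \sim F_{\mathrm{df} = 1,\, N-3}$$

$$F_{\beta_2 = 0} = \frac{S^2_{\beta_2 = 0} - S^2_{\text{full}}}{S^2_{\text{full}} / (N - 3)} \sim F_{\mathrm{df} = 1,\, N-3}$$

$$F_{\beta_1 = 0,\,\beta_2 = 0} = \frac{\left(S^2_{\beta_1 = 0,\,\beta_2 = 0} - S^2_{\text{full}}\right)/2}{S^2_{\text{full}} / (N - 3)} \sim F_{\mathrm{df} = 2,\, N-3}$$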
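The reduced-model comparisons can be sketched with ordinary least squares. The code below uses the standard nested-model F test: the drop in residual sum of squares per dropped parameter, divided by the full-model residual sum of squares over N - 3. It assumes the dummy coding x1 = (1, 0, -1) and x2 = (1, -1, 1); the helper names and the data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rss(columns, y):
    """Residual sum of squares from least squares of y on the given columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)                       # normal equations X'X b = X'Y
    fitted = [sum(beta[i] * columns[i][r] for i in range(k)) for r in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def f_stat(rss_reduced, rss_full, q, n, p_full=3):
    """Nested-model F statistic with q dropped parameters."""
    return ((rss_reduced - rss_full) / q) / (rss_full / (n - p_full))

# Hypothetical F2 data: marker genotype and trait value per individual
markers = ["AA", "AA", "Aa", "Aa", "Aa", "Aa", "aa", "aa"]
y = [5.0, 7.0, 3.0, 5.0, 4.0, 4.0, 1.0, 3.0]
ones = [1.0] * len(y)
x1 = [{"AA": 1.0, "Aa": 0.0, "aa": -1.0}[m] for m in markers]   # additive
x2 = [{"AA": 1.0, "Aa": -1.0, "aa": 1.0}[m] for m in markers]   # dominance

full = rss([ones, x1, x2], y)
F_b1 = f_stat(rss([ones, x2], y), full, 1, len(y))    # H0: beta1 = 0
F_b2 = f_stat(rss([ones, x1], y), full, 1, len(y))    # H0: beta2 = 0
F_both = f_stat(rss([ones], y), full, 2, len(y))      # H0: beta1 = beta2 = 0
```

In this made-up data set the heterozygote mean sits exactly midway between the homozygote means, so dropping the dominance term costs nothing and `F_b2` is zero.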

Page 31: Lecture 23: Quantitative Traits III

F2R – Dominant Marker

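If the marker locus segregates as a dominant trait, then:

$$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$$

$$E(\hat{\beta}_1) = \tfrac{1}{3}\left[\,2(1-2\theta)\,a + (1-2\theta)^2\,d\,\right]$$

(with $x_j = 1$ for the dominant marker phenotype and $x_j = -1$ for aa).

Thus, a significant regression coefficient reflects a confounded combination of additive effect, dominance effect, and linkage.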

Page 32: Lecture 23: Quantitative Traits III

F2 – Likelihood Approach (F2L)

Assume trait variances for the three QTL genotypes are equal.

For each marker class, the trait value is a mixture of three normal distributions with different means, equal variances, and expected proportions based on degree of linkage.

The expected proportions are given in slide #23.

Page 33: Lecture 23: Quantitative Traits III

F2L – Log Likelihood

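The likelihood then becomes a sum over three normals:

$$L_{F2} = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{N} \prod_{i=1}^{N} \sum_{j=1}^{3} P(Q_j \mid M_i)\, \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]$$

$$l_{F2} = \sum_{i=1}^{N} \log \sum_{j=1}^{3} P(Q_j \mid M_i)\, \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right] - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$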
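The F2 log likelihood is the backcross version with three components per marker class. The sketch below assumes the standard F2 conditional probabilities for a codominant marker; the names `f2_loglik` and `mus_from_effects`, and the midpoint parameter `m`, are hypothetical conveniences for expressing the null hypotheses as constraints on the three means.

```python
import math

def f2_loglik(y, marker, mus, sigma2, theta):
    """F2 mixture log likelihood; mus = (mu_QQ, mu_Qq, mu_qq)."""
    t, s = theta, 1 - theta
    probs = {
        "AA": (s * s, 2 * t * s, t * t),
        "Aa": (t * s, s * s + t * t, t * s),
        "aa": (t * t, 2 * t * s, s * s),
    }
    total = -0.5 * len(y) * math.log(2 * math.pi * sigma2)
    for yi, mi in zip(y, marker):
        total += math.log(sum(p * math.exp(-(yi - mu) ** 2 / (2 * sigma2))
                              for p, mu in zip(probs[mi], mus)))
    return total

def mus_from_effects(m, a, d):
    """Genotypic means from a midpoint m, additive effect a, dominance effect d."""
    return (m + a, m + d, m - a)

# With a = d = 0 all three means coincide and theta drops out of the likelihood.
flat = f2_loglik([0.0], ["Aa"], mus_from_effects(0.0, 0.0, 0.0), 1.0, 0.3)

# With theta = 0.5 the marker class is uninformative.
from_AA = f2_loglik([1.0], ["AA"], (2.0, 1.0, 0.0), 1.0, 0.5)
from_aa = f2_loglik([1.0], ["aa"], (2.0, 1.0, 0.0), 1.0, 0.5)
```

Setting `a = 0` in `mus_from_effects` gives equal means for QQ and qq, and setting `d = 0` places the heterozygote mean midway between them.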

Page 34: Lecture 23: Quantitative Traits III

F2L – Null Hypothesis A

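If the null hypothesis is

$H_0: a = 0$

then $\mu_1 = \mu_3$ and

$$l_{F2}(a = 0) = \sum_{i=1}^{N} \log\left\{ \left[P(Q_1 \mid M_i) + P(Q_3 \mid M_i)\right] \exp\left[-\frac{(y_i - \mu_1)^2}{2\sigma^2}\right] + P(Q_2 \mid M_i)\, \exp\left[-\frac{(y_i - \mu_2)^2}{2\sigma^2}\right] \right\} - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$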

Page 35: Lecture 23: Quantitative Traits III

F2L – Null Hypothesis B

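Suppose instead that the null hypothesis is

$H_0: d = 0$

Then $\mu_2 = \tfrac{1}{2}(\mu_1 + \mu_3)$ and

$$l_{F2}(d = 0) = \sum_{i=1}^{N} \log\left\{ P(Q_1 \mid M_i)\, \exp\left[-\frac{(y_i - \mu_1)^2}{2\sigma^2}\right] + P(Q_2 \mid M_i)\, \exp\left[-\frac{\left(y_i - \tfrac{1}{2}(\mu_1 + \mu_3)\right)^2}{2\sigma^2}\right] + P(Q_3 \mid M_i)\, \exp\left[-\frac{(y_i - \mu_3)^2}{2\sigma^2}\right] \right\} - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$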

Page 36: Lecture 23: Quantitative Traits III

F2L – Null Hypothesis C

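Suppose instead that the null hypothesis is

$H_0: a = 0,\ d = 0$

Then all three genotypic means equal a common value $\mu$ and

$$l_{F2}(a = 0, d = 0) = -\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu)^2 - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$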

Page 37: Lecture 23: Quantitative Traits III

F2L – Null Hypothesis D

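When the null hypothesis is

$H_0: \theta = 0.5$

$$l_{F2}(\theta = 0.5) = \sum_{i=1}^{N} \log\left\{ \frac{1}{4}\exp\left[-\frac{(y_i - \mu_1)^2}{2\sigma^2}\right] + \frac{1}{2}\exp\left[-\frac{(y_i - \mu_2)^2}{2\sigma^2}\right] + \frac{1}{4}\exp\left[-\frac{(y_i - \mu_3)^2}{2\sigma^2}\right] \right\} - \frac{N}{2}\log\left(2\pi\sigma^2\right)$$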

Page 38: Lecture 23: Quantitative Traits III

F2L – Statistical Test

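The G statistic

$$G = 2\left[\,l_{F2}(\hat{\mu}_1, \hat{\mu}_2, \hat{\mu}_3, \hat{\sigma}^2, \hat{\theta}) - l_{F2}(\hat{\mu}_1, \hat{\mu}_2, \hat{\mu}_3, \hat{\sigma}^2, \theta = 0.5)\right] \sim \chi^2_{\mathrm{df}=1}$$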

Page 39: Lecture 23: Quantitative Traits III

F2L – Maximization

Iterative methods are required to find the maximum likelihood estimates.

Other approaches have been suggested, such as combining moment estimation with maximum likelihood approach. The resulting system of equations to solve for the estimators is given on the next slide.

Page 40: Lecture 23: Quantitative Traits III

F2L – Finding MLEs

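Equate the observed mean $m$ and variance $S^2$ of each marker class to their expectations under the mixture model:

$$m_{AA} = (1-\theta)^2\mu_1 + 2\theta(1-\theta)\mu_2 + \theta^2\mu_3$$

$$m_{Aa} = \theta(1-\theta)\mu_1 + \left[(1-\theta)^2 + \theta^2\right]\mu_2 + \theta(1-\theta)\mu_3$$

$$m_{aa} = \theta^2\mu_1 + 2\theta(1-\theta)\mu_2 + (1-\theta)^2\mu_3$$

$$S^2_{AA} = \sigma^2 + (1-\theta)^2(\mu_1 - m_{AA})^2 + 2\theta(1-\theta)(\mu_2 - m_{AA})^2 + \theta^2(\mu_3 - m_{AA})^2$$

$$S^2_{Aa} = \sigma^2 + \theta(1-\theta)(\mu_1 - m_{Aa})^2 + \left[(1-\theta)^2 + \theta^2\right](\mu_2 - m_{Aa})^2 + \theta(1-\theta)(\mu_3 - m_{Aa})^2$$

$$S^2_{aa} = \sigma^2 + \theta^2(\mu_1 - m_{aa})^2 + 2\theta(1-\theta)(\mu_2 - m_{aa})^2 + (1-\theta)^2(\mu_3 - m_{aa})^2$$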
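The moment equations can be explored numerically by computing the mean and variance each marker class is expected to show for given parameter values. The sketch below assumes a three-component normal mixture per class with common variance, so the class variance is the common variance plus the probability-weighted squared deviations of the component means; the function name and parameter values are hypothetical.

```python
def f2_class_moments(theta, mus, sigma2):
    """Expected (mean, variance) of the trait within each F2 marker class."""
    t, s = theta, 1 - theta
    probs = {
        "AA": (s * s, 2 * t * s, t * t),
        "Aa": (t * s, s * s + t * t, t * s),
        "aa": (t * t, 2 * t * s, s * s),
    }
    out = {}
    for cls, ps in probs.items():
        mean = sum(p * mu for p, mu in zip(ps, mus))
        var = sigma2 + sum(p * (mu - mean) ** 2 for p, mu in zip(ps, mus))
        out[cls] = (mean, var)
    return out

mus = (2.0, 1.0, -2.0)                                  # hypothetical QQ, Qq, qq means
tight = f2_class_moments(0.0, mus, sigma2=1.0)          # complete linkage
loose = f2_class_moments(0.5, mus, sigma2=1.0)          # no linkage
partial = f2_class_moments(0.2, mus, sigma2=1.0)
```

Matching these expectations to the six observed class means and variances gives more equations than unknowns, which is one reason such moment estimates are typically combined with likelihood maximization rather than used alone.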

Page 41: Lecture 23: Quantitative Traits III

F2L – Dominant Marker Model

Modify the likelihood equations to use QTL genotype probabilities conditional on the marker phenotype for a dominant marker.

Modify the expected trait values for each marker genotype.

Done.