october 2005jax workshop © brian s. yandell1 bayesian model selection for multiple qtl brian s....

66
October 2005 Jax Workshop © Brian S. Y andell 1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin- Madison www.stat.wisc.edu/~yandell/ statgenJackson Laboratory, October 2005

Upload: curtis-york

Post on 14-Jan-2016

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 1

Bayesian Model Selectionfor Multiple QTL

Brian S. YandellUniversity of Wisconsin-Madison

www.stat.wisc.edu/~yandell/statgen↑

Jackson Laboratory, October 2005

Page 2: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 2

outline

1. what is the goal of QTL study?

2. Bayesian priors & posteriors

3. model search using MCMC• Gibbs sampler and Metropolis-Hastings

4. model assessment• Bayes factors & model averaging

5. data examples in detail• plants & animals

Page 3: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 3

0 5 10 15 20 25 30

01

23

rank order of QTL

addi

tive

eff

ect

Pareto diagram of QTL effects

54

3

2

1

major QTL onlinkage map

majorQTL

minorQTL

polygenes

1. what is the goal of QTL study?

Page 4: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 4

interval mapping basics• observed measurements

– y = phenotypic trait– m = markers & linkage map– i = individual index (1,…,n)

• missing data– missing marker data– q = QT genotypes

• alleles QQ, Qq, or qq at locus

• unknown quantities = QT locus (or loci) = phenotype model parameters– H = QTL model/genetic architecture

• pr(q|m,,H) genotype model– grounded by linkage map, experimental cross– recombination yields multinomial for q given m

• pr(y|q,,H) phenotype model– distribution shape (assumed normal here) – unknown parameters (could be non-parametric)

observed X Y

missing Q

unknown after

Sen Churchill (2001)

y

q

m

Page 5: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 5

2. Bayesian priors & posteriors• augment data (y,m) with missing genotypes q• study model parameters (,) given data (y,m,q)

– properties of posterior pr(,,q | y,m )– average posterior over missing genotypes q

• pr(, | y,m ) = sumq posterior

• sample from posterior in some clever way– multiple imputation (Sen Churchill 2002)– Markov chain Monte Carlo (MCMC)

posterior = likelihood * prior / constant

)|(pr

)|(pr)(pr),|(pr),|(pr),|,,(pr

my

mmqqymyq

Page 6: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 6

Bayes posterior vs. maximum likelihood

• classical approach maximizes likelihood• Bayesian posterior averages over other parameters

qmqqymy

Cdmym

cmy

),|(pr),|(pr),,|(pr

})(pr),,|(pr)|(pr{log)(LPD

)},,|(pr{maxlog)(LOD

10

10

Page 7: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 7

0 50 100 150 200

0

2

4

6

8

10

12

Map position (cM)

lod

15

50

10

00

LOD and LPD: QTL at 100

0 50 100 150 200

-0.5

0.0

0.5

1.0

1.5

2.0

Map position (cM)

sub

stitu

tion

effe

ct

substitution effect

BC with 1 QTL: IM vs. BIM

black=BIMpurple=IM

2nd QTL? 2nd QTL?

blue=ideal

Page 8: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 8

how does phenotype y improve posterior for genotype q?

90

100

110

120

D4Mit41D4Mit214

Genotype

bp

AAAA

ABAA

AAAB

ABAB

what are probabilitiesfor genotype qbetween markers?

recombinants AA:AB

all 1:1 if ignore yand if we use y?

Page 9: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 9

posterior on QTL genotypes• full conditional of q given data, parameters

– proportional to prior pr(q | m, )• weight toward q that agrees with flanking markers

– proportional to likelihood pr(y|q,)• weight toward q with similar phenotype values

• phenotype and flanking markers may conflict– posterior recombination balances these two weights

• this is the E-step of EM computations

),,|(pr

),|(pr),|(pr),,,|(pr

my

mqqymyq

Page 10: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 10

prior & posteriors: genotypic means q

6 8 10 12 14 16

y = phenotype values

n la

rge

n la

rge

n s

ma

llp

rio

r

qq Qq QQ

data meandata means prior mean

Page 11: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 11

shrink posterior from sample genotypic mean toward overall meanpartition individuals into sets Sq by genotype qhyperparameter is related to heritability

prior:

posterior given data:

shrinkage factor:

prior & posteriors: genotypic means q

11

}{count,sum

,)1(~

,~

2

2

q

qq

qqq

Sq

qqqqqq

q

n

nb

Snn

yy

nbybybN

yN

q

Page 12: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 12

what if variance 2 is unknown?• sample variance is proportional to chi-square

– ns2 / 2 ~ 2 ( n )– likelihood of sample variance s2 given n, 2

• conjugate prior is inverse chi-square 2 / 2 ~ 2 ( )– prior of population variance 2 given , 2

• posterior is weighted average of likelihood and prior– (2+ns2) / 2 ~ 2 ( +n )– posterior of population variance 2 given n, s2, , 2

• empirical choice of hyper-parameters 2= s2/3, =6– E(2 | ,2) = s2/2, Var(2 | ,2) = s4/4

Page 13: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 13

• phenotype affected by genotype & environmentpr(y|q,) ~ N(q , 2)

y = q + environment

q = 0 + sum{j in H} j(q)

number of terms in QTL model H 2nqtl (3nqtl for F2)

• partition genotypic mean into QTL effectsq = 0 + 1(q1) + 2(q2) + 12(q1,q2)

q = mean + main effects + epistatic interactions

• partition prior and posterior (details omitted)

multiple QTL phenotype model

Page 14: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 14

prior & posterior on QTL model H

• index model H by number of QTL• what prior on number of QTL?

– uniform over some range

– Poisson with prior mean

– geometric with prior mean

• prior influences posterior– good: reflects prior belief

• push data in discovery process

– bad: skeptic revolts!• “answer” depends on “guess”

e

e

e

e

ee

e e e e e

0 2 4 6 8 100.

000.

100.

200.

30

p

p

p p

p

p

pp

p p p

u u u u u u u

u u u u

m = number of QTL

prio

r pr

obab

ility

epu

exponentialPoissonuniform

Page 15: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 15

epistatic interactions

• model space issues– 2-QTL interactions only?– Fisher-Cockerham partition vs. tree-structured?– general interactions among multiple QTL

• model search issues– epistasis between significant QTL

• check all possible pairs when QTL included?• allow higher order epistasis?

– epistasis with non-significant QTL• whole genome paired with each significant QTL?• pairs of non-significant QTL?

• Yi Xu (2000) Genetics; Yi, Xu, Allison (2003) Genetics; Yi (2004)

Page 16: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 16

3. QTL Model Search using MCMC• construct Markov chain around posterior

– want posterior as stable distribution of Markov chain– in practice, the chain tends toward stable distribution

• initial values may have low posterior probability• burn-in period to get chain mixing well

• update QTL model components from full conditionals– update locus given q,H (using Metropolis-Hastings step)– update genotypes q given ,,y,H (using Gibbs sampler)– update effects given q,y,H (using Gibbs sampler)– update QTL model H given ,,y,q (using Gibbs or M-H)

NHqHqHq

XYHqHq

),,,(),,,(),,,(

),|,,,(pr~),,,(

21

Page 17: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 17

Gibbs sampler idea• two correlated normals (genotypic means in BC)

– could draw samples from both together– but easier to sample one at a time

• Gibbs sampler:– sample each from its full conditional– pick order of sampling at random– repeat N times

2

QQQQQq

2QqQqQQ

QqQQQqQQ

1,~given

1,~given

),(but )1,0(~,

N

N

corN

Page 18: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 18

Gibbs sampler samples: = 0.6

0 10 20 30 40 50

-2-1

01

2

Markov chain index

Gib

bs:

mea

n 1

0 10 20 30 40 50

-2-1

01

23

Markov chain index

Gib

bs:

mea

n 2

-2 -1 0 1 2

-2-1

01

23

Gibbs: mean 1

Gib

bs:

mea

n 2

-2 -1 0 1 2

-2-1

01

23

Gibbs: mean 1

Gib

bs:

mea

n 2

0 50 100 150 200

-2-1

01

23

Markov chain index

Gib

bs:

mea

n 1

0 50 100 150 200

-2-1

01

2

Markov chain index

Gib

bs:

mea

n 2

-2 -1 0 1 2 3

-2-1

01

2

Gibbs: mean 1

Gib

bs:

mea

n 2

-2 -1 0 1 2 3

-2-1

01

2

Gibbs: mean 1

Gib

bs:

mea

n 2

N = 50 samples N = 200 samples

Page 19: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 19

How to sample a locus ?• cannot easily sample from locus full conditional

pr( | m,q) = pr() pr(q | m,) / constant• constant determined by averaging

– over all possible genotypes

– over entire map

• Gibbs sampler will not work in general– but can use method based on ratios of probabilities

– Metropolis-Hastings is extension of Gibbs sampler

Page 20: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 20

Metropolis-Hastings idea• want to study distribution f()• take Monte Carlo samples

– unless too complicated• pick an arbitrary distribution g(, *) • draw Metropolis-Hastings samples:

– current sample value – propose new value * from g(, *)– accept new value with prob A

• Gibbs sampler is special case– g(, *) = f(*)– always accept new proposal: A = 1

),()(

),()(,1min

*

**

gf

gfA

0 2 4 6 8 10

0.0

0.2

0.4

-4 -2 0 2 4

0.0

0.2

0.4

f()

g(–*)

Page 21: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 21

MCMC realization

0 2 4 6 8 10

020

0040

0060

0080

0010

000

mcm

c se

quen

ce

2 3 4 5 6 7 8

0.0

0.1

0.2

0.3

0.4

0.5

0.6

pr(

|Y)

added twist: occasionally propose from whole domain

Page 22: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 22

Metropolis-Hastings samples

0 2 4 6 8

050

150

mcm

c se

quen

ce

0 2 4 6 8

02

46

pr(

|Y)

0 2 4 6 8

050

150

mcm

c se

quen

ce

0 2 4 6 8

0.0

0.4

0.8

1.2

pr(

|Y)

0 2 4 6 8

040

080

0

mcm

c se

quen

ce

0 2 4 6 8

0.0

1.0

2.0

pr(

|Y)

0 2 4 6 8

040

080

0

mcm

c se

quen

ce

0 2 4 6 8

0.0

0.2

0.4

0.6

pr(

|Y)

N = 200 samples N = 1000 samplesnarrow g wide g narrow g wide g

Page 23: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 23

sampling across QTL models H

action steps: draw one of three choices• update QTL model H with probability 1-b(H)-d(H)

– update current model using full conditionals– sample QTL loci, effects, and genotypes

• add a locus with probability b(H)– propose a new locus along genome– innovate new genotypes at locus and phenotype effect– decide whether to accept the “birth” of new locus

• drop a locus with probability d(H)– propose dropping one of existing loci– decide whether to accept the “death” of locus

0 L1 m+1 m2 …

Page 24: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 24

reversible jump MCMC

• consider known genotypes q at 2 known loci – models with 1 or 2 QTL

• M-H step between 1-QTL and 2-QTL models– model changes dimension (via careful bookkeeping)

– consider mixture over QTL models H

eqqYnqtl

eqYnqtl

)()(:2

)(:1

22110

110

Page 25: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 25

Gibbs sampler with loci indicators

• partition genome into intervals– at most one QTL per interval– interval = 1 cM in length– assume QTL in middle of interval

• use loci to indicate presence/absence of QTL in each interval = 1 if QTL in interval = 0 if no QTL

• Gibbs sampler on loci indicators– see work of Nengjun Yi (and earlier work of Ina Hoeschele)

eqqY )()(1221110

Page 26: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 26

Bayesian shrinkage estimation

• soft loci indicators– strength of evidence for j depends on variance of j

– similar to > 0 on grey scale • include all possible loci in model

– pseudo-markers at 1cM intervals• Wang et al. (2005 Genetics)

– Shizhong Xu group at U CA Riverside

chisquare-inverse~),,0(~)(

...)()(

22

12110

jjjjNq

eqqY

Page 27: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 27

0.05 0.10 0.15

0.0

0.05

0.10

0.15

b1

b2

a short sequence

-0.3 -0.1 0.1

0.0

0.1

0.2

0.3

0.4

first 1000 with m<3

b1

b2

-0.2 0.0 0.2

geometry of effect samples(collinearity of QTL yields correlated estimates)

1 1

2

2

Page 28: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 28

4. Model Assessment

• balance model fit against model complexity

smaller model bigger modelmodel fit miss key features fits betterprediction may be biased no biasinterpretation easier more complicatedparameters low variance high variance

• information criteria: penalize L by model size |H|– compare IC = – 2 log L( H | y ) + penalty( H )

• Bayes factors: balance posterior by prior choice– compare pr( data y | model H )

Page 29: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 29

QTL Bayes factors• BF = posterior odds / prior odds• BF equivalent to BIC

– simple comparison: 1 vs 2 QTL• same as LOD test

– general comparison of models– want Bayes factor >> 1

• nqtl = number of QTL– indexes model complexity– genetic architecture also important

)1(pr)data1(pr

)(pr)data(pr1 nqtl/|nqtl

nqtl/nqtl|BFnqtl,nqtl

e

e

e

e

ee

e e e e e

0 2 4 6 8 10

0.00

0.10

0.20

0.30

p

p

p p

p

p

pp

p p p

u u u u u u u

u u u u

m = number of QTL

prio

r pr

obab

ility

epu

exponentialPoissonuniform

Page 30: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 30

Bayes factors to assess models•Bayes factor: which model best supports the data?

– ratio of posterior odds to prior odds– ratio of model likelihoods

•equivalent to LR statistic when– comparing two nested models– simple hypotheses (e.g. 1 vs 2 QTL)

•Bayes Information Criteria (BIC)– Schwartz introduced for model selection in general settings– penalty to balance model size (p = number of parameters)

)log()()log(2)log(2

)model|(pr

)model|(pr

)model(pr/)model(pr

)|model(pr/)|model(pr

1212

2

1

21

2112

nppLRB

Y

YYYB

Page 31: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 31

simulations and data studies•simulated F2 intercross, 8 QTL

– (Stephens, Fisch 1998)– n=200, heritability = 50%– detected 3 QTL

•increase to detect all 8– n=500, heritability to 97%

QTL chr loci effect

1 1 11 –32 1 50 –53 3 62 +24 6 107 –35 6 152 +36 8 32 –47 8 54 +18 9 195 +2

7 8 9 10 11 12 13

010

2030

40

number of QTL

fre

qu

en

cy in

%0 50 100 150 200

Chr

omos

ome

ch1ch2ch3ch4ch5

ch6ch7ch8ch9

ch10

Genetic map

posterior

Page 32: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 32

loci pattern across genome• notice which chromosomes have persistent loci• best pattern found 42% of the time

Chromosome

m 1 2 3 4 5 6 7 8 9 10 Count of 8000

8 2 0 1 0 0 2 0 2 1 0 3371

9 3 0 1 0 0 2 0 2 1 0 751

7 2 0 1 0 0 2 0 1 1 0 377

9 2 0 1 0 0 2 0 2 1 0 218

9 2 0 1 0 0 3 0 2 1 0 218

9 2 0 1 0 0 2 0 2 2 0 198

Page 33: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 33

B. napus 8-week vernalizationwhole genome study

• 108 plants from double haploid– similar genetics to backcross: follow 1 gamete– parents are Major (biennial) and Stellar (annual)

• 300 markers across genome– 19 chromosomes– average 6cM between markers

• median 3.8cM, max 34cM– 83% markers genotyped

• phenotype is days to flowering– after 8 weeks of vernalization (cooling)– Stellar parent requires vernalization to flower

• available in R/bim package• Ferreira et al. (1994); Kole et al. (2001); Schranz et al. (2002)

Page 34: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 34

Markov chain Monte Carlo sequence

burnin (sets up chain)mcmc sequence

number of QTLenvironmental varianceh2 = heritability(genetic/total variance)LOD = likelihood

-50000 -40000 -30000 -20000 -10000 0

05

1015

2025

burnin sequence

nqtl

-50000 -40000 -30000 -20000 -10000 00.

006

0.01

00.

014

burnin sequenceen

vvar

-50000 -40000 -30000 -20000 -10000 0

0.0

0.2

0.4

0.6

burnin sequence

herit

-50000 -40000 -30000 -20000 -10000 0

05

1015

20

burnin sequence

LOD

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06

24

68

10

mcmc sequence

nqtl

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06

0.00

40.

010

0.01

6

mcmc sequence

envv

ar

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06

0.1

0.3

0.5

mcmc sequence

herit

0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+065

1015

20mcmc sequence

LOD

Page 35: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 35

MCMC sampled loci

subset of chromosomes N2, N3, N16

points jittered for viewblue lines at markers

note concentrationon chromosome N2

includes all models

020

4060

8010

012

0

N2 N3 N16chromosome

MC

MC

sam

pled

loci

ec2e5a

E33M49.117

ec3b12wg2d11b

wg1g8a

E32M49.73wg3g11

ec3a8

wg7f3a

E33M59.59

E35M62.117wg6b10

wg8g1b

wg5a5

tg6a12

Aca1E38M50.133

E35M59.117

ec2d1a

wg1a10tg2h10b

tg2f12

ec3e12b

wg5a1bwg2a3c

ec5e12awg7f5bE32M47.159

tg5d9bslg6E35M59.85

E33M49.491E38M62.229wg1g10b

ec4h7

wg4d10

E32M50.409

E38M50.159

E33M47.338ec2d8a

E33M49.165wg4f4cE35M48.148wg6c6ec4g7bwg7a11

wg5b1aec4e7aE32M61.218

wg4a4bE33M50.183E33M62.140

wg9c7

wg6b2

E33M49.211wg2e12bisoIdh

wg3f7c

Page 36: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 36

Bayesian model assessment

row 1: # QTLrow 2: pattern

col 1: posteriorcol 2: Bayes factornote error bars on bf

evidence suggests 4-5 QTL N2(2-3),N3,N16

1 3 5 7 9 110.

00.

10.

20.

3number of QTL

QT

L po

ster

ior

QTL posterior

15

5050

0

number of QTL

post

erio

r /

prio

r

Bayes factor ratios

1 3 5 7 9 11

weak

moderate

strong

1 3 5 7 9 11 13

0.0

0.1

0.2

0.3

2*2

2:2,

123:

2*2,

122:

2,13

2:2,

32:

2,16

2:2,

113:

2*2,

32:

2,15

4:2*

2,3,

163:

2*2,

132:

2,14

2

model index

mod

el p

oste

rior

pattern posterior

5

e-01

1

e+01

5

e+02

model indexpo

ster

ior

/ pr

ior

Bayes factor ratios

1 3 5 7 9 11 13

12

23

2 22

23

24

32

weak

moderate

strong

Page 37: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 37

Bayesian estimates of loci & effectsmodel averaging: at least 4 QTL

histogram of lociblue line is densityred lines at estimates

estimate additive effects (red circles)grey points sampled from posteriorblue line is cubic splinedashed line for 2 SD

0 50 100 150 200 2500.

000.

020.

040.

06lo

ci h

isto

gram

napus8 summaries with pattern 1,1,2,3 and m 4

N2 N3 N16

0 50 100 150 200 250

-0.0

8-0

.02

0.02

addi

tive

N2 N3 N16

Page 38: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 38

Bayesian model diagnostics

pattern: N2(2),N3,N16col 1: densitycol 2: boxplots by m

environmental variance 2 = .008, = .09heritability h2 = 52%LOD = 16(highly significant)

but note change with m

0.004 0.006 0.008 0.010 0.012

050

100

200

dens

ity

marginal envvar, m 44 5 6 7 8 9 11 12

0.00

60.

008

0.01

0

envvar conditional on number of QTL

envv

ar

0.2 0.3 0.4 0.5 0.6 0.7

01

23

45

dens

ity

marginal heritability, m 44 5 6 7 8 9 11 12

0.30

0.40

0.50

0.60

heritability conditional on number of QTL

herit

abili

ty5 10 15 20 25

0.00

0.04

0.08

0.12

dens

ity

marginal LOD, m 44 5 6 7 8 9 11 12

1012

1416

1820

LOD conditional on number of QTLLO

D

Page 39: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 39

studying diabetes in an F2

• segregating cross of inbred lines– B6.ob x BTBR.ob F1 F2– selected mice with ob/ob alleles at leptin gene (chr 6)– measured and mapped body weight, insulin, glucose at various ages

(Stoehr et al. 2000 Diabetes)– sacrificed at 14 weeks, tissues preserved

• gene expression data– Affymetrix microarrays on parental strains, F1

• key tissues: adipose, liver, muscle, -cells• novel discoveries of differential expression (Nadler et al. 2000 PNAS; Lan et

al. 2002 in review; Ntambi et al. 2002 PNAS)– RT-PCR on 108 F2 mice liver tissues

• 15 genes, selected as important in diabetes pathways• SCD1, PEPCK, ACO, FAS, GPAT, PPARgamma, PPARalpha, G6Pase, PDI,

Page 40: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 40

0 50 100 150 200 250 300

02

46

8LO

D

chr2 chr5 chr9

0 50 100 150 200 250 300

-0.5

0.0

0.5

1.0

effe

ct (

add=

blue

, do

m=

red)

chr2 chr5 chr9

Multiple Interval Mapping (QTLCart)SCD1: multiple QTL plus epistasis!

Page 41: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 41

Bayesian model assessment:number of QTL for SCD1

1 2 3 4 5 6 7 8 9 11 13

0.00

0.05

0.10

0.15

0.20

0.25

number of QTL

QT

L po

ster

ior

QTL posterior

15

1050

500

number of QTL

post

erio

r /

prio

r

Bayes factor ratios

1 2 3 4 5 6 7 8 9 11 13

weak

moderate

strong

Page 42: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 42

Bayesian LOD and h2 for SCD1

0 5 10 15 20

0.00

0.10

dens

ity

marginal LOD, m 11 2 3 4 5 6 7 8 9 10 12 14

510

1520

LOD conditional on number of QTL

LOD

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

01

23

4

dens

ity

marginal heritability, m 11 2 3 4 5 6 7 8 9 10 12 14

0.1

0.3

0.5

heritability conditional on number of QTL

herit

abili

ty

Page 43: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 43

Bayesian model assessment:chromosome QTL pattern for SCD1

1 3 5 7 9 11 13 15

0.00

0.05

0.10

0.15

4:1,

2,2*

34:

1,2*

2,3

5:3*

1,2,

35:

2*1,

2,2*

35:

2*1,

2*2,

36:

3*1,

2,2*

36:

3*1,

2*2,

35:

1,2*

2,2*

36:

4*1,

2,3

6:2*

1,2*

2,2*

32:

1,3

3:2*

1,2

2:1,

2

3:1,

2,3

4:2*

1,2,

3

model index

mod

el p

oste

rior

pattern posterior

0.2

0.4

0.6

0.8

model index

post

erio

r /

prio

r

Bayes factor ratios

1 3 5 7 9 11 13 15

3 44 4

55 5

6 65

66

23

2

weak

moderate

Page 44: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 44

trans-acting QTL for SCD1(no epistasis yet: see Yi, Xu, Allison 2003)

dominance?

Page 45: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 45

2-D scan: assumes only 2 QTL!

epistasisLODpeaks

jointLODpeaks

Page 46: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 46

sub-peaks can be easily overlooked!

Page 47: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 47

epistatic model fit

Page 48: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 48

Cockerham epistatic effects

Page 49: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 49

marginal heritability by locus

black = totalblue = additivered = dominantpurple = add x add

Page 50: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 50

marginal LOD by locus

black = totalblue = additivered = dominantpurple = add x addgreen = scanone

Page 51: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 51

Bayesian & classical (HK)

Page 52: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 52

sex-adjusted anova for SCD1(only 3-way sex interactions significant)

Summary for fit QTL Method is: imp Number of observations: 107

Full model result---------------------------------- Model formula is: y ~ sex * (Q1 + Q2 + Q3 + Q1:Q3 + Q2:Q3)

df SS MS LOD %var Pvalue(Chi2) Pvalue(F)Model 29 109.02204 3.7593806 25.79574 67.05143 7.869261e-13 1.592058e-09Error 77 53.57261 0.6957482 Total 106 162.59465

Drop one QTL at a time ANOVA table: ---------------------------------- df Type III SS LOD %var F value Pvaluesex 15 36.369 12.039 22.368 3.485 0.000154Chr2@105 12 41.248 13.266 25.369 4.940 5.38e-06 Chr5@67 12 39.771 12.901 24.460 4.764 8.93e-06Chr9@15 20 59.542 17.365 36.620 4.279 1.87e-06 Chr2@105:Chr9@15 8 33.637 11.322 20.687 6.043 5.11e-06 Chr5@67:Chr9@15 8 30.042 10.344 18.476 5.397 2.12e-05sex:Chr2@105 6 18.499 6.892 11.377 4.431 0.000669sex:Chr5@67 6 18.130 6.773 11.151 4.343 0.000793sex:Chr9@15 10 25.945 9.176 15.957 3.729 0.000413 sex:Chr2@105:Chr9@15 4 17.444 6.549 10.729 6.268 0.000202sex:Chr5@67:Chr9@15 4 15.728 5.981 9.673 5.652 0.000483

Page 53: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 53

anova for SCD1(sex terms n.s.)

Model:y ~ Chr2@105 + Chr5@67 + Chr9@67 + Chr2@80 + Chr9@67^2 + Chr2@105:Chr9@67 + Chr5@67:Chr9@67^2

Df Sum Sq Mean Sq F value Pr(>F) Model 7 69.060 69.060 73.095 < 2.2e-16 ***Error 99 93.535 0.945 Total 106 162.595 70.005

Single term deletions Df Sum of Sq RSS AIC F value Pr(F) <none> 93.535 22.992 Chr2@80 1 9.073 102.608 28.225 9.6036 0.002528 **

Chr2@105 1 0.073 93.607 18.402 0.0769 0.782140 Chr5@67 1 0.218 93.753 18.568 0.2307 0.632084 Chr9@67 1 7.156 100.691 26.207 7.5746 0.007042 ** Chr9@67^2 1 0.106 93.641 18.440 0.1123 0.738200 Chr2@105:Chr9@67 1 15.612 109.147 34.836 16.5246 9.64e-05 ***Chr5@67:Chr9@67^2 1 7.211 100.745 26.265 7.6318 0.006838 **

Page 54: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 54

epistatic interaction plots(chr 5x9 p=0.007; chr 2x9 p<0.0001)

mostly axd mostly axa

Page 55: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 55

SCD1 on chr2x9 by sex

Page 56: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 56

obesity in CAST/Ei BC onto M16i

• 421 mice (Daniel Pomp)– (213 male, 208 female)

• 92 microsatellites on 19 chromosomes– 1214 cM map

• subcutaneous fat pads– pre-adjusted for sex and dam effects

• Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005) Genetics (in press)

Page 57: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 57

non-epistatic analysis

05

1015

20

LOD

sco

re

050

100

150

200

250

300

Baye

s fa

ctor

single QTL LOD profile multiple QTLBayes factor profile

Page 58: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 58

posterior profile of main effectsin epistatic analysis

-0.8

-0.4

0.0

0.2

0.4

Mai

n ef

fect

0.00

0.05

0.10

0.15

0.20

Her

itabi

lity

010

2030

4050

Bay

es fa

cto

r

main effects & heritability profile Bayes factor profile

Page 59: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 59

posterior profile of main effectsin epistatic analysis

Page 60: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 60

model selectionvia

Bayes factorsfor

epistatic model

number of QTL

QTL pattern

Page 61: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 61

posterior probability of effects

Posterior probability

0.0 0.2 0.4 0.6 0.8 1.0

Chr2(72,85)

Chr13(20,42)

Chr15(1,31)

Chr18(43,71)

Chr1(26,54)

Chr19(15,45)

Chr7(50,75)

Chr14(12,41)

Chr1(26,54)*Chr18(43,71)

Chr2(72,85)*Chr13(20,42)

Chr15(1,31)*Chr19(15,45)

Chr2(72,85)*Chr14(12,41)

Chr7(50,75)*Chr19(15,45)

Chr13(20,42)*Chr15(1,31)

Page 62: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 62

scatterplot estimates of epistatic loci

Page 63: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 63

stronger epistatic effects

Page 64: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 64

Bayesian software for QTLs• R/bim: Bayesian IM (Satagopan Yandell 1996; Gaffney 2001)*

– www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl– www.r-project.org contributed package to CRAN– version available within WinQTLCart (statgen.ncsu.edu/qtlcart)

• R/bmqtl: Bayesian Multiple QTL (N Yi, T Mehta & BS Yandell)*– epistasis, new MCMC algorithms, extensive graphics– extension of R/bim; due out Fall 2005

• R/qtl (Broman et al. 2003 Bioinformatics)*– biosun01.biostat.jhsph.edu/~kbroman/software– www.r-project.org contributed package

• Pseudomarker (Sen, Churchill 2002 Genetics)– www.jax.org/staff/churchill/labsite/software

• Bayesian QTL / Multimapper– Sillanpää Arjas (1998)– www.rni.helsinki.fi/~mjs

• Stephens & Fisch (1998 Biometrics)• R/bqtl (C Berry, hacuna.ucsd.edu/bqtl)

* Hao Wu, Jackson Labs, provide crucial computing support

Page 65: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 65

R/bim: our software• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl

– R contributed library (www.r-project.org)• library(bim) is cross-compatible with library(qtl)

– Bayesian module within WinQTLCart• WinQTLCart output can be processed using R library

• Software history– initially designed by JM Satagopan (1996)– major revision and extension by PJ Gaffney (2001)

• whole genome• multivariate update of effects; long range position updates• substantial improvements in speed, efficiency• pre-burnin: initial prior number of QTL very large

– upgrade (H Wu, PJ Gaffney, CF Jin, BS Yandell 2003)– epistasis in progress (H Wu, BS Yandell, N Yi 2004)

Page 66: October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison yandell/statgen↑↑

October 2005 Jax Workshop © Brian S. Yandell 66

many thanks

Jackson Labs

Gary Churchill

Hao Wu

U AL Birmingham

David Allison

Nengjun Yi

Tapan Mehta

Michael Newton

Daniel Sorensen

Daniel Gianola

Liang Li

my studentsJaya Satagopan

Fei Zou

Patrick Gaffney

Chunfang Jin

Elias Chaibub Neto

USDA Hatch, NIH/NIDDK (Attie), NIH/R01 (Yi)

Tom Osborn

David Butruille

Marcio Ferrera

Josh Udahl

Pablo Quijada

Alan Attie

Jonathan Stoehr

Hong Lan

Susie Clee

Jessica Byers