power in qtl linkage analysis

Power in QTL linkage analysis

Shaun Purcell & Pak Sham

SGDP, IoP, London, UK

F:\pshaun\power.ppt

Power primer

Statistics (e.g. chi-squared, z-score) are continuous

measures of support for a certain hypothesis

Test statistic

YES OR NO decision-making : significance testingInevitably leads to two types of mistake :

false positive (YES instead of NO) (Type I)false negative (NO instead of YES) (Type II)

YESNO

Hypothesis testing

Null hypothesis : no effect

A ‘significant’ result means that we can reject the

null hypothesis

A ‘nonsignificant’ result means that we cannot

reject the null hypothesis

Statistical significance

The ‘p-value’

The probability of a false positive error if the null

were in fact true

Typically, we are willing to incorrectly reject the null

5% or 1% of the time (Type I error)

Misunderstandings

p - VALUES

that the p value is the probability of the null

hypothesis being true

that high p values mean large and important

effects

NULL HYPOTHESIS

that nonrejection of the null implies its truth

Limitations

IF A RESULT IS SIGNIFICANT

leads to the conclusion that the null is false

BUT, this may be trivial

IF A RESULT IS NONSIGNIFICANT

leads only to the conclusion that it cannot be

concluded that the null is false

Alternate hypothesis

Neyman & Pearson (1928)

ALTERNATE HYPOTHESIS

specifies a precise, non-null state of affairs with

associated risk of error

P(T)

T

Critical value

Sampling distribution if HA were true

Sampling distribution if H0 were true

Rejection of H0 Nonrejection of H0

H0 true

HA true

Type I error at rate

Type II error at rate

Significant result

Nonsignificant result

POWER =(1- )

Power

The probability of rejection of a false null-

hypothesis

depends on - the significance crtierion ()- the sample size (N) - the effect size (NCP)

“The probability of detecting a given effect size in a population from a sample of size N, using significance criterion ”

Impact of alpha

P(T)

T

Critical value

Impact of effect size, N

P(T)

T

Critical value

Applications

POWER SURVEYS / META-ANALYSES- low power undermines the confidence that can be

placed in statistically significant results

INTERPRETING NONSIGIFICANT RESULTS- nonsignficant results only meaningful if power is high

EXPERIMENTAL DESIGN- avoiding false positives vs. dealing with false negatives

MAGNITUDE VS. SIGNIFICANCE- highly significant very important

Practical Exercise 1

Calculation of power for simple case-control

association study.

DATA : allele frequency of “A” allele for cases and

controls

TEST : 2-by-2 contingency table : chi-squared

(1 degree of freedom)

Step 1 : determine expected chi-squared

Hypothetical allele frequencies

Cases P(A) = 0.68

Controls P(A) = 0.54

Sample 150 cases, 150 controls

Excel spreadsheet : faculty drive:\pshaun\chisq.xls

Chi-squared statistic = 12.36

P(T)

T

Critical value

Step 2. Determine the critical value for a given type I error rate,

- inverse central chi-squared distribution

http://workshop.colorado.edu/~pshaun/gpc/pdf.html

df = 1 , NCP = 0

X

0.05

0.01

0.001

3.84146

6.63489

10.82754

P(T)

T

Critical value

Step 3. Determine the power for a given critical valueand non-centrality parameter

- non-central chi-squared distribution

Determining power

df = 1 , NCP = 12.36

X Power

0.05 3.84146

0.01 6.6349

0.001 10.827

0.94

0.83

0.59

Exercises

Using the spreadsheet and the chi-squared

calculator, what is power (for the 3 levels of

alpha)

1. … if the sample size were 300 for each group?

2. … if allele frequencies were 0.24 and 0.18 for

750 cases and 750 controls?

Answers

1. NCP = 24.72 Power

0.05 1.00

0.01 0.99

0.001 0.95

2. NCP = 16.27 Power

0.05 0.98

0.01 0.93

0.001 0.77

nb. Stata : di 1-nchi(df,NCP,invchi(df,))

QTL linkage

POWER

Type I errors

Type II errors

Sample N

Effect Size

Allele frequencies

Genetic valuesVariance explained

Power of tests

For chi-squared tests on large samples, power is

determined by non-centrality parameter () and

degrees of freedom (df)

= E(2lnL1 - 2lnL0)

= E(2lnL1 ) - E(2lnL0)

where expectations are taken at asymptotic values

of maximum likelihood estimates (MLE) under an

assumed true model

Linkage test

HA

H0

SVV

NSDA

ijNV

VVVVDA

42

for i=j

for ij

SDA

NSDAijL VVzV

VVVV

ˆ

for i=j

for ij

xxL 1lnln2

Expected log likelihood under H0

s

xxELE

N

NN

ln

ln)ln2( 10

Expectation of the quadratic product is simply s, the

sibship size

(note: standarised trait)

Expected log likelihood under HA

sP

sE

xxEELE

i

m

ii

L

LL

1

11

ln

ln

ln)ln2(

Linkage test

m

iiiP

10 lnln

For sib-pairs under complete marker information

241

121

041

0 lnlnlnln

Expected NCP

)1ln(

)1ln(4

1)1ln(

2

1)1ln(

4

1

2

22

21

20

S

L

r

rrr

Determinant of 2-by-2 standardised covariance matrix = 1 - r2

Approximation of NCP

),()()()1(

)1(

2

)1(

)()1(

)1(

2

)1(

2222

2

22

2

zCovVVzVarVVarVr

rss

rVarr

rssNCP

DADA

NCP per sib pair is proportional to

- the # of pairs in the sibship(large sibships are powerful)

- the square of the additive QTL variance(decreases rapidly for QTL of v. small effect)

- the sibling correlation(structure of residual variance is important)

QTL linkage

POWER

Type I errors

Type II errors

Sample N

Effect Size

Allele frequencies


Marker vs functional variant

Recombinationfraction

Incomplete linkage

The previous calculations assumed analysis was

performed at the QTL.

- imagine that the test locus is not the QTL

but is linked to it.

Calculate sib-pair IBD distribution at the QTL,

conditional on IBD at test locus,

- a function of recombination fraction

22 )1(

2 )1(2 2)1(

2)1(2 2)1(

)1( )1( )1(21

0

1/2

1

0 1/2 1

at QTL

at M

Use conditional probabilities to calculate the sib

correlation conditional on IBD sharing at the test

marker. For example : for IBD 0 at marker :

0 1/2 1

at QTL

2 )1(2 2)1( P(M=0 | QTL)

r VS VA / 2 + VS VA + VD + VS

C0 = VS2 VA / 2 + VS)1(2 +

VA + VD + VS

2)1( +

The noncentrality parameter per sib pair is then

given by

)1ln(

)1ln(4

1)1ln(

2

1)1ln(

4

1

2

22

21

20

S

L

r

ccc

If the QTL is additive, then

attenuation of the NCP is by a factor of (1-2)4

= square of the correlation

between the proportions of alleles IBD

at two loci with recombination fraction

Effect of incomplete linkage

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5

Recombination fraction

Att

en

ua

tio

n i

n N

CP

Effect of incomplete linkage

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 20 40 60 80 100

Map distance cM

Att

en

ua

tio

n i

n N

CP

Comparison to H-E

Amos & Elston (1989) H-E regression

- 90% power (at significant level 0.05)

- QTL variance 0.5

- marker and major gene are completely linked

320 sib pairs

778 sib pairs if = 0.1

GPC input parameters

Proportions of variance

additive QTL variance

dominance QTL variance

residual variance (shared / nonshared)

Recombination fraction ( 0 - 0.5 )

Sample size & Sibship size ( 2 - 5 )

Type I error rate

Type II error rate

GPC output parameters

Expected sibling correlations

- by IBD status at the QTL

- by IBD status at the marker

Expected NCP per sibship

Power

- at different levels of alpha given sample size

Sample size

- for specified power at different levels of alpha given power

From GPC

Modelling additive effects only

Sibships Individuals

Pairs 265 (320) 530

Pairs ( = 0.1) 666 (778) 1332

Trios ( = 0.1) 220 660

Quads ( = 0.1) 110 440

Quints ( = 0.1) 67 335

Practical Exercise 2

What is the effect on power to detect linkage of :

1. QTL variance?

2. residual sibling correlation?

3. marker-QTL recombination fraction?

Pairs required (=0, p=0.05, power=0.8)

0

50000

100000

150000

200000

250000

300000

0 0.1 0.2 0.3 0.4 0.5

QTL variance

N

Pairs required (=0, p=0.05, power=0.8)

1

10

100

1000

10000

100000

1000000

0 0.1 0.2 0.3 0.4 0.5

QTL variance

log

N

Effect of residual correlation

QTL additive effects account for 10% trait variance

Sample size required for 80% power (=0.05)

No dominance

= 0.1

A residual correlation 0.35

B residual correlation 0.50

C residual correlation 0.65

Individuals required

0

5000

10000

15000

20000

25000

Pairs Trios Quads Quints

r=0.35

r=0.5

r=0.65

Selective genotyping

ASP Extreme Discordant

Maximally DissimilarProband Selection EDAC

EDAC

Unselected

Mahanalobis Distance

Selective genotyping

The power calculations so far assume an

unselected population.

- calculate expected NCP per sibship

If we have a sample with trait scores

- calculate expected NCP for each sibship

conditional on trait values

- this quantity can be used to rank order

the sample for genotying

Sibship informativeness : sib pairs

-4 -3 -2 -1 0 1 2 3 4Sib 1 trait -4

-3-2

-10

12

34

Sib 2 trait

00.20.40.60.8

11.21.41.6

Sibship NCP

Sibship informativeness : sib pairs

-4 -3 -2 -1 0 1 2 3 4Sib 1 trait -4

-3-2

-10

12

34

Sib 2 trait

0

0.5

1

1.5

2

Sibship NCP

-4 -3 -2 -1 0 1 2 3 4Sib 1 trait -4

-3-2

-10

12

34

Sib 2 trait

0

0.5

1

1.5

2

Sibship NCP

-4-3

-2-1

01

23

4Sib 1 trait

-4-3

-2-1

01

23

4

Sib 2 trait

0

0.5

1

1.5

2

Sibship NCP

unequal allelefrequencies

dominance

rarerecessive

Selective genotypingASP MaxDPS ED EDAC MDis SEL BSEL T

p d/a

.5 0

.1 0

.25 0

.1 1

.25 1

.5 1

.75 1

.9 1

15.82

17.10

15.45

16.88

15.76

18.89

27.64

43.16

Impact of selection

QTL linkage

POWER

Type I errors

Type II errors

Sample N

Effect Size

Allele frequencies


PICLocus informativeness



Indices of marker informativeness:

Markers should be highly polymorphic

- alleles inherited from different sources are likely

to be distinguishable

Heterozygosity (H)

Polymorphism Information Content (PIC)

- measure number and frequency of alleles at a

locus

Heterozygosity

n = number of alleles,

pi = frequency of the ith allele.

H = probability that an individual is heterozygous

n

iipH

1

21

Heterozygosity

Allele Frequency1 0.202 0.353 0.054 0.40

Genotype Frequency11 0.0412 0.1413 0.0214 0.1622 0.122523 0.03524 0.2833 0.002534 0.0444 0.16

Genotype Frequency1112 0.1413 0.0214 0.162223 0.03524 0.283334 0.0444

H = 0.675 675.0

4.005.0

35.02.01

1

22

22

1

2

n

iipH

Polymorphism information content

IF a parent is heterozygous,

their gametes will usually be informative.

BUT if both parents & child are heterozygous for

the same genotype,

origins of child’s alleles are ambiguous

IF C = the probability of this occurring,

PIC = H - C

Polymorphism information content

n

i

n

ijji

n

ii ppp

CHPIC

1 1

22

1

2 21

Possible IBD configurations given parental genotypes

ConfigurationParental Mating

Type Probabilityz

1 Hom Hom 1/4 1/2 (1-H)2

2 Hom Het 0 1/4 H(1-H)

3 Hom Het 1/2 3/4 H(1-H)

4 Het Het 0 1/2 H2 / 2

5 Het Het 0 0 (H2 -C)/4

6 Het Het 1 1 (H2 -C)/4

7 Het Het 1/2 1/2 C/2

PIC & NCP for linkage

From the table of possible IBD configurations given

parental genotypes,

Therefore, NCP is attenuated in proportion to PIC

PICVarVar )()ˆ(

QTL linkage

POWER

Type I errors

Type II errors

Sample N

Effect Size

Allele frequencies


PICLocus informativeness



Marker density

MPIC

Multipoint

Multipoint IBD

Estimates IBD sharing at any arbitrary point along a

chromosomal region, using all available marker

information on a chromosome simultaneously.

, , and PIC

8)( 1

1

PICVar

8),( 1

11

PICCov

8

)21(),(

2

2121

PICPICCov

8

)21(),(

2

11

PICCov t

^

5cM 5cM 5cM 5cM

M10.10.20.7

M20.20.20.20.20.2

M30.1 0.10.1 0.10.1 0.20.1 0.2

M40.20.10.10.20.20.2

M50.20.20.20.20.2

1. Calculate PIC for each marker

PIC 0.41 0.77 0.84 0.79 0.77

5 cM --> = 0.04758

2. Convert map distances into recombination fractions

Haldane map function (m = map distance in Morgans)

2

1 2me

3. Calculate covariance matrixbetween pi-hat at markers

8

)21(),(

2

jiji PICPICCov

M1 M2 M3 M4 M5M1 0.051 0.032 0.035 0.033 0.032M2 0.032 0.096 0.066 0.062 0.061M3 0.035 0.066 0.105 0.068 0.066M4 0.033 0.062 0.068 0.099 0.062M5 0.032 0.061 0.066 0.062 0.096

MM =

8)( 1

1

PICVar

At each position along the chromosome, calculate

covariance between trait locus and each of the markers

MD1015202530

RF0.0910.1300.1650.1970.226

PIC0.410.770.840.790.77

5cM 5cM 5cM 5cM

M1 M2 M3 M4 M5

10cM15cM

20cM25cM

30cM

8)21(),( 2 i

ti

PICCov

0.03440.05280.04720.03630.0290

MT =

4. Consider each multipoint position

Fulker et al multipoint

If is a vector of single marker IBD estimates then

a multipoint IBD estimate at test position t is

given by :

Conditional on the variance of at the test

position is reduced by a quantity which can be

thought of as a multipoint PIC

πˆ 1 MMMTt

π

π

5. Calculate MPIC

0

20

40

60

80

100

-10 0 10 20 30

Map Distance (cM)

MP

IC (

%)

MTMMMTMPIC 1

10 cM map

20

30

40

50

60

70

80

90

100

-10 10 30 50 70 90 110

Position of Trait Locus (cM)

Mu

ltip

oin

t P

IC (

%)

10 cM

5 cM map

20

30

40

50

60

70

80

90

100

-10 10 30 50 70 90 110

Position of Trait Locus (cM)

Mu

ltip

oin

t P

IC (

%)

5 cM

10 cM

Exclusion mapping

Exclusion : support for the hypothesis that a QTL

of at least a certain effect is absent at that

position

Normally, the LRT compares the likelihood at the

MLE and the null

In exclusion mapping, the LRT compares the

likelihood of a fixed effect size against the null

and therefore can be negative

Conclusions

Factors influencing power

QTL variance

Sib correlation

Sibship size

Marker informativeness

Marker density

Phenotypic selection

power in qtl linkage analysis

Documents