Power in QTL linkage analysis
Shaun Purcell & Pak Sham
SGDP, IoP, London, UK
F:\pshaun\power.ppt
Power primer
Statistics (e.g. chi-squared, z-score) are continuous
measures of support for a certain hypothesis
[Figure: test statistic scale, with NO and YES regions]
YES OR NO decision-making : significance testing
Inevitably leads to two types of mistake :
false positive (YES instead of NO) (Type I)
false negative (NO instead of YES) (Type II)
Hypothesis testing
Null hypothesis : no effect
A ‘significant’ result means that we can reject the
null hypothesis
A ‘nonsignificant’ result means that we cannot
reject the null hypothesis
Statistical significance
The ‘p-value’
The probability of a false positive error if the null
were in fact true
Typically, we are willing to incorrectly reject the null
5% or 1% of the time (Type I error)
Misunderstandings
p - VALUES
that the p value is the probability of the null
hypothesis being true
that low p values (high significance) mean large and
important effects
NULL HYPOTHESIS
that nonrejection of the null implies its truth
Limitations
IF A RESULT IS SIGNIFICANT
leads to the conclusion that the null is false
BUT, this may be trivial
IF A RESULT IS NONSIGNIFICANT
leads only to the conclusion that it cannot be
concluded that the null is false
Alternate hypothesis
Neyman & Pearson (1928)
ALTERNATE HYPOTHESIS
specifies a precise, non-null state of affairs with
associated risk of error
[Figure: P(T) against T, showing the sampling distribution if H0 were true, the sampling distribution if HA were true, and the critical value]

            Rejection of H0           Nonrejection of H0
H0 true     Type I error at rate α    Nonsignificant result
HA true     Significant result        Type II error at rate β

POWER = (1 - β)
Power
The probability of rejection of a false null hypothesis
depends on
- the significance criterion (α)
- the sample size (N)
- the effect size (NCP)
“The probability of detecting a given effect size in a population from a sample of size N, using significance criterion α”
Impact of alpha
[Figure: P(T) against T, with the critical value marked]
Impact of effect size, N
[Figure: P(T) against T, with the critical value marked]
Applications
POWER SURVEYS / META-ANALYSES
- low power undermines the confidence that can be placed in statistically significant results
INTERPRETING NONSIGNIFICANT RESULTS
- nonsignificant results are only meaningful if power is high
EXPERIMENTAL DESIGN
- avoiding false positives vs. dealing with false negatives
MAGNITUDE VS. SIGNIFICANCE
- highly significant ≠ very important
Practical Exercise 1
Calculation of power for simple case-control
association study.
DATA : allele frequency of “A” allele for cases and
controls
TEST : 2-by-2 contingency table : chi-squared
(1 degree of freedom)
Step 1 : determine expected chi-squared
Hypothetical allele frequencies
Cases P(A) = 0.68
Controls P(A) = 0.54
Sample 150 cases, 150 controls
Excel spreadsheet : faculty drive:\pshaun\chisq.xls
Chi-squared statistic = 12.36
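The spreadsheet itself is not reproduced here, so the following is only a minimal sketch of Step 1. It assumes the 2-by-2 table holds allele counts (two alleles per individual) and applies the ordinary Pearson chi-squared; the function name `allelic_chisq` is made up for illustration.

```python
def allelic_chisq(p_case, p_control, n_cases, n_controls):
    """Pearson chi-squared for a 2-by-2 table of expected allele counts.

    Assumption: each individual contributes two alleles, so each group
    has 2 * n alleles in total.
    """
    rows = [
        (2 * n_cases * p_case, 2 * n_cases * (1 - p_case)),
        (2 * n_controls * p_control, 2 * n_controls * (1 - p_control)),
    ]
    total = sum(sum(row) for row in rows)
    col_totals = [rows[0][j] + rows[1][j] for j in range(2)]
    chisq = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            chisq += (row[j] - expected) ** 2 / expected
    return chisq


# Hypothetical frequencies from the slide: 0.68 vs 0.54, 150 cases, 150 controls
print(round(allelic_chisq(0.68, 0.54, 150, 150), 2))
```

With the slide's inputs this should land on the quoted statistic of 12.36, which is then used as the NCP in the later steps.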
Step 2 : determine the critical value for a given type I error rate, α
- inverse central chi-squared distribution
http://workshop.colorado.edu/~pshaun/gpc/pdf.html
df = 1 , NCP = 0

α        X
0.05     3.84146
0.01     6.63489
0.001    10.82754
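As a rough cross-check on the critical values, and not a substitute for the online calculator, the 1-df case can be handled with the standard library alone: a central chi-squared variate with 1 df is the square of a standard normal, so the critical value is the square of the two-sided normal quantile. The helper name `chisq1_critical` is an illustrative assumption.

```python
from statistics import NormalDist


def chisq1_critical(alpha):
    """Critical value of the central chi-squared (df = 1) for type I error alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return z * z


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(chisq1_critical(alpha), 3))
```

This shortcut only works for df = 1; other degrees of freedom need a proper inverse chi-squared routine.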
Step 3 : determine the power for a given critical value and non-centrality parameter
- non-central chi-squared distribution
Determining power
df = 1 , NCP = 12.36

α        X          Power
0.05     3.84146    0.94
0.01     6.6349     0.83
0.001    10.827     0.59
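The power column can be approximated without a non-central chi-squared routine, again only for df = 1. The sketch below assumes the identity that a non-central chi-squared with 1 df and non-centrality NCP is the square of a normal variate with mean sqrt(NCP) and unit variance; `power_chisq1` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def power_chisq1(ncp, alpha):
    """Power of a 1-df chi-squared test with non-centrality parameter ncp."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # square root of the critical value
    shift = sqrt(ncp)                   # mean of the underlying normal variate
    upper_tail = 1 - nd.cdf(z_crit - shift)
    lower_tail = nd.cdf(-z_crit - shift)
    return upper_tail + lower_tail


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(power_chisq1(12.36, alpha), 2))
```

The same function can be reused for the exercises by passing the NCP obtained for the new sample size or allele frequencies.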
Exercises
Using the spreadsheet and the chi-squared
calculator, what is power (for the 3 levels of
alpha)
1. … if the sample size were 300 for each group?
2. … if allele frequencies were 0.24 and 0.18 for
750 cases and 750 controls?
Answers
1. NCP = 24.72

α        Power
0.05     1.00
0.01     0.99
0.001    0.95

2. NCP = 16.27

α        Power
0.05     0.98
0.01     0.93
0.001    0.77

nb. Stata : di 1-nchi(df,NCP,invchi(df,α))
QTL linkage
POWER
Type I errors
Type II errors
Sample N
Effect Size
Allele frequencies
Genetic values
Variance explained
Power of tests
For chi-squared tests on large samples, power is
determined by the non-centrality parameter (λ) and
degrees of freedom (df)
$\lambda = E(2\ln L_1 - 2\ln L_0) = E(2\ln L_1) - E(2\ln L_0)$
where expectations are taken at asymptotic values
of maximum likelihood estimates (MLE) under an
assumed true model
Linkage test
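$-2\ln L = \ln|\Sigma| + x'\Sigma^{-1}x$
H0 :
$[\Sigma_N]_{ij} = V_A + V_D + V_S + V_N$ for i=j
$[\Sigma_N]_{ij} = V_A/2 + V_D/4 + V_S$ for i≠j
HA :
$[\Sigma_L]_{ij} = V_A + V_D + V_S + V_N$ for i=j
$[\Sigma_L]_{ij} = \hat{\pi}V_A + zV_D + V_S$ for i≠j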
Expected log likelihood under H0
$E(-2\ln L_0) = \ln|\Sigma_N| + E(x'\Sigma_N^{-1}x) = \ln|\Sigma_N| + s$
Expectation of the quadratic product is simply s, the
sibship size
(note: standardised trait)
Expected log likelihood under HA
$E(-2\ln L_1) = E(\ln|\Sigma_L|) + E(x'\Sigma_L^{-1}x) = E(\ln|\Sigma_L|) + s = \sum_{i=1}^{m} P_i \ln|\Sigma_i| + s$
Linkage test
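$\lambda = \ln|\Sigma_N| - \sum_{i=1}^{m} P_i \ln|\Sigma_i|$
For sib-pairs under complete marker information
$\lambda = \ln|\Sigma_N| - \tfrac{1}{4}\ln|\Sigma_0| - \tfrac{1}{2}\ln|\Sigma_1| - \tfrac{1}{4}\ln|\Sigma_2|$
where $\Sigma_k$ is the covariance matrix for pairs sharing k alleles IBD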
Expected NCP
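$\lambda_L = -\tfrac{1}{4}\ln(1 - r_0^2) - \tfrac{1}{2}\ln(1 - r_1^2) - \tfrac{1}{4}\ln(1 - r_2^2) + \ln(1 - r_S^2)$
where $r_S$ is the overall sib correlation and $r_k$ the sib correlation for pairs sharing k alleles IBD
Determinant of 2-by-2 standardised covariance matrix = 1 - r²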
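A minimal sketch of the expected NCP per sib pair, evaluated at the QTL with complete marker information. It assumes a standardised trait, so the variance components double as correlations: r_0 = V_S, r_1 = V_A/2 + V_S, r_2 = V_A + V_D + V_S and r_S = V_A/2 + V_D/4 + V_S. The conversion to a required number of pairs uses the usual normal approximation for a 1-df test, which is an assumption of this sketch and not necessarily what GPC does internally; both function names are made up.

```python
from math import log
from statistics import NormalDist


def sibpair_ncp(va, vd=0.0, vs=0.0):
    """Expected NCP per sib pair at the QTL (standardised trait)."""
    r0 = vs                      # sib correlation, 0 alleles IBD
    r1 = va / 2 + vs             # 1 allele IBD
    r2 = va + vd + vs            # 2 alleles IBD
    rs = va / 2 + vd / 4 + vs    # overall sib correlation
    return (log(1 - rs ** 2)
            - 0.25 * log(1 - r0 ** 2)
            - 0.50 * log(1 - r1 ** 2)
            - 0.25 * log(1 - r2 ** 2))


def required_ncp(alpha, power):
    """Approximate total NCP needed for a 1-df test (normal approximation)."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


# QTL variance 0.5, additive only, no shared residual variance (assumed)
ncp = sibpair_ncp(va=0.5)
pairs = required_ncp(alpha=0.05, power=0.90) / ncp
print(round(ncp, 5), round(pairs))
```

Under these assumptions the result comes out very close to the 265 pairs quoted from GPC for the Amos & Elston comparison.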
Approximation of NCP
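$NCP \approx \frac{s(s-1)}{2}\,\frac{1+r^2}{(1-r^2)^2}\,Var(r)$
$\quad\;\; = \frac{s(s-1)}{2}\,\frac{1+r^2}{(1-r^2)^2}\,[V_A^2\,Var(\pi) + V_D^2\,Var(z) + 2V_AV_D\,Cov(\pi,z)]$
where Var(r) is the variance of the sib correlation across IBD states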
NCP per sibship is proportional to
- the # of pairs in the sibship (large sibships are powerful)
- the square of the additive QTL variance (decreases rapidly for QTL of v. small effect)
- the sibling correlation (structure of residual variance is important)
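The approximation can be sketched numerically as follows. The IBD moments for full sibs are standard results (Var(π) = 1/8, Var(z) = 3/16, Cov(π, z) = 1/8); the function name `approx_ncp` and its argument layout are illustrative assumptions.

```python
def approx_ncp(va, vd, r, s=2):
    """Approximate NCP per sibship of size s with overall sib correlation r."""
    var_pi, var_z, cov_pi_z = 1 / 8, 3 / 16, 1 / 8   # full-sib IBD moments
    var_r = va ** 2 * var_pi + vd ** 2 * var_z + 2 * va * vd * cov_pi_z
    n_pairs = s * (s - 1) / 2                        # pairs in the sibship
    return n_pairs * (1 + r ** 2) / (1 - r ** 2) ** 2 * var_r


# Additive QTL variance 0.5, no dominance, sib correlation 0.25
print(round(approx_ncp(0.5, 0.0, 0.25), 5))
# A trio contributes three pairs
print(round(approx_ncp(0.5, 0.0, 0.25, s=3), 5))
# Halving the additive variance quarters the NCP (for fixed r)
print(round(approx_ncp(0.25, 0.0, 0.25), 5))
```

The three calls illustrate the three proportionalities listed above: number of pairs, squared QTL variance, and dependence on the sibling correlation through the factor (1 + r²)/(1 - r²)².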
QTL linkage
POWER
Type I errors
Type II errors
Sample N
Effect Size
Allele frequencies
Genetic values
Variance explained
Marker vs functional variant
Recombination fraction
Incomplete linkage
The previous calculations assumed analysis was
performed at the QTL.
- imagine that the test locus is not the QTL
but is linked to it.
Calculate sib-pair IBD distribution at the QTL,
conditional on IBD at test locus,
- a function of recombination fraction
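$\psi = \theta^2 + (1-\theta)^2$

                 IBD at QTL
IBD at M     0                    1/2                   1
0            $\psi^2$             $2\psi(1-\psi)$       $(1-\psi)^2$
1/2          $\psi(1-\psi)$       $1-2\psi(1-\psi)$     $\psi(1-\psi)$
1            $(1-\psi)^2$         $2\psi(1-\psi)$       $\psi^2$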
Use conditional probabilities to calculate the sib
correlation conditional on IBD sharing at the test
marker. For example : for IBD 0 at marker :

IBD at QTL        0             1/2                 1
P(QTL | M=0)      $\psi^2$      $2\psi(1-\psi)$     $(1-\psi)^2$
r                 $V_S$         $V_A/2 + V_S$       $V_A + V_D + V_S$

$c_0 = \psi^2 V_S + 2\psi(1-\psi)(V_A/2 + V_S) + (1-\psi)^2(V_A + V_D + V_S)$
The noncentrality parameter per sib pair is then
given by
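$\lambda_L = -\tfrac{1}{4}\ln(1 - c_0^2) - \tfrac{1}{2}\ln(1 - c_1^2) - \tfrac{1}{4}\ln(1 - c_2^2) + \ln(1 - r_S^2)$
where $c_k$ is the sib correlation conditional on sharing k alleles IBD at the marker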
If the QTL is additive, then
attenuation of the NCP is by a factor of $(1-2\theta)^4$
= square of the correlation
between the proportions of alleles IBD
at two loci with recombination fraction θ
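The incomplete-linkage calculation can be sketched end to end. Writing ψ = θ² + (1-θ)² (the symbol is an assumption; the slide's own symbol did not survive), the code below builds the conditional IBD table, averages the QTL-level sib correlations to get the marker-level ones, and compares the exact attenuation with the (1-2θ)⁴ rule. The function name is hypothetical.

```python
from math import log


def sibpair_ncp_linked(va, vd=0.0, vs=0.0, theta=0.0):
    """Expected NCP per sib pair at a marker with recombination fraction theta."""
    psi = theta ** 2 + (1 - theta) ** 2
    # Sib correlation by IBD state (0, 1/2, 1) at the QTL, standardised trait
    r_qtl = (vs, va / 2 + vs, va + vd + vs)
    # P(IBD at QTL | IBD at marker); rows are marker IBD 0, 1/2, 1
    cond = (
        (psi ** 2, 2 * psi * (1 - psi), (1 - psi) ** 2),
        (psi * (1 - psi), 1 - 2 * psi * (1 - psi), psi * (1 - psi)),
        ((1 - psi) ** 2, 2 * psi * (1 - psi), psi ** 2),
    )
    c0, c1, c2 = (sum(p * r for p, r in zip(row, r_qtl)) for row in cond)
    rs = va / 2 + vd / 4 + vs
    return (log(1 - rs ** 2)
            - 0.25 * log(1 - c0 ** 2)
            - 0.50 * log(1 - c1 ** 2)
            - 0.25 * log(1 - c2 ** 2))


at_qtl = sibpair_ncp_linked(0.5, theta=0.0)
at_marker = sibpair_ncp_linked(0.5, theta=0.1)
print(round(at_marker / at_qtl, 3), round((1 - 2 * 0.1) ** 4, 3))
```

For an additive QTL of variance 0.5 the exact ratio and the (1-2θ)⁴ factor should agree to within a few percent; at θ = 0.5 the marker carries no information and the NCP drops to zero.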
Effect of incomplete linkage
[Figure: attenuation in NCP (0 to 1) against recombination fraction (0 to 0.5)]
Effect of incomplete linkage
[Figure: attenuation in NCP (0 to 1) against map distance (0 to 100 cM)]
Comparison to H-E
Amos & Elston (1989) H-E regression
- 90% power (at significance level 0.05)
- QTL variance 0.5
- marker and major gene are completely linked
320 sib pairs
778 sib pairs if θ = 0.1
GPC input parameters
Proportions of variance
additive QTL variance
dominance QTL variance
residual variance (shared / nonshared)
Recombination fraction ( 0 - 0.5 )
Sample size & Sibship size ( 2 - 5 )
Type I error rate
Type II error rate
GPC output parameters
Expected sibling correlations
- by IBD status at the QTL
- by IBD status at the marker
Expected NCP per sibship
Power
- at different levels of alpha given sample size
Sample size
- for specified power at different levels of alpha
From GPC
Modelling additive effects only

                      Sibships      Individuals
Pairs                 265 (320)     530
Pairs (θ = 0.1)       666 (778)     1332
Trios (θ = 0.1)       220           660
Quads (θ = 0.1)       110           440
Quints (θ = 0.1)      67            335
Practical Exercise 2
What is the effect on power to detect linkage of :
1. QTL variance?
2. residual sibling correlation?
3. marker-QTL recombination fraction?
Pairs required (θ=0, p=0.05, power=0.8)
[Figure: N (0 to 300000) against QTL variance (0 to 0.5)]
Pairs required (θ=0, p=0.05, power=0.8)
[Figure: log N (1 to 1000000) against QTL variance (0 to 0.5)]
Effect of residual correlation
QTL additive effects account for 10% trait variance
Sample size required for 80% power (α=0.05)
No dominance
θ = 0.1
A residual correlation 0.35
B residual correlation 0.50
C residual correlation 0.65
[Figure: individuals required (0 to 25000) for Pairs, Trios, Quads and Quints, with series r=0.35, r=0.5 and r=0.65]
Selective genotyping
[Figure: selection designs]
- ASP
- Extreme Discordant
- Maximally Dissimilar
- Proband Selection
- EDAC
- Mahalanobis Distance
- Unselected
Selective genotyping
The power calculations so far assume an
unselected population.
- calculate expected NCP per sibship
If we have a sample with trait scores
- calculate expected NCP for each sibship
conditional on trait values
- this quantity can be used to rank order
the sample for genotyping
Sibship informativeness : sib pairs
[Figure: surface plots of sibship NCP against Sib 1 trait and Sib 2 trait (each -4 to 4); further panels labelled unequal allele frequencies, dominance, and rare recessive]
Selective genotyping
Designs : ASP, MaxD, PS, ED, EDAC, MDis, SEL B, SEL T

p       d/a
.5      0       15.82
.1      0       17.10
.25     0       15.45
.1      1       16.88
.25     1       15.76
.5      1       18.89
.75     1       27.64
.9      1       43.16
Impact of selection
QTL linkage
POWER
Type I errors
Type II errors
Sample N
Effect Size
Allele frequencies
Genetic values
Variance explained
PIC
Locus informativeness
Marker vs functional variant
Recombination fraction
Indices of marker informativeness:
Markers should be highly polymorphic
- alleles inherited from different sources are likely
to be distinguishable
Heterozygosity (H)
Polymorphism Information Content (PIC)
- measure number and frequency of alleles at a
locus
Heterozygosity
n = number of alleles,
$p_i$ = frequency of the ith allele.
H = probability that an individual is heterozygous
$H = 1 - \sum_{i=1}^{n} p_i^2$
Heterozygosity

Allele    Frequency
1         0.20
2         0.35
3         0.05
4         0.40

Genotype  Frequency
11        0.04
12        0.14
13        0.02
14        0.16
22        0.1225
23        0.035
24        0.28
33        0.0025
34        0.04
44        0.16

Heterozygous genotypes : 12, 13, 14, 23, 24, 34
$H = 1 - \sum_{i=1}^{n} p_i^2 = 1 - (0.2^2 + 0.35^2 + 0.05^2 + 0.4^2) = 0.675$
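The worked example can be checked two ways: from the formula, and by summing the heterozygous genotype frequencies directly. The sketch assumes Hardy-Weinberg proportions (heterozygote frequency 2·p_i·p_j), which is how the genotype table appears to have been built; the names are illustrative.

```python
from itertools import combinations

# Allele frequencies from the worked example
freqs = {1: 0.20, 2: 0.35, 3: 0.05, 4: 0.40}


def heterozygosity(p):
    """H = 1 - sum of squared allele frequencies."""
    return 1 - sum(f * f for f in p)


# Heterozygous genotype frequencies under Hardy-Weinberg proportions
het_genotypes = {
    f"{a}{b}": 2 * freqs[a] * freqs[b]
    for a, b in combinations(sorted(freqs), 2)
}

print(round(heterozygosity(freqs.values()), 3))
print(round(sum(het_genotypes.values()), 3))
```

Both routes should give the same value, since the homozygote frequencies p_i² and the heterozygote frequencies together sum to one.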
Polymorphism information content
IF a parent is heterozygous,
their gametes will usually be informative.
BUT if both parents & child are heterozygous for
the same genotype,
origins of child’s alleles are ambiguous
IF C = the probability of this occurring,
PIC = H - C
Polymorphism information content
$PIC = H - C = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n}\sum_{j=i+1}^{n} 2p_i^2p_j^2$
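A short sketch of the PIC calculation, taking C as the sum over allele pairs of 2·p_i²·p_j². The function name `pic` and the example frequency lists are for illustration only.

```python
def pic(freqs):
    """Polymorphism information content, PIC = H - C."""
    n = len(freqs)
    h = 1 - sum(p * p for p in freqs)
    c = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return h - c


# Three unequal alleles, then five equifrequent alleles
print(round(pic([0.1, 0.2, 0.7]), 2))
print(round(pic([0.2] * 5), 2))
```

PIC is always somewhat below H, and the gap C shrinks as the number of alleles grows and their frequencies even out.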
Possible IBD configurations given parental genotypes

Configuration   Parental mating type   z      $\hat{\pi}$   Probability
1               Hom × Hom              1/4    1/2           $(1-H)^2$
2               Hom × Het              0      1/4           $H(1-H)$
3               Hom × Het              1/2    3/4           $H(1-H)$
4               Het × Het              0      1/2           $H^2/2$
5               Het × Het              0      0             $(H^2-C)/4$
6               Het × Het              1      1             $(H^2-C)/4$
7               Het × Het              1/2    1/2           $C/2$
PIC & NCP for linkage
From the table of possible IBD configurations given
parental genotypes,
$Var(\hat{\pi}) = Var(\pi) \cdot PIC$
Therefore, NCP is attenuated in proportion to PIC
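The step from the configuration table to the variance result can be verified by direct enumeration. The sketch below reads the seven configurations as (π̂, probability) pairs, assumes Var(π) = 1/8 for full sibs, and uses exact fractions so the identity Var(π̂) = PIC/8 can be checked without rounding. The example values of H and C are taken from the four-allele heterozygosity example; the function name is made up.

```python
from fractions import Fraction as F


def pihat_moments(H, C):
    """Total probability, mean and variance of pi-hat over the 7 configurations."""
    configs = [
        (F(1, 2), (1 - H) ** 2),       # 1: Hom x Hom
        (F(1, 4), H * (1 - H)),        # 2: Hom x Het
        (F(3, 4), H * (1 - H)),        # 3: Hom x Het
        (F(1, 2), H ** 2 / 2),         # 4: Het x Het
        (F(0), (H ** 2 - C) / 4),      # 5: Het x Het
        (F(1), (H ** 2 - C) / 4),      # 6: Het x Het
        (F(1, 2), C / 2),              # 7: Het x Het
    ]
    total = sum(p for _, p in configs)
    mean = sum(pi * p for pi, p in configs)
    var = sum((pi - mean) ** 2 * p for pi, p in configs)
    return total, mean, var


H = F(27, 40)               # 0.675
C = F(634125, 10_000_000)   # 0.0634125
total, mean, var = pihat_moments(H, C)
print(total, mean, float(var), float((H - C) / 8))
```

The probabilities sum to one, the mean of π̂ is 1/2, and the variance equals (H - C)/8, that is PIC times Var(π).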
QTL linkage
POWER
Type I errors
Type II errors
Sample N
Effect Size
Allele frequencies
Genetic values
Variance explained
PIC
Locus informativeness
Recombination fraction
Marker vs functional variant
Marker density
MPIC
Multipoint
Multipoint IBD
Estimates IBD sharing at any arbitrary point along a
chromosomal region, using all available marker
information on a chromosome simultaneously.
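$\hat{\pi}$, θ, and PIC
$Var(\hat{\pi}_1) = PIC_1 / 8$
$Cov(\hat{\pi}_1, \pi_1) = PIC_1 / 8$
$Cov(\hat{\pi}_1, \hat{\pi}_2) = (1-2\theta_{12})^2\,PIC_1\,PIC_2 / 8$
$Cov(\hat{\pi}_1, \pi_t) = (1-2\theta_{1t})^2\,PIC_1 / 8$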
1. Calculate PIC for each marker
Marker map : M1 to M5, adjacent markers 5 cM apart

Marker   Allele frequencies                   PIC
M1       0.1 0.2 0.7                          0.41
M2       0.2 0.2 0.2 0.2 0.2                  0.77
M3       0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.2      0.84
M4       0.2 0.1 0.1 0.2 0.2 0.2              0.79
M5       0.2 0.2 0.2 0.2 0.2                  0.77
2. Convert map distances into recombination fractions
Haldane map function (m = map distance in Morgans)
$\theta = \frac{1 - e^{-2m}}{2}$
5 cM --> θ = 0.04758
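The Haldane conversion is a one-liner; the sketch below takes distances in centimorgans for convenience (an assumption of the helper, since the formula itself is stated in Morgans).

```python
from math import exp


def haldane_theta(distance_cm):
    """Recombination fraction from map distance (cM) under the Haldane map function."""
    m = distance_cm / 100          # convert cM to Morgans
    return (1 - exp(-2 * m)) / 2


for d in (5, 10, 15, 20, 25, 30):
    print(d, round(haldane_theta(d), 5))
```

Note that under this map function (1 - 2θ) equals exp(-2m), so the attenuation terms (1 - 2θ)² and (1 - 2θ)⁴ become exp(-4m) and exp(-8m).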
3. Calculate covariance matrix between pi-hat at markers
$Cov(\hat{\pi}_i, \hat{\pi}_j) = (1-2\theta_{ij})^2\,PIC_i\,PIC_j / 8$
$Var(\hat{\pi}_i) = PIC_i / 8$

$M_M$ =
        M1      M2      M3      M4      M5
M1      0.051   0.032   0.035   0.033   0.032
M2      0.032   0.096   0.066   0.062   0.061
M3      0.035   0.066   0.105   0.068   0.066
M4      0.033   0.062   0.068   0.099   0.062
M5      0.032   0.061   0.066   0.062   0.096
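At each position along the chromosome, calculate
covariance between trait locus and each of the markers
$Cov(\hat{\pi}_i, \pi_t) = (1-2\theta_{it})^2\,PIC_i / 8$
[Figure: markers M1 to M5 at 5 cM spacing; the test position lies 10, 15, 20, 25 and 30 cM from M1 to M5]

Marker   MD (cM)   RF      PIC    $Cov(\hat{\pi}_i, \pi_t)$
M1       10        0.091   0.41   0.0344
M2       15        0.130   0.77   0.0528
M3       20        0.165   0.84   0.0472
M4       25        0.197   0.79   0.0363
M5       30        0.226   0.77   0.0290

$M_T$ = (0.0344, 0.0528, 0.0472, 0.0363, 0.0290)'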
4. Consider each multipoint position
Fulker et al multipoint
If $\hat{\pi}$ is a vector of single marker IBD estimates then
a multipoint IBD estimate at test position t is
given by :
$\hat{\pi}_t = M_T' M_M^{-1} \hat{\pi}$
Conditional on $\hat{\pi}$, the variance of π at the test
position is reduced by a quantity which can be
thought of as a multipoint PIC
5. Calculate MPIC
$MPIC = M_T' M_M^{-1} M_T$
[Figure: MPIC (%) (0 to 100) against map distance (-10 to 30 cM)]
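Steps 1 to 5 can be strung together in a short sketch. Several things here are assumptions made for illustration: markers are placed at 0, 5, 10, 15 and 20 cM; covariances between all marker pairs use the actual inter-marker distance under the Haldane map function; and MPIC is rescaled by 1/Var(π) = 8 so that it reads as a proportion comparable to PIC. The helper names (`pic`, `coupling`, `solve`, `mpic`) are made up, and the linear solve is a plain Gaussian elimination, not any particular package's routine.

```python
from math import exp


def pic(freqs):
    """PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    n = len(freqs)
    h = 1 - sum(p * p for p in freqs)
    c = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
            for i in range(n) for j in range(i + 1, n))
    return h - c


def coupling(distance_cm):
    """(1 - 2*theta)^2 under the Haldane map function, i.e. exp(-4m)."""
    return exp(-4 * distance_cm / 100)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def mpic(position_cm, marker_pos, marker_pic):
    """Return (MPIC as a proportion, M_T vector) at a test position."""
    n = len(marker_pos)
    mm = [[marker_pic[i] / 8 if i == j else
           coupling(abs(marker_pos[i] - marker_pos[j]))
           * marker_pic[i] * marker_pic[j] / 8
           for j in range(n)] for i in range(n)]
    mt = [coupling(abs(position_cm - marker_pos[i])) * marker_pic[i] / 8
          for i in range(n)]
    weights = solve(mm, mt)                 # M_M^{-1} M_T
    quad = sum(w * t for w, t in zip(weights, mt))
    return 8 * quad, mt                     # rescaled by 1 / Var(pi)


marker_pos = [0, 5, 10, 15, 20]
marker_freqs = [
    [0.1, 0.2, 0.7],
    [0.2] * 5,
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.2],
    [0.2, 0.1, 0.1, 0.2, 0.2, 0.2],
    [0.2] * 5,
]
marker_pic = [pic(f) for f in marker_freqs]

outside, mt_outside = mpic(-10, marker_pos, marker_pic)
at_m3, _ = mpic(10, marker_pos, marker_pic)
print([round(v, 4) for v in mt_outside])
print(round(100 * outside, 1), round(100 * at_m3, 1))
```

At a marker position the multipoint value cannot fall below that marker's own PIC, because the flanking markers can only add information; outside the map the value decays with distance from the nearest marker.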
10 cM map
[Figure: multipoint PIC (%) (20 to 100) against position of trait locus (-10 to 110 cM), 10 cM map]
5 cM map
[Figure: multipoint PIC (%) (20 to 100) against position of trait locus (-10 to 110 cM), 5 cM and 10 cM maps]
Exclusion mapping
Exclusion : support for the hypothesis that a QTL
of at least a certain effect is absent at that
position
Normally, the LRT compares the likelihood at the
MLE and the null
In exclusion mapping, the LRT compares the
likelihood of a fixed effect size against the null
and therefore can be negative
Conclusions
Factors influencing power
QTL variance
Sib correlation
Sibship size
Marker informativeness
Marker density
Phenotypic selection