Continuous heterogeneity
Shaun Purcell
Boulder Twin Workshop, March 2004
Raw data VS summary statistics
Zyg T1 T2
1 1.2 0.8
1 -1.3 -2.2
2 0.7 1.9
2 0.2 -0.8
.. ... ...
MZ covariance matrix (lower triangle)
1.03
0.87 0.98
DZ covariance matrix (lower triangle)
0.95
0.57 1.08
Raw data VS summary statistics
Zyg T1 T2 age
1 1.2 0.8 12.3
1 -1.3 -2.2 10.3
2 0.7 1.9 8.7
2 0.2 -0.8 14.5
.. ... ... ...
Bivariate normal distribution
[Figure: bivariate normal density surface, shown twice; annotations: Variance, Data, Mean]
Introducing Definition variables
• Zygosity as a definition variable
• “Rectangular” file data.raw
1 1 0.361769 -0.356412
1 0.888986 1.463423
1 0.535161 0.636073
...
1 2 0.234099 0.08483182
2 -0.547252 -0.229763
2 -0.307926 -0.253692
...
! Using definition variables
Group1: Defines Matrices
Calc NGroups=2
Begin Matrices;
X Lower 1 1 free
Y Lower 1 1 free
Z Lower 1 1 free
M full 1 1 free
H Full 1 1
End Matrices;
Begin Algebra;
A= X*X';
C = Y*Y';
E = Z*Z';
End Algebra;
Ma X 0
Ma Y 0
Ma Z 1
Ma M 0
Options MX%P=rawfit.txt
End

Group2: MZ & DZ twin pairs
Data NInput_vars=4 NObservations=0
RE file=data.raw
Labels id zyg t1 t2
Select t1 t2 zyg /
Definition zyg /
Matrices = Group 1
Means M | M /
Covariances
A + C + E | (H~)@A + C _
(H~)@A + C | A + C + E /
Specify H -1
End
• H will be specified as a definition variable
• M, necessary for the means model
• Optional: request individual fit statistics for each pair
• A single group for both MZ & DZ twins
• Points to a “REctangular” data file
• No need to specify number of pairs
• Zygosity is a “Definition” variable
• Multiply A component by 1/H
• 1 x 1 matrix H represents each pair’s zygosity
• A model for the means [ twin 1 | twin 2 ]
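The role of the definition variable can be sketched outside Mx. The snippet below is a minimal illustration, not the Mx implementation: it builds the expected covariance matrix for one pair from that pair's zygosity code (1 or 2, so that dividing the A component by it gives 1 or 0.5) and evaluates the pair's contribution to -2 log-likelihood under a bivariate normal. The function names and the parameter values are hypothetical; the four data rows are those of the raw-data table.

```python
import math

def expected_cov(a2, c2, e2, zyg):
    """Expected 2x2 covariance matrix for one pair (zyg: 1 = MZ, 2 = DZ)."""
    total = a2 + c2 + e2
    cross = a2 / zyg + c2  # mirrors (H~)@A + C, with H holding the pair's zygosity
    return [[total, cross], [cross, total]]

def pair_minus2ll(t1, t2, zyg, a2, c2, e2, mean):
    """-2 log-likelihood of one twin pair under a bivariate normal density."""
    (v11, v12), (_, v22) = expected_cov(a2, c2, e2, zyg)
    det = v11 * v22 - v12 * v12
    d1, d2 = t1 - mean, t2 - mean
    # Quadratic form d' * inverse(Sigma) * d, using the closed-form 2x2 inverse.
    quad = (v22 * d1 * d1 - 2.0 * v12 * d1 * d2 + v11 * d2 * d2) / det
    return 2.0 * math.log(2.0 * math.pi) + math.log(det) + quad

# Hypothetical parameter values, chosen only for illustration.
A2, C2, E2, MEAN = 0.5, 0.2, 0.3, 0.0
# (zyg, t1, t2) rows as in the raw-data table.
pairs = [(1, 1.2, 0.8), (1, -1.3, -2.2), (2, 0.7, 1.9), (2, 0.2, -0.8)]
total_minus2ll = sum(pair_minus2ll(t1, t2, z, A2, C2, E2, MEAN) for z, t1, t2 in pairs)
print(round(total_minus2ll, 3))
```

Because the covariance matrix is rebuilt for every row, MZ and DZ pairs can share one group, which is the point of treating zygosity as a definition variable.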
Output from zyg.mx
RE FILE=DATA.RAW
Rectangular continuous data read initiated
NOTE: Rectangular file contained 500 records with data
that contained a total of 2000 observations
LABELS ID ZYG T1 T2
SELECT T1 T2 ZYG /
DEFINITION ZYG /
NOTE: Selection yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1500 observations
NOTE: Definition yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1000 observations

Output from zyg.mx
Summary of VL file data for group 2
          ZYG       T1        T2
Code      -1.0000   1.0000    2.0000
Number    500.0000  500.0000  500.0000
Mean      1.5000    -0.0140   0.0240
Variance  0.2500    0.5601    0.5211
Minimum   1.0000    -2.1941   -1.9823
Maximum   2.0000    2.1218    2.7670

Output from zyg.mx
MATRIX H
This is a FULL matrix of order 1 by 1
   1
1 -1
MATRIX M
This is a FULL matrix of order 1 by 1
   1
1  4
MATRIX X
This is a LOWER TRIANGULAR matrix of order 1 by 1
   1
1  1
MATRIX Y
This is a LOWER TRIANGULAR matrix of order 1 by 1
   1
1  2
MATRIX Z
This is a LOWER TRIANGULAR matrix of order 1 by 1
   1
1  3
Specify H -1
Output from zyg.mx
Your model has 4 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> 2134.998
Degrees of freedom >>>>>>>>>>>>>>>> 996
• Fixing X to zero
Your model has 3 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> 2154.626
Degrees of freedom >>>>>>>>>>>>>>>> 997
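The two fits above can be compared with a likelihood-ratio test. The arithmetic below is a sketch that assumes the usual chi-square approximation for nested models; the critical value 3.84 is the standard 5% point of a chi-square with 1 degree of freedom.

```latex
\Delta(-2\ln L) = 2154.626 - 2134.998 = 19.628,
\qquad
\Delta df = 997 - 996 = 1,
\qquad
19.628 > \chi^2_{1,\,0.05} = 3.84
```

On this reading, fixing X to zero gives a significantly worse fit, so the additive genetic component would be retained.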
Continuous moderators
• Traits often best defined continuously
• Many environmental moderators also likely to be continuous in nature
– Age
– Gestational age
– Socio-economic status
– Educational level
– Consumption of food / alcohol / drugs
• How to test for G x E interaction in this case?
Continuous moderators
• Problems?
– Stratification of the sample reduces sample size
– Modelling proportions of variance
• implicitly assumes equality of variance w.r.t. moderator
– Logical to assume a linear G × E interaction
• linearity at the level of effect, not variance
– No obvious statistical test for heterogeneity
[Figure: heritability (0% to 100%) plotted against age (4 to 10 yrs)]
Biometrical G × E model
• At a hypothetical single locus
– additive genetic value a
– allele frequency p
– QTL variance $2p(1-p)a^2$
• Assuming a linear interaction
– additive genetic value $a + \beta_X M$
– allele frequency p
– QTL variance $2p(1-p)(a + \beta_X M)^2$
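Expanding the moderated QTL variance shows why linearity holds at the level of the effect and not of the variance. The moderation coefficient is written here as $\beta_X$, which is an assumed symbol because the original one did not survive extraction.

```latex
2p(1-p)\,(a + \beta_X M)^2
  = 2p(1-p)\,\bigl(a^2 + 2a\beta_X M + \beta_X^2 M^2\bigr)
```

The genetic effect changes linearly with M, while the variance it generates is quadratic in M and reduces to $2p(1-p)a^2$ when $\beta_X = 0$.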
Biometrical G × E model
[Figure: genotypic values of aa, Aa and AA (-a, 0, a) plotted against the moderator M; panels: No interaction, Interaction, Equivalently…]
Model-fitting approach to GxE
[Path diagram: latent A, C and E for Twin 1 and for Twin 2; paths a, c and e]
Model-fitting approach to GxE
[Path diagram: latent A, C and E for Twin 1 and for Twin 2; paths $a + \beta_X M$, c and e]
Continuous moderator variable M
Can be coded 0 / 1 in the dichotomous case
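Individual specific moderators
[Path diagram: latent A, C and E for each twin; additive genetic path $a + \beta_X M_1$ for Twin 1 and $a + \beta_X M_2$ for Twin 2; paths c and e]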
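E x E interactions
[Path diagram: all three paths moderated; Twin 1: $a + \beta_X M_1$, $c + \beta_Y M_1$, $e + \beta_Z M_1$; Twin 2: $a + \beta_X M_2$, $c + \beta_Y M_2$, $e + \beta_Z M_2$]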
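ACE - XYZ - M
[Path diagram: moderated paths $a + \beta_X M_j$, $c + \beta_Y M_j$, $e + \beta_Z M_j$ for twin j = 1, 2, plus a main effect of the moderator on the trait mean: $m + \beta_M M_1$ for Twin 1 and $m + \beta_M M_2$ for Twin 2]
Main effects and moderating effects statistically and conceptually distinct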
Model-fitting approach to GxE
[Figure: components of variance (A, C, E) plotted against the moderator variable]
Turkheimer et al (2003)
• 320 twin pairs recruited at birth from urban hospitals
• G : additive genetic variance
• E : SES – parental education, occupation, income
• X : IQ
– Wechsler; Verbal, Performance, Full
[Figure: A, C and E for Full scale IQ, Verbal IQ and Non-Verbal IQ]
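Standard model
• Means vector
$$\begin{pmatrix} m & m \end{pmatrix}$$
• Covariance matrix (Z is 1 for MZ pairs and 0.5 for DZ pairs)
$$\begin{pmatrix} a^2 + c^2 + e^2 & \\ Z a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}$$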
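Allowing for a main effect of X
• Means vector
$$\begin{pmatrix} m + \beta_M X_{1i} & m + \beta_M X_{2i} \end{pmatrix}$$
• Covariance matrix
$$\begin{pmatrix} a^2 + c^2 + e^2 & \\ Z a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}$$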
! Basic model + main effect of a definition variable
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
A full 1 1 free
C full 1 1 free
E full 1 1 free
M full 1 1 free ! grand mean
B full 1 1 free ! moderator-linked means model
H full 1 1
R full 1 1 ! twin 1 moderator (definition variable)
S full 1 1 ! twin 2 moderator (definition variable)
End Matrices;
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
A*A' + C*C' + E*E' | A*A' + C*C' _
A*A' + C*C' | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
A*A' + C*C' + E*E' | H@A*A' + C*C' _
H@A*A' + C*C' | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End
MATRIX A
This is a FULL matrix of order 1 by 1
   1
1  1.3228
MATRIX B
This is a FULL matrix of order 1 by 1
   1
1  0.3381
MATRIX C
This is a FULL matrix of order 1 by 1
   1
1  1.1051
MATRIX E
This is a FULL matrix of order 1 by 1
   1
1  0.9728
MATRIX M
This is a FULL matrix of order 1 by 1
   1
1  0.1035
Your model has 5 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3123.925
Degrees of freedom >>>>>>>>>>>>>>>> 795

MATRIX A
This is a FULL matrix of order 1 by 1
   1
1  1.3078
MATRIX B
This is a FULL matrix of order 1 by 1
   1
1  0.0000
MATRIX C
This is a FULL matrix of order 1 by 1
   1
1  1.1733
MATRIX E
This is a FULL matrix of order 1 by 1
   1
1  0.9749
MATRIX M
This is a FULL matrix of order 1 by 1
   1
1  0.1069
Your model has 4 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3138.157
Degrees of freedom >>>>>>>>>>>>>>>> 796
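Continuous heterogeneity model
• Means vector
$$\begin{pmatrix} m + \beta_M X_{1i} & m + \beta_M X_{2i} \end{pmatrix}$$
• Covariance matrix
$$\begin{pmatrix}
(a + \beta_X X_{1i})^2 + (c + \beta_Y X_{1i})^2 + (e + \beta_Z X_{1i})^2 & \\
Z (a + \beta_X X_{1i})(a + \beta_X X_{2i}) + (c + \beta_Y X_{1i})(c + \beta_Y X_{2i}) & (a + \beta_X X_{2i})^2 + (c + \beta_Y X_{2i})^2 + (e + \beta_Z X_{2i})^2
\end{pmatrix}$$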
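The continuous heterogeneity model can be written as two small functions, which may help when checking a script against the algebra. This is a sketch under assumptions: the function and argument names are hypothetical, `z` stands for the MZ/DZ coefficient (1 or 0.5), and the numeric values are invented for illustration.

```python
def expected_means(m, b_m, mod1, mod2):
    """Means vector: grand mean plus a linear main effect of each twin's moderator."""
    return [m + b_m * mod1, m + b_m * mod2]

def moderated_cov(a, c, e, b_x, b_y, b_z, mod1, mod2, z):
    """Expected 2x2 covariance matrix with moderated a, c and e paths."""
    a1, a2 = a + b_x * mod1, a + b_x * mod2
    c1, c2 = c + b_y * mod1, c + b_y * mod2
    e1, e2 = e + b_z * mod1, e + b_z * mod2
    var1 = a1 * a1 + c1 * c1 + e1 * e1
    var2 = a2 * a2 + c2 * c2 + e2 * e2
    cov = z * a1 * a2 + c1 * c2  # E is not shared, so it does not enter the covariance
    return [[var1, cov], [cov, var2]]

# Hypothetical parameter values for a DZ pair with moderators +1 and -1.
sigma = moderated_cov(a=1.0, c=0.8, e=0.6, b_x=-0.3, b_y=0.1, b_z=0.2,
                      mod1=1.0, mod2=-1.0, z=0.5)
mu = expected_means(m=0.1, b_m=0.35, mod1=1.0, mod2=-1.0)
print(mu, sigma)
```

Setting the three moderation coefficients to zero recovers the standard ACE covariance matrix, which is a quick sanity check on any implementation.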
! GxE - Basic model
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
A full 1 1 free
C full 1 1 free
E full 1 1 free
T full 1 1 free ! moderator-linked A component
U full 1 1 free ! moderator-linked C component
V full 1 1 free ! moderator-linked E component
M full 1 1 free ! grand mean
B full 1 1 free ! moderator-linked means model
H full 1 1
R full 1 1 ! twin 1 moderator (definition variable)
S full 1 1 ! twin 2 moderator (definition variable)
End Matrices;
Ma T 0
Ma U 0
Ma V 0
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
(A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
(A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
(A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
(A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
H@(A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
H@(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
(A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End
Practical 1
• The script: mod.mx
• The data: f1.dat
ID zygosity trait_twin_1 trait_twin_2 mod_twin_1 mod_twin_2
1. Any evidence for G × E for this trait?
• i.e. does the A latent variable show heterogeneity with respect to the moderator variable
2. If so, in what way?
• i.e. how would you interpret/describe the effect?
Practical 1 : f1.dat
[Figure: moderator distribution; histogram of v5 (range -2.43686 to 2.45835), y-axis Fraction from 0 to .095]
[Figure: MZ pairs (trait); scatterplot of v3 and v4]
[Figure: DZ pairs (trait); scatterplot of v3 and v4]
[Figure: all twin 1’s vs. moderator; scatterplot of v3 and v5]
nomod.mx
a 1.3078   $a^2 \approx 1.7$
c 1.1733   $c^2 \approx 1.4$
e 0.9749   $e^2 \approx 0.95$
$a^2 + c^2 + e^2 = 4.05$
i.e. the proportions of variance due to A, C and E are about 42%, 35% and 23%
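The conversion from path estimates to proportions of variance can be reproduced in a few lines. This is a minimal sketch using the three nomod.mx estimates quoted above; the variable names are hypothetical. Because the slide rounds the squared paths before summing, the unrounded percentages may differ from the quoted ones by about a percentage point.

```python
# Path estimates as reported for nomod.mx.
paths = {"a": 1.3078, "c": 1.1733, "e": 0.9749}

# Variance components are the squared path coefficients.
components = {name: value ** 2 for name, value in paths.items()}
total = sum(components.values())

# Standardised components: each one's share of the total variance.
proportions = {name: comp / total for name, comp in components.items()}

for name in paths:
    print(f"{name}^2 = {components[name]:.3f}  ({100 * proportions[name]:.1f}% of variance)")
print(f"total variance = {total:.3f}")
```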
Parameter estimates: mod.mx
    ACE-XYZ-M   ACE-YZ-M
A    1.2288      1.4455
C    0.9874      0.6837
E    0.9236      0.9484
T   -0.6007      (dropped)
U    0.1763     -0.6817
V    0.3825      0.4663
M    0.0737      0.0724
B    0.367       0.3625
Plotting VCs
• For the additive genetic VC, for example
– Given a, $\beta_X$ and a range of values for the moderator variable
• For example, a = 0.5, $\beta_X$ = -0.2 and M ranges from -2 to +2
M      $(a + \beta_X M)^2$               value
-2     $(0.5 + (-0.2 \times -2))^2$      0.81
-1.5   $(0.5 + (-0.2 \times -1.5))^2$    0.64
…
+2     $(0.5 + (-0.2 \times 2))^2$       0.01
[Figure: VC plotted against M, for M from -2 to 2]
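The tabulation above is easy to generate for any grid of moderator values. The sketch below uses the slide's example values a = 0.5 and a moderation coefficient of -0.2; the function name and the step size of 0.5 are assumptions made for illustration.

```python
def moderated_vc(a, beta, m):
    """Variance component at moderator value m: the moderated path, squared."""
    return (a + beta * m) ** 2

a, beta = 0.5, -0.2
grid = [-2.0 + 0.5 * step for step in range(9)]  # -2, -1.5, ..., +2

curve = [(m, moderated_vc(a, beta, m)) for m in grid]
for m, vc in curve:
    print(f"M = {m:+.1f}   VC = {vc:.2f}")
```

The same function can be applied to the c and e paths with their own moderation coefficients to draw all three curves.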
[Figure: variance components A, C and E plotted against the moderator (-4 to 4); y-axis 0 to 10]
Specific test of G×E
                         -2LL       Df
Full model  ACE-XYZ-M    3024.689   792
Sub model   ACE-YZ-M     3034.898   793
Difference               10.209     1
p-value = 0.00139
Other tests
Test    Submodel    -2LL       Δdf   p-value
Y       ACE-XZ-M    3025.782   1     0.29
Z       ACE-XY-M    3110.429   1     < 1×10^-19
M       ACE-XYZ     3039.370   1     0.00013
C & Y   AE-XZ-M     3026.228   2     0.46
All made against the full model ACE-XYZ-M, -2LL = 3024.689
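The p-values in these comparisons follow from the difference in -2LL, referred to a chi-square distribution with Δdf degrees of freedom. The sketch below assumes that standard likelihood-ratio approach and uses closed forms for 1 and 2 degrees of freedom, so only the standard library is needed; the function name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

FULL_MINUS2LL = 3024.689  # ACE-XYZ-M

# Submodel name -> (-2LL, difference in degrees of freedom), as tabulated.
submodels = {
    "ACE-YZ-M": (3034.898, 1),
    "ACE-XZ-M": (3025.782, 1),
    "ACE-XY-M": (3110.429, 1),
    "ACE-XYZ": (3039.370, 1),
    "AE-XZ-M": (3026.228, 2),
}

p_values = {}
for name, (minus2ll, ddf) in submodels.items():
    diff = minus2ll - FULL_MINUS2LL
    p_values[name] = chi2_sf(diff, ddf)
    print(f"{name:10s} diff = {diff:7.3f}  df = {ddf}  p = {p_values[name]:.3g}")
```

For larger Δdf a general chi-square routine would be needed; the two closed forms cover every comparison listed here.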
Confidence intervals
• Easy to get CIs for individual parameters
• Additionally, CIs on the moderated VCs are useful for interpretation
• e.g. a 95% CI for $(a + \beta_X M)^2$, for a specific M
• Define two extra vectors in Group 1
P full 1 13
O Unit 1 13
Matrix P -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3
• Add a 4th group to calculate the CIs
CIs
Calc
Matrices = Group 1
Begin Algebra;
F= ( A@O + T@P ) . ( A@O + T@P ) /
G= ( C@O + U@P ) . ( C@O + U@P ) /
I= ( E@O + V@P ) . ( E@O + V@P ) /
End Algebra;
Interval @ 95 F 1 1 to F 1 13
Interval @ 95 G 1 1 to G 1 13
Interval @ 95 I 1 1 to I 1 13
End;
Calculation of CIs
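F= ( A@O + T@P ) . ( A@O + T@P ) /
• E.g. if P were $\begin{pmatrix} 0 & 1 & 2 \end{pmatrix}$, with a and x the single elements of A and T, then ( A@O + T@P ) equals
$$a \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} + x \begin{pmatrix} 0 & 1 & 2 \end{pmatrix}$$
or
$$\begin{pmatrix} a & a & a \end{pmatrix} + \begin{pmatrix} 0 & x & 2x \end{pmatrix}$$
or
$$\begin{pmatrix} a & a + x & a + 2x \end{pmatrix}$$
Finally, the dot-product squares all elements to give
$$\begin{pmatrix} a^2 & (a + x)^2 & (a + 2x)^2 \end{pmatrix}$$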
Confidence intervals on VCs
[Figure: confidence intervals on the A, C and E variance components; three panels, y-axis 0 to 14]
Other considerations
• Simple approach to test for heterogeneity
– easily adapted, e.g. for ordinal data models
• Extensions / things to watch for…
– scalar vs. qualitative heterogeneity
• very low power
– the environment may show shared genetic influence with the trait
– nonlinear effects in both mediation and moderation
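[Path diagram with nodes X, E and G; labelled effects: Main effect, Moderating G × E, rGE]
Turkheimer et al, 2003
[Figure: IQ plotted against SES, and V(IQ) plotted against SES]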
Simulated twin data
[Figure: E(Trait) plotted against Moderator; labels: Standard, Quadratic]
• A 3 df test of any moderating effect
• Standard analysis: linear means model (in HA and H0)
• Quadratic analysis: linear and quadratic means model (in HA and H0)
• 18/50 replicates significant, i.e. type I error 36% for nominal 5% level
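More complex G × E interaction
[Figure: Trait and P(disease) plotted against E-risk]
Include E-risk in means model
[Figure: Residual Trait and P(disease | E-risk) plotted against E-risk]
Biometrical model
[Figure: additive genetic effect for genotypes aa, Aa and AA plotted against E-risk; quadratic form]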
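ACE - XYZ - X2Y2Z2 - M
[Path diagram: latent A, C and E for each twin; additive genetic path $a + \beta_X M_1 + \beta_{X^2} M_1^2$ for Twin 1 and $a + \beta_X M_2 + \beta_{X^2} M_2^2$ for Twin 2; paths c and e drawn without moderation terms]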