Continuous heterogeneity

Shaun Purcell

Boulder Twin Workshop, March 2004

Raw data vs. summary statistics

Raw data (one row per twin pair):

Zyg   T1     T2
1     1.2    0.8
1    -1.3   -2.2
2     0.7    1.9
2     0.2   -0.8
..    ...    ...

Summary statistics (lower-triangular covariance matrices):

MZ
1.03
0.87  0.98

DZ
0.95
0.57  1.08



Zyg   T1     T2     age
1     1.2    0.8    12.3
1    -1.3   -2.2    10.3
2     0.7    1.9     8.7
2     0.2   -0.8    14.5
..    ...    ...    ...

Bivariate normal distribution

[Figure: two bivariate normal density surfaces, axes from -3 to 3; labels: Data, Mean, Variance]

Introducing Definition variables

• Zygosity as a definition variable

• “Rectangular” file data.raw

1 1 0.361769 -0.356412 1 0.888986 1.463423 1 0.535161 0.636073 ...
1 2 0.234099 0.0848318 2 -0.547252 -0.229763 2 -0.307926 -0.253692 ...

! Using definition variables
Group1: Defines Matrices
Calc NGroups=2
Begin Matrices;
 X Lower 1 1 free
 Y Lower 1 1 free
 Z Lower 1 1 free
 M full 1 1 free
 H Full 1 1
End Matrices;
Begin Algebra;
 A = X*X';
 C = Y*Y';
 E = Z*Z';
End Algebra;
Ma X 0
Ma Y 0
Ma Z 1
Ma M 0
Options MX%P=rawfit.txt
End

Group2: MZ & DZ twin pairs
Data NInput_vars=4 NObservations=0
RE file=data.raw
Labels id zyg t1 t2
Select t1 t2 zyg /
Definition zyg /
Matrices = Group 1
Means M | M /
Covariances
 A + C + E  | (H~)@A + C _
 (H~)@A + C | A + C + E /
Specify H -1
End

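Notes on the script:

• H Full 1 1 : H will be specified as a definition variable

• M full 1 1 free : M is necessary for the means model

• Options MX%P=rawfit.txt : optional; requests individual fit statistics for each pair

• Group2 : a single group for both MZ & DZ twins

• RE file=data.raw : points to a “REctangular” data file

• NObservations=0 : no need to specify the number of pairs

• Definition zyg / : zygosity is a “Definition” variable

• (H~)@A : multiplies the A component by 1/H

• Specify H -1 : the 1 x 1 matrix H represents each pair’s zygosity

• Means M | M / : a model for the means [ twin 1 | twin 2 ]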
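What a definition variable does can be sketched outside Mx: every pair contributes its own bivariate normal log-likelihood, and that pair's zygosity value sets the coefficient 1/H on A in the cross-twin covariance. The Python below is only an illustrative sketch. The function names and the parameter values are assumptions made here (they are not part of zyg.mx), and the four pairs are the example rows from the raw-data slide.

```python
import math

def pair_cov(a2, c2, e2, zyg):
    """Expected variance and cross-twin covariance for one pair.

    zyg is the definition variable: 1 for MZ, 2 for DZ.
    """
    var = a2 + c2 + e2
    cov = a2 / zyg + c2          # corresponds to (H~)@A + C with H = zyg
    return var, cov

def minus2_loglik_pair(t1, t2, mean, a2, c2, e2, zyg):
    """-2 log-likelihood of one pair under a bivariate normal model."""
    var, cov = pair_cov(a2, c2, e2, zyg)
    det = var * var - cov * cov
    d1, d2 = t1 - mean, t2 - mean
    # Quadratic form d' * inverse(Sigma) * d for a 2x2 matrix with equal variances
    quad = (var * d1 * d1 - 2.0 * cov * d1 * d2 + var * d2 * d2) / det
    return 2.0 * math.log(2.0 * math.pi) + math.log(det) + quad

# Rows: zyg, t1, t2 (example pairs from the raw-data slide)
pairs = [(1, 1.2, 0.8), (1, -1.3, -2.2), (2, 0.7, 1.9), (2, 0.2, -0.8)]

# Hypothetical parameter values, chosen only for illustration
a2, c2, e2, mean = 0.5, 0.3, 0.2, 0.0

total = sum(minus2_loglik_pair(t1, t2, mean, a2, c2, e2, z) for z, t1, t2 in pairs)
print(round(total, 3))
```

Because the covariance is rebuilt per pair from its own zyg value, MZ and DZ pairs can sit in one group, which is the point of the single-group script.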

Output from zyg.mx

RE FILE=DATA.RAW
Rectangular continuous data read initiated
NOTE: Rectangular file contained 500 records with data
 that contained a total of 2000 observations
LABELS ID ZYG T1 T2
SELECT T1 T2 ZYG /
DEFINITION ZYG /
NOTE: Selection yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1500 observations
NOTE: Definition yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1000 observations

Output from zyg.mx

Summary of VL file data for group 2

               ZYG        T1        T2
Code       -1.0000    1.0000    2.0000
Number    500.0000  500.0000  500.0000
Mean        1.5000   -0.0140    0.0240
Variance    0.2500    0.5601    0.5211
Minimum     1.0000   -2.1941   -1.9823
Maximum     2.0000    2.1218    2.7670

Output from zyg.mx

MATRIX H
 This is a FULL matrix of order 1 by 1
      1
 1   -1
MATRIX M
 This is a FULL matrix of order 1 by 1
      1
 1    4
MATRIX X
 This is a LOWER TRIANGULAR matrix of order 1 by 1
      1
 1    1
MATRIX Y
 This is a LOWER TRIANGULAR matrix of order 1 by 1
      1
 1    2
MATRIX Z
 This is a LOWER TRIANGULAR matrix of order 1 by 1
      1
 1    3

The -1 entry in H comes from the line: Specify H -1

Output from zyg.mx

Your model has 4 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> 2134.998
Degrees of freedom >>>>>>>>>>>>>>>> 996

• Fixing X to zero

Your model has 3 estimated parameters and 1000 Observed statistics
-2 times log-likelihood of data >>> 2154.626
Degrees of freedom >>>>>>>>>>>>>>>> 997
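The two fits can be compared with a likelihood-ratio test. The worked arithmetic below uses only the numbers printed in the output; the 3.84 threshold is the standard 5% critical value of a chi-square distribution with 1 df.

```latex
\begin{aligned}
\Delta\chi^2 &= (-2\ln L_{X=0}) - (-2\ln L_{\text{full}}) = 2154.626 - 2134.998 = 19.628 \\
\Delta\mathrm{df} &= 997 - 996 = 1 \\
19.628 &> 3.84 \quad\Rightarrow\quad \text{fixing } X \text{ to zero significantly worsens the fit at the 5\% level}
\end{aligned}
```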

Continuous moderators

• Traits often best defined continuously

• Many environmental moderators also likely to be continuous in nature
– Age
– Gestational age
– Socio-economic status
– Educational level
– Consumption of food / alcohol / drugs

• How to test for G x E interaction in this case?

Continuous moderators

• Problems?
– Stratifying the sample reduces sample size
– Modelling proportions of variance
   • implicitly assumes equality of variance w.r.t. the moderator
– Logical to assume a linear G × E interaction
   • linearity at the level of effect, not variance
– No obvious statistical test for heterogeneity

[Figure: heritability (0% to 100%) plotted against age (4 to 10 yrs)]

Biometrical G × E model

• At a hypothetical single locus
– additive genetic value $a$
– allele frequency $p$
– QTL variance $2p(1-p)a^2$

• Assuming a linear interaction
– additive genetic value $a + \beta M$
– allele frequency $p$
– QTL variance $2p(1-p)(a + \beta M)^2$
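Where the QTL variance term comes from can be sketched as follows, assuming Hardy-Weinberg proportions and no dominance (genotypic values a, 0, -a for AA, Aa, aa). Here q = 1 - p is shorthand introduced for brevity, and β denotes the linear moderation coefficient.

```latex
\begin{aligned}
\mu &= p^2 a + 2pq \cdot 0 + q^2(-a) = a(p - q) \\
\sigma^2 &= p^2 a^2 + q^2 a^2 - a^2 (p - q)^2
          = a^2\left[(p^2 + q^2) - (p - q)^2\right]
          = 2pq\,a^2 = 2p(1-p)\,a^2
\end{aligned}
```

Replacing the additive genetic value a by a + βM in the last line gives the moderated QTL variance 2p(1-p)(a + βM)^2.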

Biometrical G × E model

[Figure: genotypic values of AA, Aa and aa (a, 0, -a) plotted against the moderator M; panels: No interaction; Interaction; Equivalently…]

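Model-fitting approach to G×E

[Path diagram: standard ACE model for Twin 1 and Twin 2, with latent factors A, C, E and paths a, c, e]

[Path diagram: the same model with the additive genetic path moderated, $a + \beta_X M$, for both twins; paths c and e unchanged]

Continuous moderator variable M
Can be coded 0 / 1 in the dichotomous case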

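Individual specific moderators

[Path diagram: ACE model with twin-specific moderator values; additive genetic path $a + \beta_X M_1$ for Twin 1 and $a + \beta_X M_2$ for Twin 2; paths c and e unchanged]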

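E × E interactions

[Path diagram: all three paths moderated; Twin 1: $a + \beta_X M_1$, $c + \beta_Y M_1$, $e + \beta_Z M_1$; Twin 2: $a + \beta_X M_2$, $c + \beta_Y M_2$, $e + \beta_Z M_2$]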

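ACE - XYZ - M

[Path diagram: moderated paths as before (Twin 1: $a + \beta_X M_1$, $c + \beta_Y M_1$, $e + \beta_Z M_1$; Twin 2: $a + \beta_X M_2$, $c + \beta_Y M_2$, $e + \beta_Z M_2$), with the moderator M also entering the means model: $m + \beta_M M_1$ for Twin 1 and $m + \beta_M M_2$ for Twin 2]

Main effects and moderating effects are statistically and conceptually distinct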


[Figure: components of variance (A, C, E) plotted against the moderator variable]

Turkheimer et al (2003)

• 320 twin pairs recruited at birth from urban hospitals

• G : additive genetic variance

• E : SES – parental education, occupation, income

• X : IQ
– Wechsler; Verbal, Performance, Full

[Figure: A, C and E variance components; panels: Full scale IQ, Verbal IQ, Non-Verbal IQ]

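Standard model

• Means vector

$$ \begin{pmatrix} m & m \end{pmatrix} $$

• Covariance matrix

$$ \begin{pmatrix} a^2 + c^2 + e^2 & \\ Z a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix} $$

(Z = 1 for MZ pairs, 0.5 for DZ pairs)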

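Allowing for a main effect of X

• Means vector

$$ \begin{pmatrix} m + \beta_M X_{1i} & m + \beta_M X_{2i} \end{pmatrix} $$

• Covariance matrix

$$ \begin{pmatrix} a^2 + c^2 + e^2 & \\ Z a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix} $$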

! Basic model + main effect of a definition variable
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
 A full 1 1 free
 C full 1 1 free
 E full 1 1 free
 M full 1 1 free   ! grand mean
 B full 1 1 free   ! moderator-linked means model
 H full 1 1
 R full 1 1        ! twin 1 moderator (definition variable)
 S full 1 1        ! twin 2 moderator (definition variable)
End Matrices;
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 A*A' + C*C' + E*E' | A*A' + C*C' _
 A*A' + C*C'        | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 A*A' + C*C' + E*E' | H@A*A' + C*C' _
 H@A*A' + C*C'      | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End

MATRIX A
 This is a FULL matrix of order 1 by 1
 1.3228
MATRIX B
 This is a FULL matrix of order 1 by 1
 0.3381
MATRIX C
 This is a FULL matrix of order 1 by 1
 1.1051
MATRIX E
 This is a FULL matrix of order 1 by 1
 0.9728
MATRIX M
 This is a FULL matrix of order 1 by 1
 0.1035

Your model has 5 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3123.925
Degrees of freedom >>>>>>>>>>>>>>>> 795

MATRIX A
 This is a FULL matrix of order 1 by 1
 1.3078
MATRIX B
 This is a FULL matrix of order 1 by 1
 0.0000
MATRIX C
 This is a FULL matrix of order 1 by 1
 1.1733
MATRIX E
 This is a FULL matrix of order 1 by 1
 0.9749
MATRIX M
 This is a FULL matrix of order 1 by 1
 0.1069

Your model has 4 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3138.157
Degrees of freedom >>>>>>>>>>>>>>>> 796

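Continuous heterogeneity model

• Means vector

$$ \begin{pmatrix} m + \beta_M X_{1i} & m + \beta_M X_{2i} \end{pmatrix} $$

• Covariance matrix

$$ \begin{pmatrix}
(a + \beta_X X_{1i})^2 + (c + \beta_Y X_{1i})^2 + (e + \beta_Z X_{1i})^2 & \\
Z (a + \beta_X X_{1i})(a + \beta_X X_{2i}) + (c + \beta_Y X_{1i})(c + \beta_Y X_{2i}) &
(a + \beta_X X_{2i})^2 + (c + \beta_Y X_{2i})^2 + (e + \beta_Z X_{2i})^2
\end{pmatrix} $$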
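The moment structure of the continuous heterogeneity model can be sketched in code. This is a minimal illustration under stated assumptions: the function name `expected_moments`, its argument names (`bx`, `by`, `bz`, `bm` for the moderation coefficients on a, c, e and the mean) and the numeric values are all hypothetical, and `z` is the 1 (MZ) / 0.5 (DZ) coefficient on the additive genetic covariance.

```python
def expected_moments(m1, m2, z, a, c, e, bx, by, bz, mu, bm):
    """Expected means and 2x2 covariance for one twin pair.

    m1, m2 : moderator values of twin 1 and twin 2
    z      : 1.0 for MZ pairs, 0.5 for DZ pairs
    """
    # Moderated path coefficients for each twin
    a1, a2 = a + bx * m1, a + bx * m2
    c1, c2 = c + by * m1, c + by * m2
    e1, e2 = e + bz * m1, e + bz * m2
    means = [mu + bm * m1, mu + bm * m2]
    cov12 = z * a1 * a2 + c1 * c2            # E is not shared between twins
    cov = [[a1 * a1 + c1 * c1 + e1 * e1, cov12],
           [cov12, a2 * a2 + c2 * c2 + e2 * e2]]
    return means, cov

# Hypothetical DZ pair whose twins differ on the moderator
means, cov = expected_moments(m1=-1.0, m2=1.0, z=0.5,
                              a=1.0, c=1.0, e=1.0,
                              bx=0.5, by=0.0, bz=0.0,
                              mu=0.0, bm=0.3)
print(means)
print(cov)
```

With twin-specific moderator values the two variances on the diagonal differ within a pair, which is why the model has to be fitted to raw data rather than to one covariance matrix per zygosity group.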

! GxE - Basic model
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
 A full 1 1 free
 C full 1 1 free
 E full 1 1 free
 T full 1 1 free   ! moderator-linked A component
 U full 1 1 free   ! moderator-linked C component
 V full 1 1 free   ! moderator-linked E component
 M full 1 1 free   ! grand mean
 B full 1 1 free   ! moderator-linked means model
 H full 1 1
 R full 1 1        ! twin 1 moderator (definition variable)
 S full 1 1        ! twin 2 moderator (definition variable)
End Matrices;
Ma T 0
Ma U 0
Ma V 0
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
 (A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
 (A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
 (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance
 (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
 H@(A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
 H@(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
 (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End

Practical 1

• The script: mod.mx
• The data: f1.dat
   ID zygosity trait_twin_1 trait_twin_2 mod_twin_1 mod_twin_2

1. Any evidence for G × E for this trait?
• i.e. does the A latent variable show heterogeneity with respect to the moderator variable?

2. If so, in what way?
• i.e. how would you interpret/describe the effect?

Practical 1 : f1.dat

[Figure panels: Moderator distribution (histogram of v5, range -2.44 to 2.46, fraction 0 to .095); MZ pairs (trait): v3 against v4; DZ pairs (trait): v3 against v4; All twin 1’s vs. moderator: v3 against v5]

nomod.mx

a   1.3078   $a^2 \approx 1.7$
c   1.1733   $c^2 \approx 1.4$
e   0.9749   $e^2 \approx 0.95$

$a^2 + c^2 + e^2 = 4.05$

i.e. % variance is 42%, 35% and 23%
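The arithmetic behind these shares can be checked with a few lines of Python. This is only a sketch of the calculation, using the nomod.mx path estimates quoted above; the variable names are chosen here for illustration. With unrounded squares the total is about 4.04 and the shares come out at roughly 42.4%, 34.1% and 23.5%, so the slide's figures reflect rounding of the squared estimates.

```python
# Path estimates from nomod.mx
a, c, e = 1.3078, 1.1733, 0.9749

# Variance components are the squared path coefficients
components = {"A": a * a, "C": c * c, "E": e * e}
total = sum(components.values())
proportions = {name: value / total for name, value in components.items()}

for name in "ACE":
    print(f"{name}: variance = {components[name]:.4f}, "
          f"proportion = {proportions[name]:.1%}")
print(f"total = {total:.4f}")
```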

Parameter estimates: mod.mx

      ACE-XYZ-M   ACE-YZ-M
A      1.2288      1.4455
C      0.9874      0.6837
E      0.9236      0.9484
T     -0.6007      (not estimated)
U      0.1763     -0.6817
V      0.3825      0.4663
M      0.0737      0.0724
B      0.367       0.3625

Plotting VCs

• For the additive genetic VC, for example
– Given a, β and a range of values for the moderator variable

• For example, a = 0.5, β = -0.2 and M ranges from -2 to +2

M      $(a + \beta M)^2$                  value
-2     $(0.5 + (-0.2 \times -2))^2$       0.81
-1.5   $(0.5 + (-0.2 \times -1.5))^2$     0.64
…
+2     $(0.5 + (-0.2 \times 2))^2$        0.01

[Figure: VC (vertical axis, -0.1 to 1.5) plotted against M (-2 to 2)]
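The values to plot can be generated directly. The sketch below uses the example values from this slide (a = 0.5, moderation coefficient -0.2, M from -2 to +2 in steps of 0.5); the function name `moderated_vc` is an assumption made for illustration. For M = -1.5 the expression evaluates to 0.8 squared, i.e. 0.64.

```python
def moderated_vc(a, beta, m):
    """Moderated variance component (a + beta*M)^2 at moderator value m."""
    return (a + beta * m) ** 2

a, beta = 0.5, -0.2
grid = [x / 2 for x in range(-4, 5)]      # -2.0, -1.5, ..., 2.0
table = [(m, moderated_vc(a, beta, m)) for m in grid]

for m, vc in table:
    print(f"{m:5.1f}  {vc:.2f}")
```

The same function applied to c and e with their own coefficients gives the other two curves, which can then be plotted against M.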

[Figure: variance components A, C and E (vertical axis, 0 to 10) plotted against the moderator (-4 to 4)]

Specific test of G×E

                          -2LL       Df
Full model   ACE-XYZ-M    3024.689   792
Sub model    ACE-YZ-M     3034.898   793
Difference                10.209     1

p-value = 0.00139

Other tests

Test    Submodel    -2LL       Δdf   p-value
Y       ACE-XZ-M    3025.782   1     0.29
Z       ACE-XY-M    3110.429   1     < 1×10^-19
M       ACE-XYZ     3039.370   1     0.00013
C & Y   AE-XZ-M     3026.228   2     0.46

All made against the full model ACE-XYZ-M, -2LL = 3024.689
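These p-values follow from comparing each difference in -2LL with a chi-square distribution. The sketch below is a hedged illustration, not the workshop's own code: `chi2_sf` is a helper defined here that only covers the 1 df and 2 df cases needed for these tests, using their closed forms, and the -2LL values are the ones quoted on the slides.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

FULL = 3024.689  # -2LL of the full model ACE-XYZ-M

# (submodel, -2LL, difference in df) as quoted on the slides
submodels = [
    ("ACE-YZ-M", 3034.898, 1),
    ("ACE-XZ-M", 3025.782, 1),
    ("ACE-XY-M", 3110.429, 1),
    ("ACE-XYZ", 3039.370, 1),
    ("AE-XZ-M", 3026.228, 2),
]

results = {}
for name, m2ll, ddf in submodels:
    diff = m2ll - FULL
    results[name] = (diff, chi2_sf(diff, ddf))
    print(f"{name:9s} diff = {diff:7.3f}  df = {ddf}  p = {results[name][1]:.3g}")
```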

Confidence intervals

• Easy to get CIs for individual parameters

• Additionally, CIs on the moderated VCs are useful for interpretation

• e.g. a 95% CI for $(a + \beta_X M)^2$, for a specific M

• Define two extra vectors in Group 1

 P full 1 13
 O Unit 1 13
 Matrix P -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3

• Add a 4th group to calculate the CIs

CIs
Calc
Matrices = Group 1
Begin Algebra;
 F= ( A@O + T@P ) . ( A@O + T@P ) /
 G= ( C@O + U@P ) . ( C@O + U@P ) /
 I= ( E@O + V@P ) . ( E@O + V@P ) /
End Algebra;
Interval @ 95 F 1 1 to F 1 13
Interval @ 95 G 1 1 to G 1 13
Interval @ 95 I 1 1 to I 1 13
End;

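Calculation of CIs

F= ( A@O + T@P ) . ( A@O + T@P ) /

• E.g. if P were $\begin{pmatrix} 0 & 1 & 2 \end{pmatrix}$, then ( A@O + T@P ) equals

$$ a \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} + x \begin{pmatrix} 0 & 1 & 2 \end{pmatrix} $$

or

$$ \begin{pmatrix} a & a & a \end{pmatrix} + \begin{pmatrix} 0 & x & 2x \end{pmatrix} $$

or

$$ \begin{pmatrix} a & a + x & a + 2x \end{pmatrix} $$

Finally, the dot-product squares all elements to give

$$ \begin{pmatrix} a^2 & (a + x)^2 & (a + 2x)^2 \end{pmatrix} $$

(a and x denote the single elements of A and T)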
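The matrix algebra in F can be mimicked in plain Python to see what each operator contributes. This is a sketch under assumptions: `kron_1x1` and `moderated_vcs` are hypothetical helper names, the Kronecker product is written only for the 1 x 1 by 1 x N case used here, and the numeric values of a and x are illustrative.

```python
def kron_1x1(scalar, row):
    """Kronecker product of a 1x1 matrix with a 1xN row vector."""
    return [scalar * v for v in row]

def moderated_vcs(a, x, p):
    """Mimic ( A@O + T@P ) . ( A@O + T@P ) for 1x1 A (= a) and T (= x)."""
    ones = [1.0] * len(p)                       # the Unit matrix O
    path = [u + v for u, v in zip(kron_1x1(a, ones), kron_1x1(x, p))]
    return [v * v for v in path]                # dot product: element-wise squaring

# P = (0 1 2) as in the example; a and x are illustrative values
vcs = moderated_vcs(0.5, -0.2, [0.0, 1.0, 2.0])
print([round(v, 4) for v in vcs])
```

Each element of the result is the moderated variance component at one value of P, so requesting an interval on every element of F gives a CI at each moderator value.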

Confidence intervals on VCs

[Figure: confidence intervals on the moderated variance components; panels: A, C, E; vertical axes 0 to 14]

Other considerations

• Simple approach to test for heterogeneity – easily adapted, e.g. for ordinal data models

• Extensions / things to watch for…
– scalar vs. qualitative heterogeneity
   • v. low power
– the environment may show shared genetic influence with the trait
– nonlinear effects in both mediation and moderation

[Diagram: variables X, E and G, with effects labelled Main effect, Moderating G × E, and rGE]

Turkheimer et al, 2003

[Figure: two panels plotted against SES: IQ and V(IQ)]

Simulated twin data

[Figure: E(Trait) plotted against the moderator; labels: Standard, Quadratic]

A 3 df test of any moderating effect
Standard analysis : linear means model (in HA and H0)
Quadratic analysis : linear and quadratic means model (in HA and H0)

18/50 replicates significant, i.e. type I error 36% for nominal 5% level

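More complex G × E interaction

[Figure: Trait and P(disease) plotted against E-risk]

Include E-risk in means model

[Figure: Residual Trait and P(disease | E-risk) plotted against E-risk]

Biometrical model

[Figure: additive genetic effect for genotypes AA, Aa and aa plotted against E-risk; quadratic form]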

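ACE - XYZ - X²Y²Z² - M

[Path diagram: ACE model for Twin 1 and Twin 2 with a quadratic moderation term on the additive genetic path: $a + \beta_X M_1 + \beta_{X^2} M_1^2$ for Twin 1 and $a + \beta_X M_2 + \beta_{X^2} M_2^2$ for Twin 2; paths c and e shown without moderation]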
