Distinguishing genetic correlation from causation among 52 diseases and complex traits
Luke J O'Connor, Harvard T.H. Chan School of Public Health
Preprint on bioRxiv
What is a genetic correlation?
Psychiatric Genomics Consortium 2013 Nat Genet; Bulik-Sullivan et al. 2015b Nat Genet
Two views (figure panels): correlation of SNP effects across SNPs; correlation of genetic values across individuals
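A minimal sketch of the two views, assuming independent SNPs (no LD), standardized genotypes drawn as Gaussians, and bivariate-normal SNP effects; the helper names and parameter values are hypothetical. With effects correlated at 0.3 across SNPs, the correlation of genetic values across individuals should track the correlation of the effects themselves.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_snp_effects(m, rho, rng):
    """Per-SNP effects on two traits: bivariate normal, unit variances, correlation rho."""
    beta1, beta2 = [], []
    for _ in range(m):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        beta1.append(z1)
        beta2.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return beta1, beta2

def genetic_values(beta, genotypes):
    """Genetic value of each individual: sum over SNPs of genotype * effect."""
    return [sum(x * b for x, b in zip(row, beta)) for row in genotypes]

rng = random.Random(1)

# View 1: correlation of effects across many SNPs.
b1, b2 = simulate_snp_effects(50_000, 0.3, rng)
r_snps = pearson(b1, b2)

# View 2: correlation of genetic values across individuals, on a smaller toy
# genome with hypothetical standardized genotypes (no LD).
s1, s2 = simulate_snp_effects(500, 0.3, rng)
genos = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(2_000)]
r_individuals = pearson(genetic_values(s1, genos), genetic_values(s2, genos))
r_toy_snps = pearson(s1, s2)
```

Both `r_snps` and `r_individuals` should come out near 0.3; the second differs from `r_toy_snps` only by sampling noise across the 2,000 simulated individuals.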
What is Mendelian Randomization?
Primary motivation: identify modifiable exposures that could be targeted to reduce disease risk
If LDL causes CAD (Voight et al. 2012 Lancet):
– SNPs associated with high LDL are associated with higher CAD risk
– Individuals carrying high-LDL alleles have higher CAD risk
If LDL does not cause CAD, and there is no genetic correlation due to pleiotropy:
– SNPs associated with high LDL are not associated with CAD
– Individuals carrying high-LDL alleles have the same CAD risk as non-carriers
Mendelian randomization is genetic correlation restricted to top SNPs
Davey Smith and Hemani 2014 Hum Mol Genet; Bulik-Sullivan et al. 2015b Nat Genet
Two views (figure panels): regress SNP effects; regress genetic values
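For intuition, one common MR estimator (inverse-variance weighted, the estimator behind the summary-data two-sample MR of Burgess et al. 2013 compared later) is a weighted regression through the origin of SNP-outcome effects on SNP-exposure effects, restricted to genome-wide significant SNPs. A toy sketch, assuming known and equal standard errors and made-up effect sizes; all names are hypothetical:

```python
import random

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted MR estimate: regression of SNP-outcome effects
    on SNP-exposure effects through the origin, with weights 1 / se_outcome^2."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den

# Toy two-sample setting (hypothetical values): 200 of 10,000 SNPs affect the
# exposure, the exposure has a causal effect of 0.4 on the outcome, and both
# GWAS report estimates with standard error 0.005.
rng = random.Random(2)
m, n_causal, causal_effect, se = 10_000, 200, 0.4, 0.005
true_exposure = [rng.choice((-0.05, 0.05)) if j < n_causal else 0.0 for j in range(m)]
beta_exposure = [a + rng.gauss(0, se) for a in true_exposure]                 # exposure GWAS
beta_outcome = [causal_effect * a + rng.gauss(0, se) for a in true_exposure]  # outcome GWAS

# Restrict to "top SNPs": |z| > 5.45, roughly p < 5e-8 (two-sided).
top = [j for j in range(m) if abs(beta_exposure[j] / se) > 5.45]
est = ivw_estimate([beta_exposure[j] for j in top],
                   [beta_outcome[j] for j in top],
                   [se] * len(top))
```

Here `est` should land close to the simulated causal effect of 0.4, because the only route from exposure SNPs to the outcome is through the exposure.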
Pleiotropy is common
Pickrell et al. 2016 Nat Genet
Pleiotropy: the same variant affects multiple traits
– Effects may or may not be correlated
– May or may not be due to a causal relationship between traits
Genetic correlations are common
Bulik-Sullivan et al. 2015b Nat Genet
Does bidirectional MR distinguish correlation from causation?
Pickrell et al. 2016 Nat Genet
If A has a causal effect on B, then:
– Most variants ascertained for B do not affect A
– All variants ascertained for A also affect B
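The asymmetry can be seen in a toy count, assuming noise-free effects and hypothetical numbers (100 SNPs affect A, 900 other SNPs affect B only, and A has a causal effect of 0.5 on B); the helper name is hypothetical:

```python
def fraction_affecting(ascertained, other):
    """Among SNPs with a nonzero effect on the ascertained trait, the fraction
    that also have a nonzero effect on the other trait."""
    hits = [j for j, e in enumerate(ascertained) if e != 0.0]
    return sum(1 for j in hits if other[j] != 0.0) / len(hits)

causal = 0.5                                   # causal effect of A on B
effect_a = [0.1] * 100 + [0.0] * 900           # first 100 SNPs act on A
effect_b = [causal * a for a in effect_a[:100]] + [0.1] * 900  # A SNPs act on B via A

frac_a_to_b = fraction_affecting(effect_a, effect_b)  # → 1.0
frac_b_to_a = fraction_affecting(effect_b, effect_a)  # → 0.1
```

Every variant ascertained for A also moves B, while only 10% of variants ascertained for B move A; bidirectional MR looks for this asymmetry.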
Outline
1. Latent Causal Variable model to distinguish correlation from causation
2. Comparison with MR in simulations
3. Application to myocardial infarction (MI) and other traits
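Latent causal variable model
$$Y_k = q_k L + X\pi_k + \epsilon_k, \qquad L = X\ell + \epsilon_L$$
– $Y_k$: value of trait k; $X$: genotype; $L$: latent causal variable
– $q_k$: effect of $L$ on trait k
– $\ell$: effect of genotype on $L$
– $\pi_k$: effect of genotype on trait k not mediated by $L$
Per-SNP effect on trait k: $\alpha_k = q_k\ell + \pi_k$. With $\alpha_k$ and $\ell$ standardized to unit variance and $\ell, \pi_1, \pi_2$ independent, the genetic correlation is $\rho = q_1 q_2$.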
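A minimal simulation of per-SNP effects, writing $\alpha_k = q_k\ell + \pi_k$ for a SNP's effect on trait k (its effect on L times the effect of L on trait k, plus the unmediated effect). It assumes Gaussian $\ell$ and $\pi_k$ scaled so that $\mathrm{Var}(\alpha_k) = 1$; parameter values and function names are illustrative. The empirical correlation of $\alpha_1$ and $\alpha_2$ across SNPs should be close to $q_1 q_2$:

```python
import math
import random

def simulate_lcv_effects(m, q1, q2, rng):
    """Per-SNP effects alpha_k = q_k * ell + pi_k with ell ~ N(0, 1) and
    independent pi_k ~ N(0, 1 - q_k^2), so that Var(alpha_k) = 1."""
    sd1, sd2 = math.sqrt(1 - q1 ** 2), math.sqrt(1 - q2 ** 2)
    alpha1, alpha2 = [], []
    for _ in range(m):
        ell = rng.gauss(0, 1)                    # effect of the SNP on L
        alpha1.append(q1 * ell + rng.gauss(0, sd1))
        alpha2.append(q2 * ell + rng.gauss(0, sd2))
    return alpha1, alpha2

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

alpha1, alpha2 = simulate_lcv_effects(100_000, q1=0.9, q2=0.5, rng=random.Random(3))
rho_hat = pearson(alpha1, alpha2)   # expected near q1 * q2 = 0.45
```

Note that the genetic correlation alone cannot separate (0.9, 0.5) from (0.5, 0.9) or (0.67, 0.67): all give roughly 0.45.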
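Special case: full genetic causality
When $\pi_1 = 0$, every genetic effect on trait 1 is mediated by $L$, so trait 1 is genetically identical to $L$ ($q_1 = 1$) and every SNP's effect on trait 2 is proportional to its effect on trait 1, plus an independent term: $\alpha_2 = q_2\alpha_1 + \pi_2$.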
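Genetic causality proportion measures degree of partial causality
Genetic causality proportion (gcp): the number x such that
$$|q_1| = |\rho|^{(1-x)/2}, \qquad |q_2| = |\rho|^{(1+x)/2}, \qquad \text{i.e.} \quad x = \frac{\log|q_2| - \log|q_1|}{\log|q_1| + \log|q_2|}$$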
gcp=1: trait 1 fully genetically causal for trait 2
gcp=-1: trait 2 fully genetically causal for trait 1
gcp=0: no partial causality
Key intuition: if trait 1 is causal, then SNPs affecting trait 1 have proportional effects on trait 2, but not vice versa
Key equation: relates mixed fourth moments to q under the LCV model
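A small helper mapping the LCV parameters to gcp, assuming the log-ratio definition $|q_1| = |\rho|^{(1-x)/2}$, $|q_2| = |\rho|^{(1+x)/2}$ with $\rho = q_1 q_2$; the function names are hypothetical. It checks the three anchor cases: gcp = 1 when $|q_1| = 1$, gcp = -1 when $|q_2| = 1$, and gcp = 0 when $|q_1| = |q_2|$.

```python
import math

def gcp(q1, q2):
    """Genetic causality proportion x = (log|q2| - log|q1|) / (log|q1| + log|q2|).
    Assumes 0 < |q1|, |q2| <= 1 and |q1 * q2| < 1 (nonzero, imperfect rho)."""
    l1, l2 = math.log(abs(q1)), math.log(abs(q2))
    return (l2 - l1) / (l1 + l2)

def q_from_gcp(x, rho):
    """Inverse map (magnitudes only): |q1| = |rho|^((1-x)/2), |q2| = |rho|^((1+x)/2)."""
    return abs(rho) ** ((1 - x) / 2), abs(rho) ** ((1 + x) / 2)

# Partial causality: gcp = 0.6 with genetic correlation 0.3.
q1, q2 = q_from_gcp(0.6, 0.3)
```

A positive gcp corresponds to $|q_1| > |q_2|$: trait 1 is more tightly coupled to the latent causal variable than trait 2 is.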
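Inference using mixed fourth moments
Key equation relates mixed fourth moments to q:
$$E[\alpha_1^3\alpha_2] - 3\rho = \kappa_\ell\,\rho\,q_1^2 \qquad \text{(and symmetrically } E[\alpha_1\alpha_2^3] - 3\rho = \kappa_\ell\,\rho\,q_2^2\text{)}$$
– $E[\alpha_1^3\alpha_2]$: mixed fourth moment, estimated from summary statistics
– $\kappa_\ell$: excess kurtosis of $\ell$ (zero when Gaussian)
– $\rho$: genetic correlation (estimated using LDSC)
– $q_1$: the quantity we want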
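A Monte Carlo check of the key equation, assuming $\alpha_k = q_k\ell + \pi_k$ with unit variances, a Laplace-distributed $\ell$ (excess kurtosis 3) and Gaussian $\pi_k$; names and parameter values are illustrative. Under these assumptions $E[\alpha_1^3\alpha_2] = 3\rho + \kappa_\ell\rho q_1^2$, so the simulated moment should land near that value rather than near the Gaussian value $3\rho$:

```python
import math
import random

def laplace_unit_variance(rng):
    """Laplace draw with variance 1 (excess kurtosis 3)."""
    b = 1 / math.sqrt(2)                         # Var = 2 * b^2 = 1
    return rng.choice((-1, 1)) * rng.expovariate(1 / b)

def mixed_moment_31(m, q1, q2, rng):
    """Monte Carlo estimate of E[alpha1^3 * alpha2] with heavy-tailed ell
    and Gaussian pi_k scaled so that Var(alpha_k) = 1."""
    s1, s2 = math.sqrt(1 - q1 ** 2), math.sqrt(1 - q2 ** 2)
    total = 0.0
    for _ in range(m):
        ell = laplace_unit_variance(rng)
        a1 = q1 * ell + rng.gauss(0, s1)
        a2 = q2 * ell + rng.gauss(0, s2)
        total += a1 ** 3 * a2
    return total / m

q1, q2, kappa = 0.9, 0.5, 3.0
rho = q1 * q2
predicted = 3 * rho + kappa * rho * q1 ** 2      # key equation, rearranged
observed = mixed_moment_31(200_000, q1, q2, random.Random(4))
```

Swapping the roles of the traits gives $3\rho + \kappa_\ell\rho q_2^2$, so the two mixed moments differ exactly when $|q_1| \neq |q_2|$; this is the asymmetry LCV exploits, and it disappears when $\ell$ is Gaussian.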
Posterior estimation of gcp
● Mixed fourth moments + block jackknife → approximate likelihood
● Uniform prior → posterior mean and standard error
● Hypothesis testing: is gcp = 0?
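A generic sketch of the two ingredients, assuming a delete-one-block jackknife over contiguous SNP blocks and a grid approximation to the posterior under a uniform prior on [-1, 1]. The log-likelihood used below is a hypothetical Gaussian stand-in, not LCV's actual likelihood built from the mixed fourth moments; function names are hypothetical.

```python
import math
import random

def block_jackknife_se(values, n_blocks, statistic):
    """Delete-one-block jackknife SE of statistic(values), with values (e.g.
    per-SNP quantities ordered along the genome) split into contiguous blocks."""
    n = len(values)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    loo = [statistic(values[:bounds[i]] + values[bounds[i + 1]:]) for i in range(n_blocks)]
    mean_loo = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((t - mean_loo) ** 2 for t in loo)
    return math.sqrt(var)

def grid_posterior(log_likelihood, n_grid=201):
    """Posterior mean and SD of gcp under a uniform prior on [-1, 1],
    approximated on an evenly spaced grid."""
    grid = [-1 + 2 * i / (n_grid - 1) for i in range(n_grid)]
    logs = [log_likelihood(x) for x in grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # uniform prior: posterior ∝ likelihood
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, grid)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, grid)) / total
    return mean, math.sqrt(var)

rng = random.Random(6)
values = [rng.gauss(0, 1) for _ in range(10_000)]
se_jk = block_jackknife_se(values, 100, lambda v: sum(v) / len(v))       # near 0.01
post_mean, post_sd = grid_posterior(lambda x: -0.5 * ((x - 0.4) / 0.1) ** 2)
```

Contiguous blocks are the usual choice for genomic data because nearby SNPs are correlated through LD, so leaving out whole blocks keeps dependent SNPs together.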
Simulations: comparison with MR methods
● Comparison with:
– Two-sample MR (Burgess et al. 2013 Genet Epidemiol)
– MR-Egger (Bowden et al. 2015 Int J Epidemiol)
– Bidirectional MR (Pickrell et al. 2016 Nat Genet)
● M = 50k SNPs, no LD
● N = 100k, disjoint cohorts
● $h^2_g = 0.3$, $h^2_{\mathrm{GWAS}} \approx 0.15$
Uncorrelated pleiotropic effects: all methods well calibrated
● Pleiotropic SNPs explaining 20% of heritability for both traits
● Zero genetic correlation
Nonzero genetic correlation: MR confounded
● SNPs with correlated pleiotropic effects explaining 20% of heritability for both traits
● Genetic correlation: 0.2
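A toy illustration of this confounding, assuming noise-free effects, "top SNPs" taken as the 250 largest trait-1 effects, and pleiotropic SNPs with identical effects on both traits that explain 20% of each trait's SNP-heritability. There is no causal effect of trait 1 on trait 2 (trait-1-only SNPs have zero effect on trait 2), yet an MR slope on the top SNPs comes out near the genetic correlation. Names and numbers are hypothetical.

```python
import math
import random

def ivw_equal_weights(bx, by):
    """IVW MR estimate when all outcome SEs are equal: regression through the origin."""
    return sum(x * y for x, y in zip(bx, by)) / sum(x * x for x in bx)

rng = random.Random(5)
effects = []                                  # (effect on trait 1, effect on trait 2)
for _ in range(2_000):
    effects.append((rng.gauss(0, 1), 0.0))    # trait-1-only SNPs
for _ in range(2_000):
    effects.append((0.0, rng.gauss(0, 1)))    # trait-2-only SNPs
for _ in range(500):
    z = rng.gauss(0, 1)
    effects.append((z, z))                    # pleiotropic SNPs, perfectly correlated effects

a = [e[0] for e in effects]
b = [e[1] for e in effects]
# Effects have mean zero, so the uncentered correlation serves as the genetic correlation.
rg = sum(x * y for x, y in effects) / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

top = sorted(effects, key=lambda e: abs(e[0]), reverse=True)[:250]
mr_estimate = ivw_equal_weights([e[0] for e in top], [e[1] for e in top])
```

About one in five top trait-1 SNPs is pleiotropic, so the regression picks up a slope of roughly 0.2 that reflects shared pleiotropy rather than causation.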
Unequal polygenicity between traits: Bi-MR (and MR) confounded
● SNPs affecting trait 1 only: high per-SNP heritability
● SNPs affecting trait 2 only: low per-SNP heritability (4x difference)
● Genetic correlation: 0.2
● Similar results for unequal power
Full genetic causality: all methods (except MR-Egger) well powered
● All SNPs affecting trait 1 also affect trait 2
● Genetic correlation: 0.2
● High power in more challenging simulations as well
Unbiased posterior estimates in simulations with LD
● Real LD patterns
● gcp values drawn from prior distribution
● Unequal polygenicity and power
● Unbiasedness: [figure: posterior gcp estimates vs. true gcp]
Application to 52 traits
● Summary statistic data:
– 37 UK Biobank traits including MI (N = 460k)
– 16 other traits (average N = 43k)
● Nominally significant genetic correlation: 430 trait pairs
● Significant partial causality: 63 trait pairs (1% FDR)
– Many have low gcp estimates: probably not full genetic causality
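The slide does not state which FDR procedure was used. As an assumption, here is the Benjamini-Hochberg step-up procedure, a standard way to control the FDR at 1% across many trait pairs; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at false discovery rate `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the largest rank whose p-value clears rank * fdr / m.
        if p_values[i] <= rank * fdr / m:
            k_max = rank
    return sorted(order[:k_max])

rejected = benjamini_hochberg([1e-6, 0.0004, 0.02, 0.3, 0.9], 0.01)  # → [0, 1]
```

Because the procedure steps up from the largest p-value, three tied p-values of 0.009 are all rejected at FDR 0.01 even though none clears the smaller thresholds on its own.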
Traits affecting myocardial infarction: consistent with known biology
| Trait 1 (trait 2 = MI) | Gen corr (SE) | LCV p-value | gcp est (SE) | MR ref |
| --- | --- | --- | --- | --- |
| BMI | 0.34 (0.09) | 5×10^-9 | 0.94 (0.11) | Holmes 2014 |
| Triglycerides | 0.30 (0.06) | 2×10^-31 | 0.90 (0.08) | Do 2013 |
| LDL | 0.17 (0.08) | 4×10^-31 | 0.73 (0.13) | Voight 2012 |
| Hypothyroidism | 0.26 (0.05) | 1×10^-11 | 0.72 (0.16) | Zhao 2017 (null) |
| High cholesterol | 0.52 (0.12) | 2×10^-4 | 0.71 (0.19) | Voight 2012 |
| Fasting glucose | 0.19 (0.07) | 4×10^-4 | 0.62 (0.23) | Ahmad 2015 (T2D) |
Effect of LDL on BMD consistent with RCTs
| Trait 1 | Trait 2 | Gen corr (SE) | LCV p-value | gcp est (SE) | MR ref |
| --- | --- | --- | --- | --- | --- |
| LDL | BMD | -0.12 (0.05) | 7×10^-34 | 0.80 (0.12) | |
● Familial defective apolipoprotein B-100: leads to high LDL and low BMD (Yerges-Armstrong et al. 2013 J Endocrinol Metab)
● Effect of statins on BMD in a 7-trial meta-analysis (Wang et al. 2016 Medicine)
– Not interpreted specifically as evidence for an effect of LDL
– Modest effect size concordant with modest genetic correlation
Summary
● MR methods can be confounded by genetic correlations
● Partial genetic causality measured by genetic causality proportion (gcp)
● LCV produces unbiased estimates of gcp and well-calibrated p-values
● LCV recapitulates known biology and identifies novel putative causal relationships
Acknowledgements
Alkes Price
Soumya Raychaudhuri
Ben Neale
Chirag Patel
Members of the Price lab
UK Biobank
Preprint on bioRxiv
Related work at ASHG 2017: Morrison et al., poster 3004W