supplementary information for - nature research · 2014-04-28 · 1 supplementary information for...


SUPPLEMENTARY INFORMATION FOR

HERITABILITY AND GENOMICS OF GENE EXPRESSION IN PERIPHERAL BLOOD

Fred A Wright1,2,12, Patrick F Sullivan3,12, Andrew I Brooks4, Fei Zou5, Wei Sun5, Kai Xia5, Vered Madar5, Rick Jansen6, Wonil Chung5, Yi-Hui Zhou1, Abdel Abdellaoui7, Sandra Batista8, Casey Butler8, Guanhua Chen5, Ting-Huei Chen5, David D'Ambrosio9, Paul Gallins3, Min Jin Ha5, Jouke Jan Hottenga7, Shunping Huang8, Mathijs Kattenberg7, Jaspreet Kochar9, Christel M Middeldorp7, Ani Qu9, Andrey Shabalin10, Jay Tischfield4, Laura Todd3, Jung-Ying Tzeng1, Gerard van Grootheest6, Jacqueline M Vink7, Qi Wang9, Wei Wang11, Weibo Wang8, Gonneke Willemsen7, Johannes H Smit6, Eco J de Geus7, Zhaoyu Yin5, Brenda WJH Penninx6, Dorret I Boomsma7.

1Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC, USA. 2Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA. 3Department of Genetics, University of North Carolina at Chapel Hill, NC, USA. 4Department of Genetics, Rutgers University, New Brunswick, NJ, USA. 5Department of Biostatistics, University of North Carolina at Chapel Hill, NC, USA. 6Department of Psychiatry, VU Medical Center, Amsterdam, Netherlands. 7Department of Biological Psychology, VU University, Amsterdam, Netherlands. 8Department of Computer Science, University of North Carolina at Chapel Hill, NC. 9Environmental and Occupational Health Sciences Institute, Rutgers University, New Brunswick, NJ, USA. 10Department of Pharmacotherapy & Outcomes Science, Virginia Commonwealth University, Richmond, VA, USA. 11Department of Computer Science, University of California, Los Angeles, USA. 12These authors contributed equally to this work.

Correspondence should be addressed to F.A.W. ([email protected]) or P.F.S ([email protected])

Nature Genetics: doi:10.1038/ng.2951


Supplementary Figure 1. Effects of covariates and mean expression on heritability and shared environmental effects (n = 2,752). (a) Adjusted R² of all covariates (excluding hybridization plate) for predicting the expression level of each of 43,628 transcripts. (b) Heritability estimates (h²) from the ACE model applied to all transcripts, before and after covariate correction. Negative values are implausible and reflect sampling variation, but the full range is shown for illustration, and the estimates are unbiased. Covariate correction generally strengthens the evidence for the most highly significant transcripts. (c) Shared environment (twinship) effect estimates (c²), before and after covariate correction. (d) Observed versus expected right-tailed P values for covariate-corrected h² show a large number of significant transcripts. (e) The same plot for positive c² shows that no transcript reaches transcriptome-wide significance; likewise, no transcript showed significant evidence of negative c² (data not shown).


Supplementary Figure 2. Mean heritability estimates. (a) Mean heritability as a function of gene proximity to 3,931 NHGRI GWAS catalog SNPs with GWAS P < 5.0 × 10⁻⁸. For each catalog SNP, the closest gene was recorded, then the second closest, and so on; a gene was included at rank k if it was the kth-closest gene to at least one catalog SNP. Mean heritability as a function of this rank shows that genes with higher proximity rank tend to have higher h². A simple rank correlation between the two plotted axes gives P = 0.017. About 1,600 genes are included for each k. (b) Mean heritability as a function of the number of SNPs to which a gene is closest. For each gene, we recorded the number of significant NHGRI catalog SNPs for which it is the closest gene, and the mean heritability is displayed for each such group. Using all genes, the rank correlation of h² with the number of closest GWAS SNPs gives P = 8.2 × 10⁻¹⁶. The number of genes contributing to each point is shown.


Supplementary Figure 3. Contribution of components to expression variation for the 9,060 genes with h² > 0.1 (whether or not declared expressed). (a) Ratio of the r² (variation explained) of the best local SNP to the overall (twin-based) h², with the median and mean of the ratios and the proportion of h² explained by the best local SNP. (b–d) Analogous plots and values for the best distant SNP (b), local GCTA estimation (c) and local IBD estimation using DZ twins (d). Analyses restricted to expressed genes give similar results throughout, owing to the h² > 0.1 restriction.


Supplementary Figure 4. Effects of sample size on the reliability of h² estimates in twin-based designs, and comparison with additive-only variance components, applied to 18,392 genes. (a) Starting from the shrunken ‘true’ h² distribution estimated in NTR, we derived the distribution of estimated h² expected under the twin proportions of the Brisbane Systems Genetics Study (BSGS), assuming no family effects. These hypothetical h² estimates (using the shrunken h² distribution and the error variation applicable to the BSGS sample size and analysis approach) very closely match the published BSGS report. BSGS values were obtained by digitizing the plot in that report using WebPlotDigitizer (http://arohatgi.info/WebPlotDigitizer/app/). (b) Rank correlation of estimated versus true h² across the transcriptome as a function of total sample size, computed from the shrunken ‘true’ h² distribution in NTR and the standard error of a twin-based ACE model (with the same MZ vs. DZ proportions as in NTR).


Supplementary Figure 5. Comparison of twin-based heritability with alternative sources of heritability information, using the ‘best h²’ set of 18,392 unique genes. (a) Best local SNP r² versus twin-based h². The proportion of variance in twin-based h² explained by the best local SNP in a linear regression model is listed. Each gene was classified as ‘local’ or ‘distant’ according to which of its eQTL P values was smaller (regardless of genome-wide significance). (b) Local GCTA r² versus twin-based h². (c) DZ local IBD analysis versus twin-based h². The local IBD analysis is less powerful, as it is based on roughly half of the data. (d) A model using the three predictors in a–c yields only a slight improvement in prediction.
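Panel (d) compares the R² of a regression of twin-based h² on one predictor with that of a regression on all three. The sketch below computes ordinary least-squares R² with an intercept via the normal equations; it is a generic illustration, not the authors' code. In-sample R² can never decrease when predictors are added, so a slight gain suggests the three predictors carry largely overlapping information.

```python
def r_squared(y, predictors):
    """In-sample R2 from OLS of y on an intercept plus the given predictor columns."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in p] for p in predictors]
    k = len(cols)
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan with partial pivoting.
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in cols]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[piv] = a[piv], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [x - f * z for x, z in zip(a[r], a[p])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - ybar) ** 2 for yt in y)
    return 1 - ss_res / ss_tot
```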


Supplementary Figure 6. Reproducibility of GODOT eQTLs. (a) Local eQTL −log10(q) values for NESDA versus NTR. (b) Inset highlighting less significant local eQTLs. (c) Distant eQTL −log10(q) values for NESDA versus NTR. Only interchromosomal eQTLs are shown (the SNP and target gene are on different chromosomes), ensuring that they are truly distant. (d) Inset highlighting less significant distant eQTLs.
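The axes of this figure are −log10(q) values. As a sketch of how such values can be produced from eQTL P values, the code below uses Benjamini-Hochberg adjusted P values as a stand-in; whether the q values in the figure were computed this way (rather than, for example, with Storey's method) is not stated here, so treat the choice as an assumption.

```python
import math


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (used here as a stand-in for q values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def neg_log10(values, floor=1e-300):
    """-log10 transform, flooring at a tiny value so q = 0 does not raise."""
    return [-math.log10(max(v, floor)) for v in values]
```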


Supplementary Figure 7. Characteristics of local eQTLs. The 6,864 SNPs involved in the 6,941 local eQTLs were annotated using the Ensembl Variant Effect Predictor (version 2.8) (see the Supplementary Note). (a) Proportion of local eQTLs located in each type of regulatory region. The number above each bar is the P value for over-representation (red) or under-representation (blue) of eQTLs in that type of region. (b) Proportion of replicating local eQTLs.
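The source does not specify the test behind the over- and under-representation P values. One standard choice, assumed here purely for illustration, is a one-sided hypergeometric test comparing the number of eQTL SNPs in a regulatory category with the background rate among all tested SNPs.

```python
from math import comb


def hypergeom_tails(k, n_draw, n_success, n_total):
    """One-sided hypergeometric P values for observing k 'successes' among n_draw items.

    Example reading (assumed): k of n_draw eQTL SNPs fall in a regulatory category,
    while n_success of n_total background SNPs are in that category.
    Returns (P(X <= k), P(X >= k)): under- and over-representation, respectively.
    """
    denom = comb(n_total, n_draw)
    lo = max(0, n_draw - (n_total - n_success))
    hi = min(n_draw, n_success)
    pmf = {x: comb(n_success, x) * comb(n_total - n_success, n_draw - x) / denom
           for x in range(lo, hi + 1)}
    lower = sum(p for x, p in pmf.items() if x <= k)
    upper = sum(p for x, p in pmf.items() if x >= k)
    return lower, upper
```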


Supplementary Figure 8. P values of SNPs in NTR and NESDA for genes declared significant in Westra et al. (a) Local eQTL replication P values in NTR. (b) Local eQTL replication P values in NESDA. (c) Distant eQTL replication P values in NTR. (d) Distant eQTL replication P values in NESDA.


Supplementary Figure 9. Properties of replicating distant eQTLs. The 304 SNPs involved in the 348 distant eQTLs were annotated using Variant Effect Predictor (version 2.8) of Ensembl. Fourteen (26%) of 53 SNPs annotated as intergenic variants were replicated in NESDA, which is significantly lower than the overall replication rate of 47%. There was no significant enrichment or deficiency of replication in other categories.
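One simple way to check that 14 of 53 intergenic SNPs is significantly below a 47% replication rate is an exact one-sided binomial test. This is an assumed approach for illustration: it treats 47% as a fixed rate, although the intergenic SNPs themselves contribute to that overall rate.

```python
from math import comb


def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


# 14 of 53 intergenic SNPs replicated, versus an overall replication rate of 47%.
p_value = binom_lower_tail(14, 53, 0.47)
```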


Supplementary Figure 10. Distant eSNPs are more likely to be local eQTLs. For each of the 304 distant-eQTL SNPs, we assessed its association with local genes (< 1 Mb away) and grouped the SNPs by their minimum local eQTL P value. The value above each black bar is the number of SNPs in that significance group. For comparison, we randomly selected 10,000 SNPs from the remaining SNPs, matched to the 304 SNPs on minor allele frequency and imputation quality (R²), and grouped them in the same way by minimum local eQTL P value.
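The matched background set can be sketched with frequency matching on binned MAF and imputation R². The bin widths, the sampling with replacement, and the `(snp_id, maf, r2)` record layout are all hypothetical choices; the caption does not describe the matching procedure in that detail.

```python
import random
from collections import defaultdict


def matched_sample(targets, pool, n_draw, maf_width=0.05, r2_width=0.1, seed=0):
    """Draw n_draw pool SNPs whose (MAF, imputation R2) bins mirror those of the targets.

    targets, pool: lists of (snp_id, maf, r2). Each draw picks a random target SNP,
    then a random pool SNP from the same bin (with replacement across draws).
    Raises IndexError if no target has a non-empty matching bin.
    """
    def key(maf, r2):
        return (int(maf / maf_width), int(r2 / r2_width))

    bins = defaultdict(list)
    for snp_id, maf, r2 in pool:
        bins[key(maf, r2)].append(snp_id)
    usable = [t for t in targets if bins[key(t[1], t[2])]]
    rng = random.Random(seed)
    picks = []
    for _ in range(n_draw):
        _, maf, r2 = rng.choice(usable)
        picks.append(rng.choice(bins[key(maf, r2)]))
    return picks
```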


Supplementary Figure 11. The distribution of the 348 distant eQTLs. A total of 304 SNPs were involved in the 348 distant eQTLs. SNPs were clustered by genomic position: a SNP was assigned to a cluster if it lay within 1 Mb of any SNP already in that cluster. This yielded 203 clusters, 160 of which contained a single SNP. The 43 clusters with more than one SNP spanned 2 kb to 2 Mb (median 89 kb). The plot shows the number of eQTLs in each cluster. The eleven clusters with more than five eQTLs are highlighted in the inset table, which also shows the number of NESDA eQTLs (q < 0.01) associated with each cluster.
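The clustering rule in this caption is single linkage with a 1 Mb threshold. On a single chromosome, after sorting by position, the closest existing cluster member to a new SNP is always the previous SNP, so the rule reduces to starting a new cluster whenever the gap to the previous SNP is at least 1 Mb. A minimal sketch, with a hypothetical `(snp_id, chrom, pos)` input layout:

```python
def cluster_positions(snps, max_gap=1_000_000):
    """Single-linkage clustering of SNP positions with a max_gap (bp) threshold.

    snps: list of (snp_id, chrom, pos). Returns a list of clusters (lists of snp_ids).
    """
    clusters = []
    last_chrom, last_pos = None, None
    for snp_id, chrom, pos in sorted(snps, key=lambda s: (s[1], s[2])):
        # New cluster on a new chromosome, or when the gap to the previous SNP is too large.
        if chrom != last_chrom or pos - last_pos >= max_gap:
            clusters.append([])
        clusters[-1].append(snp_id)
        last_chrom, last_pos = chrom, pos
    return clusters
```

Because linkage is transitive, a cluster can span more than 1 Mb, which is consistent with the reported maximum span of 2 Mb.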


Supplementary Figure 12. Distant eQTLs and their associated genes on eight chromosomes. (a) A distant eQTL on chromosome 19 is associated with the expression of 12 distant genes and a local gene, MOY1F. The network plot shows the partial correlation graph of these 13 genes, in which an edge indicates a nonzero partial correlation. Partial correlations were estimated using the penalized estimation method cited in the Supplementary Note. (b) A distant eQTL on chromosome 20 is associated with the expression of six distant genes and a local gene, SMOX. Using the likelihood ratio test approach described in the Supplementary Note, applied to independent genotype and gene expression data from NESDA (to avoid the winner's curse), we found that the causal relation eSNP → SMOX → distant eQTL gene is significantly more likely than the other possible relations for these six distant eQTL genes. (c) Distant eQTLs on six other chromosomes.
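To illustrate what an edge in a partial correlation graph means, the sketch below computes unpenalized partial correlations from the inverse of a correlation matrix, ρ_ij = −P_ij / sqrt(P_ii P_jj) with P the precision matrix. This is a swapped-in, unpenalized technique: the figure used a penalized estimator, and the edge threshold here is hypothetical because unpenalized estimates are never exactly zero. In a chain X → Y → Z, X and Z are conditionally independent given Y, so their partial correlation vanishes.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices only)."""
    n = len(matrix)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(a[r][p]))
        a[p], a[piv] = a[piv], a[p]
        scale = a[p][p]
        a[p] = [x / scale for x in a[p]]
        for r in range(n):
            if r != p:
                f = a[r][p]
                a[r] = [x - f * z for x, z in zip(a[r], a[p])]
    return [row[n:] for row in a]


def partial_correlations(corr):
    """Partial correlation of each pair given all other variables."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


def edges(pcorr, names, threshold=0.05):
    """Pairs whose |partial correlation| exceeds a hypothetical threshold."""
    return [(names[i], names[j]) for i in range(len(names))
            for j in range(i + 1, len(names)) if abs(pcorr[i][j]) > threshold]
```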


Supplementary Table 1: K-means clustering of 777 significantly heritable genes.

Cluster | Genes | Tightness | DAPPLE P | Tissue similarity | GO biological process | GO cell compartment
1 | 30 | 0.251 | 0.003 | Brain; Myocytes; Immune | Defense response; Inflammatory response; Response to wounding | Cell surface; External side of plasma membrane
2 | 81 | 0.241 | 0.049 | Lung; Immune | Response to oxidative stress; Cellular homeostasis | Cell cortex
3 | 39 | 0.376 | 0.028 | Brain; Immune | Coagulation; Wound healing | Secretory granule; Vesicle
4 | 78 | 0.090 | 0.77 | Brain; Immune | Regulation of protein kinase cascade; Apoptosis | Plasma membrane
5 | 80 | 0.081 | 0.009 | Myocytes; Immune | Response to extracellular stimuli | Plasma membrane
6 | 24 | 0.464 | 0.16 | Brain; Immune; Myocytes; Adipocytes | None | Plasma membrane
7 | 106 | 0.042 | 0.001 | Brain; Immune; Adipocytes | MHC class II; Antigen processing; Response to wounding | Lysosome; Plasma membrane
8 | 313 | 0.006 | 0.13 | Appendix; Brain | MHC class I; Antigen processing; Adaptive immunity | None
9 | 26 | 0.248 | 0.001 | Trachea; Skin; Immune; Brain | B-cell activation | Plasma membrane

Clusters were obtained by k-means clustering of the correlation matrix of the 777 significantly heritable genes. The DAPPLE column shows permutation P values for the connectivity of each cluster's genes in protein-protein interaction space.

1 Pathway analyses using DAVID.

2
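A minimal sketch of k-means applied to a gene-by-gene correlation matrix, assuming each gene is represented by its row of correlations and using Lloyd's algorithm with a deterministic farthest-point initialization. The initialization, iteration cap and empty-cluster handling are assumptions, not details from the source.

```python
def kmeans(rows, k, n_iter=100):
    """Lloyd's k-means on the rows of a correlation matrix; returns a cluster label per row."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # Deterministic farthest-point initialization starting from the first row.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        centroids.append(list(max(rows, key=lambda r: min(dist2(r, c) for c in centroids))))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: dist2(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```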


Supplementary Table 2: Physical clustering of heritable transcripts.

Chr | Mb | h² clustering | Gene features within clusters
1 | 161-162 | Greater | IgG Fc fragment receptors
3 | 46-47 | Greater | Chemokine receptors and lactotransferrin
5 | 96-97 | Greater | Antigen loading and cytokine function
6 | 32-33 | Greater | Major histocompatibility complex class II
6 | 133-134 | Greater | Vanin genes involved in hematopoietic cell trafficking
12 | 10-11 | Greater | Lectin-related gene cluster
19 | 52-55 | Greater | Heterogeneous set of genes enriched for immune function
5 | 140-141 | Lesser | Neuronal protocadherin gene cluster (very high gene density)

The genome was divided into 1 Mb bins, and for each bin we recorded the number of genes on the Affymetrix U219 chip and the number of significantly heritable genes. To identify bins with more or fewer heritable genes than expected, we fit a linear regression model predicting the number of highly heritable genes from the number of transcripts. Bins that were outliers in this regression (Studentized residual < −4 or > 4) are listed above.
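The outlier rule can be sketched with a simple linear regression of heritable-gene counts on transcript counts per bin. The source does not say whether internally or externally Studentized residuals were used; internal studentization, r_i = e_i / (s·sqrt(1 − h_ii)), is assumed here, and the function names are illustrative.

```python
def studentized_residuals(x, y):
    """Internally studentized residuals from the simple regression y ~ a + b*x (assumed variant)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = (sum(e * e for e in resid) / (n - 2)) ** 0.5        # residual standard error
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # hat-matrix diagonal
    return [e / (s * (1 - h) ** 0.5) for e, h in zip(resid, leverage)]


def outlier_bins(bins, n_transcripts, n_heritable, cutoff=4.0):
    """Bins whose |Studentized residual| exceeds the cutoff, with the residual value."""
    r = studentized_residuals(n_transcripts, n_heritable)
    return [(b, ri) for b, ri in zip(bins, r) if abs(ri) > cutoff]
```

Note that internally studentized residuals are bounded in magnitude by sqrt(n − 2), so a cutoff of 4 is only attainable with enough bins; this is not a concern at genome scale.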


Supplementary Table 3: Pathway analysis of genes with mean expression-corrected h² (Bonferroni-corrected P < 0.05)

KEGG (15 of 205 pathways)

Enrichment z P KEGG ID Genes Pathway

7.01 2.43E-12 KEGG:04514 126 Cell adhesion molecules (CAMs)

6.49 8.53E-11 KEGG:05332 30 Graft-versus-host disease

5.89 3.95E-09 KEGG:05150 44 Staphylococcus aureus infection

5.81 6.11E-09 KEGG:05330 31 Allograft rejection

5.6 2.11E-08 KEGG:04612 56 Antigen processing and presentation

5.34 9.18E-08 KEGG:04940 36 Type I diabetes mellitus

5.32 1.02E-07 KEGG:04145 137 Phagosome

5.13 2.93E-07 KEGG:04640 82 Hematopoietic cell lineage

4.65 3.33E-06 KEGG:05416 63 Viral myocarditis

4.5 6.67E-06 KEGG:05310 25 Asthma

4.24 2.20E-05 KEGG:05320 43 Autoimmune thyroid disease

4.14 3.40E-05 KEGG:04520 73 Adherens junction

3.82 0.000134 KEGG:05412 74 Arrhythmogenic right ventricular cardiomyopathy (ARVC)

3.76 0.000167 KEGG:04610 66 Complement and coagulation cascades

3.68 0.000233 KEGG:04672 42 Intestinal immune network for IgA production

GO Biological Process pathways (43 of 3468 pathways)

Enrichment z P GO ID Genes Pathway name

5.93 3.12E-09 GO:0002576 79 platelet degranulation

5.69 1.29E-08 GO:0006959 89 humoral immune response

5.6 2.14E-08 GO:0001819 182 positive regulation of cytokine production

5.58 2.45E-08 GO:0002250 167 adaptive immune response

5.55 2.79E-08 GO:2000178 13 negative regulation of neural precursor cell proliferation

5.49 4.06E-08 GO:0002460 150 adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains

5.25 1.50E-07 GO:0050663 81 cytokine secretion

5.24 1.58E-07 GO:0050708 110 regulation of protein secretion

5.16 2.43E-07 GO:0002443 173 leukocyte mediated immunity

5.09 3.50E-07 GO:0002449 134 lymphocyte mediated immunity

5.01 5.51E-07 GO:0006909 81 phagocytosis

5 5.63E-07 GO:0002532 15 production of molecular mediator involved in inflammatory response

4.86 1.18E-06 GO:0030193 62 regulation of blood coagulation


-4.85 1.21E-06 GO:0050907 59 detection of chemical stimulus involved in sensory perception

4.85 1.23E-06 GO:0031589 198 cell-substrate adhesion

4.84 1.30E-06 GO:0044403 98 symbiosis, encompassing mutualism through parasitism

4.81 1.51E-06 GO:0050818 67 regulation of coagulation

4.81 1.52E-06 GO:0034330 182 cell junction organization

4.81 1.52E-06 GO:0002683 152 negative regulation of immune system process

4.8 1.58E-06 GO:0010951 117 negative regulation of endopeptidase activity

4.8 1.60E-06 GO:0050707 69 regulation of cytokine secretion

4.79 1.63E-06 GO:0033559 63 unsaturated fatty acid metabolic process

4.75 2.08E-06 GO:0009306 148 protein secretion

4.72 2.32E-06 GO:0042113 146 B cell activation

-4.71 2.42E-06 GO:0050911 35 detection of chemical stimulus involved in sensory perception of smell

4.71 2.44E-06 GO:0050764 37 regulation of phagocytosis

4.63 3.62E-06 GO:0002699 92 positive regulation of immune effector process

4.61 4.02E-06 GO:0010466 120 negative regulation of peptidase activity

4.61 4.04E-06 GO:0006691 28 leukotriene metabolic process

4.6 4.30E-06 GO:0050715 49 positive regulation of cytokine secretion

4.58 4.66E-06 GO:0002697 192 regulation of immune effector process

4.55 5.40E-06 GO:0045766 77 positive regulation of angiogenesis

4.55 5.44E-06 GO:0030335 200 positive regulation of cell migration

4.51 6.51E-06 GO:0030100 115 regulation of endocytosis

4.5 6.91E-06 GO:0006690 59 icosanoid metabolic process

4.45 8.64E-06 GO:0002504 16 antigen processing and presentation of peptide or polysaccharide antigen via MHC class II

4.44 9.10E-06 GO:0035821 58 modification of morphology or physiology of other organism

4.43 9.51E-06 GO:0019370 24 leukotriene biosynthetic process

4.41 1.03E-05 GO:0034341 99 response to interferon-gamma

4.41 1.04E-05 GO:0009595 23 detection of biotic stimulus

4.38 1.18E-05 GO:0019724 83 B cell mediated immunity

4.37 1.25E-05 GO:0006636 48 unsaturated fatty acid biosynthetic process

4.35 1.38E-05 GO:0061041 78 regulation of wound healing

GO Cellular Component pathways (19 of 402 pathways)

Enrichment z P GO ID Genes Pathway name

6.21 5.35E-10 GO:0031091 57 platelet alpha granule

5.5 3.71E-08 GO:0031093 45 platelet alpha granule lumen


5.23 1.72E-07 GO:0031983 51 vesicle lumen

5.19 2.15E-07 GO:0034774 46 secretory granule lumen

5.18 2.22E-07 GO:0060205 49 cytoplasmic membrane-bounded vesicle lumen

5.08 3.84E-07 GO:0005913 40 cell-cell adherens junction

5.06 4.14E-07 GO:0005912 187 adherens junction

4.97 6.81E-07 GO:0009897 175 external side of plasma membrane

4.64 3.48E-06 GO:0001725 40 stress fiber

4.59 4.34E-06 GO:0032432 42 actin filament bundle

4.57 4.85E-06 GO:0042613 10 MHC class II protein complex

4.45 8.69E-06 GO:0030139 128 endocytic vesicle

4.35 1.35E-05 GO:0042641 49 actomyosin

4.12 3.86E-05 GO:0042611 30 MHC protein complex

4.11 3.96E-05 GO:0030018 64 Z disc

4.07 4.67E-05 GO:0012507 24 ER to Golgi transport vesicle membrane

4.04 5.25E-05 GO:0030670 28 phagocytic vesicle membrane

4.04 5.28E-05 GO:0030136 174 clathrin-coated vesicle

3.88 0.000105 GO:0030027 109 lamellipodium

GO Molecular Function pathways (6 of 764 pathways)

Enrichment z P GO ID Genes Pathway name

5.02 5.19E-07 GO:0042605 12 peptide antigen binding

4.54 5.73E-06 GO:0003823 32 antigen binding

4.39 1.15E-05 GO:0046906 118 tetrapyrrole binding

4.3 1.67E-05 GO:0020037 109 heme binding

4.11 3.96E-05 GO:0016892 16 endoribonuclease activity, producing 3'-phosphomonoesters

4.08 4.60E-05 GO:0032393 15 MHC class I receptor activity
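The P values in this table appear consistent, up to rounding of z, with two-sided normal P values for the listed enrichment z scores (for example, z = 3.82 gives P ≈ 1.3 × 10⁻⁴), and the largest retained P in each section lies below 0.05 divided by the number of pathways tested in that section. The sketch below reproduces that conversion and Bonferroni filter; how the enrichment z scores themselves were computed is not described here and is not reproduced.

```python
import math


def two_sided_p(z):
    """Two-sided normal P value: 2 * Phi(-|z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_pass(z_scores, n_tests, alpha=0.05):
    """Indices of z scores whose two-sided P value is below alpha / n_tests."""
    threshold = alpha / n_tests
    return [i for i, z in enumerate(z_scores) if two_sided_p(z) < threshold]
```

For example, with 205 KEGG pathways the threshold is 0.05 / 205 ≈ 2.4 × 10⁻⁴, which the last listed KEGG pathway (z = 3.68) just passes.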


Supplementary Table 4: Genes with significant heritability & genome-wide significant SNPs from GWAS.

Gene Mean.Expr a2 P.a2 GWAS.pmid GWAS.phenotype SNP P.SNP

AHI1 4.41 0.36 1.5E-04 21833088 Multiple sclerosis rs11154801 1.0E-13

AIF1 7.58 0.47 8.2E-06 22267201 Menopause (age at onset) rs1046089 2.0E-16

AIF1 7.58 0.47 8.2E-06 21946350 Pulmonary function rs2857595 2.0E-10

AIF1 7.58 0.47 8.2E-06 19079260 Weight rs2844479 2.0E-08

ALOX5 5.57 0.39 2.0E-04 20139978 Hematological & biochem traits rs2279434 4.0E-12

ALPL 7.04 0.41 1.5E-05 18940312 Liver enzyme levels rs1780324 7.0E-15

ALPL 7.04 0.41 1.5E-05 21886157 Metabolic traits rs10799701 3.0E-20

ALPL 7.04 0.41 1.5E-05 20558539 Phosphorus levels rs1697421 1.0E-27

ANK1 5.78 0.36 2.3E-04 20858683 Glycated hemoglobin levels rs4737009 6.0E-12

ANK1 5.78 0.36 2.3E-04 20858683 Glycated hemoglobin levels rs6474359 1.0E-08

ANK1 5.78 0.36 2.3E-04 22456796 Type 2 diabetes rs515071 1.0E-08

AP3S2 5.77 0.36 3.8E-04 21874001 Type 2 diabetes rs2028299 2.0E-11

AUTS2 3.97 0.43 2.7E-05 21471458 Alcohol consumption rs6943555 4.0E-08

BMF 5.00 0.35 5.9E-04 21533175 DHEAS rs7181230 5.0E-11

BMP6 3.69 0.35 6.2E-04 18391951 Height rs12198986 2.0E-11

C10orf32 4.91 0.37 4.2E-04 19430479 Systolic blood pressure rs1004467 1.0E-10

C16orf57 7.18 0.40 1.0E-05 21946350 Pulmonary function rs12447804 4.0E-08

C2orf88 5.08 0.40 1.0E-04 22344221 Body mass index rs13034723 2.0E-08

CAST 7.29 0.36 3.7E-04 21743469 Ankylosing spondylitis rs30187 2.0E-27

CAST 7.29 0.36 3.7E-04 20062062 Ankylosing spondylitis rs27434 5.0E-12

CAST 7.29 0.36 3.7E-04 20953190 Psoriasis rs27524 3.0E-11

CCR1 6.64 0.70 2.5E-14 20190752 Celiac disease rs13098911 3.0E-17

CD300LF 7.37 0.34 8.9E-04 20031577 Fibrinogen rs10512597 8.0E-11

CD36 6.46 0.35 2.4E-04 22423221 Mean platelet volume rs13236689 3.0E-09

CD8A 7.05 0.34 1.6E-04 21685912 Progressive supranuclear palsy rs6547705 1.0E-08


CD9 4.65 0.59 2.4E-11 22139419 Platelet counts rs7342306 4.0E-11

CDCA7L 5.76 0.34 6.1E-04 22120009 Multiple myeloma rs4487645 3.0E-14

CHI3L1 7.11 0.72 1.5E-23 18403759 YKL-40 levels rs4950928 1.0E-13

CLU 6.94 0.60 1.7E-13 19734903 Alzheimer's disease rs2279590 6.0E-10

CLU 6.94 0.60 1.7E-13 19734903 Alzheimer's disease rs11136000 6.0E-10

CLU 6.94 0.60 1.7E-13 19734903 Alzheimer's disease rs9331888 6.0E-10

CLU 6.94 0.60 1.7E-13 19734902 Alzheimer's disease rs11136000 9.0E-10

CLU 6.94 0.60 1.7E-13 21627779 Alzheimer's disease rs569214 4.0E-08

CNN2 8.55 0.77 9.5E-20 21460840 Alzheimer's disease rs3764650 5.0E-17

CTSH 7.48 0.39 8.8E-06 18978792 Type 1 diabetes rs3825932 3.0E-15

DDT 7.50 0.47 4.0E-07 22001757 gamma-glutamyl transferase) rs2739330 2.0E-09

DDX6 6.04 0.48 2.0E-07 21383967 Celiac disease & Rheumatoid arthritis rs10892279 1.0E-12

DISC1 5.19 0.40 9.4E-05 21483430 Neuroanatomic & neurocog rs12042938 4.0E-36

ERAP1 5.99 0.58 1.9E-10 21743469 Ankylosing spondylitis rs30187 2.0E-27

ERAP1 5.99 0.58 1.9E-10 20062062 Ankylosing spondylitis rs27434 5.0E-12

ERAP1 5.99 0.58 1.9E-10 20953190 Psoriasis rs27524 3.0E-11

ERAP2 7.03 0.83 7.7E-30 21102463 Crohn's disease rs2549794 1.0E-10

F5 5.26 0.45 6.7E-06 21502573 D-dimer levels rs6687813 2.0E-14

F5 5.26 0.45 6.7E-06 22443383 Hemostatic factors & hematological phenotypes rs2420371 4.0E-80

F5 5.26 0.45 6.7E-06 20167578 Soluble levels of adhesion molecules rs6136 4.0E-61

F5 5.26 0.45 6.7E-06 21980494 Venous thromboembolism rs1018827 2.0E-26

FADS2 4.90 0.41 7.1E-07 19060911 Cholesterol, total rs174570 2.0E-10

FADS2 4.90 0.41 7.1E-07 19060911 LDL cholesterol rs174570 4.0E-13

FADS2 4.90 0.41 7.1E-07 22001757 alkaline phosphatase) rs174601 3.0E-09

FADS2 4.90 0.41 7.1E-07 21829377 Phospholipid levels (plasma) rs1535 3.0E-152

FADS2 4.90 0.41 7.1E-07 21829377 Phospholipid levels (plasma) rs1535 3.0E-63

FADS2 4.90 0.41 7.1E-07 21829377 Phospholipid levels (plasma) rs174448 3.0E-60

FADS2 4.90 0.41 7.1E-07 21829377 Phospholipid levels (plasma) rs174574 4.0E-55


FADS2 4.90 0.41 7.1E-07 21829377 Phospholipid levels (plasma) rs174448 7.0E-28

FADS2 4.90 0.41 7.1E-07 21829377 Phospholipid levels (plasma) rs174448 4.0E-25

FBXO7 8.72 0.35 2.5E-04 19820697 Hematological parameters rs9609565 4.0E-10

FCER1A 6.40 0.45 7.0E-07 22075330 IgE levels rs2251746 5.0E-26

FCER1A 6.40 0.45 7.0E-07 18846228 IgE levels rs2251746 2.0E-20

FCER1A 6.40 0.45 7.0E-07 17903293 Select biomarker traits rs2494250 1.0E-14

FCGR2A 10.32 0.33 5.1E-04 22081228 Kawasaki disease rs1801274 7.0E-11

FCGR2A 10.32 0.33 5.1E-04 21297633 Ulcerative colitis rs1801274 2.0E-20

FCGR2A 10.32 0.33 5.1E-04 19915573 Ulcerative colitis rs1801274 2.0E-12

FCGR2A 10.32 0.33 5.1E-04 20228799 Ulcerative colitis rs10800309 3.0E-09

FCRL3 5.05 0.36 1.9E-04 21841780 Graves' disease rs3761959 2.0E-13

FCRL3 5.05 0.36 1.9E-04 21829393 Type 1 diabetes autoantibodies rs7528684 1.0E-11

GP1BA 6.06 0.46 1.2E-06 20139978 Hematological & biochem traits rs6065 2.0E-12

GUCY1A3 3.07 0.40 1.3E-04 21909115 Diastolic blood pressure rs13139571 2.0E-10

GZMB 7.44 0.47 3.6E-07 20410501 Vitiligo rs8192917 3.0E-08

HBD 7.45 0.34 1.7E-04 20183929 Beta thal/hemoglobin E disease rs2071348 3.0E-15

HBG1 8.31 0.84 8.0E-39 20183929 Beta thal/hemoglobin E disease rs2071348 3.0E-15

HCP5 8.36 0.54 4.4E-08 19115949 AIDS progression rs2395029 3.0E-19

HCP5 8.36 0.54 4.4E-08 19483685 flucloxacillin-induced liver injury rs2395029 9.0E-33

HCP5 8.36 0.54 4.4E-08 20041166 HIV-1 control rs2395029 5.0E-35

HCP5 8.36 0.54 4.4E-08 21051598 HIV-1 control rs2395029 1.0E-25

HCP5 8.36 0.54 4.4E-08 21051598 HIV-1 control rs2255221 4.0E-14

HCP5 8.36 0.54 4.4E-08 20041166 HIV-1 control rs2395029 1.0E-11

HCP5 8.36 0.54 4.4E-08 22286212 Hodgkin's lymphoma rs2248462 7.0E-16

HCP5 8.36 0.54 4.4E-08 22399527 Metabolic syndrome rs3099844 2.0E-08

HCP5 8.36 0.54 4.4E-08 20662065 Neonatal lupus rs3099844 5.0E-10

HCP5 8.36 0.54 4.4E-08 18369459 Psoriasis rs2395029 2.0E-26

HIP1 6.26 0.54 4.9E-12 19838193 Systemic lupus erythematosus rs1167796 2.0E-08

HLA-A 9.53 0.83 3.8E-34 22075330 IgE levels rs2571391 1.0E-15


HLA-A 9.53 0.83 3.8E-34 22190364 Multiple sclerosis rs9260489 1.0E-11

HLA-A 9.53 0.83 3.8E-34 20512145 Nasopharyngeal carcinoma rs2860580 5.0E-67

HLA-A 9.53 0.83 3.8E-34 19664746 Nasopharyngeal carcinoma rs2517713 4.0E-20

HLA-B 9.96 0.62 4.0E-14 20062062 Ankylosing spondylitis rs7743761 5.0E-304

HLA-B 9.96 0.62 4.0E-14 21254220 Bipolar disorder rs9378249 1.0E-08

HLA-B 9.96 0.62 4.0E-14 21051598 HIV-1 control rs2523608 9.0E-20

HLA-B 9.96 0.62 4.0E-14 21051598 HIV-1 control rs2523590 2.0E-13

HLA-B 9.96 0.62 4.0E-14 18364390 Psoriasis rs3134792 1.0E-09

HLA-DOB 3.88 0.34 5.8E-04 22446962 Kawasaki disease rs2857151 5.0E-11

HLA-DPA1 9.68 0.51 1.5E-08 21814517 Asthma rs987870 2.0E-10

HLA-DPA1 9.68 0.51 1.5E-08 21841780 Graves' disease rs2281388 2.0E-65

HLA-DPA1 9.68 0.51 1.5E-08 21750111 Hepatitis B rs3077 2.0E-61

HLA-DPA1 9.68 0.51 1.5E-08 21750111 Hepatitis B rs9277535 3.0E-54

HLA-DPA1 9.68 0.51 1.5E-08 19349983 Hepatitis B rs9277535 6.0E-39

HLA-DPA1 9.68 0.51 1.5E-08 21764829 Hepatitis B vaccine response rs9277535 3.0E-12

HLA-DPA1 9.68 0.51 1.5E-08 21779181 Systemic sclerosis rs987870 2.0E-20

HLA-DPB1 9.14 0.49 1.3E-06 21814517 Asthma rs987870 2.0E-10

HLA-DPB1 9.14 0.49 1.3E-06 21841780 Graves' disease rs2281388 2.0E-65

HLA-DPB1 9.14 0.49 1.3E-06 21750111 Hepatitis B rs3077 2.0E-61

HLA-DPB1 9.14 0.49 1.3E-06 21750111 Hepatitis B rs9277535 3.0E-54

HLA-DPB1 9.14 0.49 1.3E-06 19349983 Hepatitis B rs9277535 6.0E-39

HLA-DPB1 9.14 0.49 1.3E-06 21764829 Hepatitis B vaccine response rs9277535 3.0E-12

HLA-DPB1 9.14 0.49 1.3E-06 21779181 Systemic sclerosis rs987870 2.0E-20

HLA-DQA1 7.01 0.69 1.6E-23 20860503 Asthma rs9273349 7.0E-14

HLA-DQA1 7.01 0.69 1.6E-23 20190752 Celiac disease rs2187668 1.0E-50

HLA-DQA1 7.01 0.69 1.6E-23 17558408 Celiac disease rs2187668 1.0E-19

HLA-DQA1 7.01 0.69 1.6E-23 20694011 Immunoglobulin A rs2187668 2.0E-33

HLA-DQA1 7.01 0.69 1.6E-23 20694011 Immunoglobulin A rs9271366 3.0E-33

HLA-DQA1 7.01 0.69 1.6E-23 21699788 Inflammatory bowel disease rs9271366 2.0E-70


HLA-DQA1 7.01 0.69 1.6E-23 21699788 Inflammatory bowel disease rs9271366 3.0E-31

HLA-DQA1 7.01 0.69 1.6E-23 21699788 Inflammatory bowel disease rs9271366 8.0E-11

HLA-DQA1 7.01 0.69 1.6E-23 19525955 Multiple sclerosis rs9271366 7.0E-184

HLA-DQA1 7.01 0.69 1.6E-23 20453840 Multiple sclerosis rs2040406 1.0E-20

HLA-DQA1 7.01 0.69 1.6E-23 20598377 Multiple sclerosis rs9271366 4.0E-17

HLA-DQA1 7.01 0.69 1.6E-23 20512145 Nasopharyngeal carcinoma rs28421666 2.0E-18

HLA-DQA1 7.01 0.69 1.6E-23 21323541 Nephropathy (idiopathic membranous) rs2187668 8.0E-93

HLA-DQA1 7.01 0.69 1.6E-23 21502966 Response to interferon β therapy rs9272105 4.0E-10

HLA-DQA1 7.01 0.69 1.6E-23 21653640 Rheumatoid arthritis rs9272219 1.0E-45

HLA-DQA1 7.01 0.69 1.6E-23 21408207 Systemic lupus erythematosus rs2187668 6.0E-28

HLA-DQA1 7.01 0.69 1.6E-23 18204098 Systemic lupus erythematosus rs2187668 3.0E-21

HLA-DQA1 7.01 0.69 1.6E-23 21779181 Systemic sclerosis rs3129763 1.0E-11

HLA-DQA1 7.01 0.69 1.6E-23 17554300 Type 1 diabetes rs9272346 5.0E-134

HLA-DQA1 7.01 0.69 1.6E-23 18978792 Type 1 diabetes rs9272346 6.0E-129

HLA-DQB1 7.35 0.34 4.8E-05 20860503 Asthma rs9273349 7.0E-14

HLA-DQB1 7.35 0.34 4.8E-05 21570397 Drug-induced liver injury (amoxicillin-clavulanate) rs9274407 5.0E-14

HLA-DRA 8.07 0.45 6.1E-06 21804548 Asthma rs3129890 5.0E-13

HLA-DRA 8.07 0.45 6.1E-06 20686565 Cholesterol, total rs3177928 4.0E-19

HLA-DRA 8.07 0.45 6.1E-06 21764829 Hepatitis B vaccine response rs3135363 7.0E-22

HLA-DRA 8.07 0.45 6.1E-06 21037568 Hodgkin's lymphoma rs6903608 3.0E-50

HLA-DRA 8.07 0.45 6.1E-06 20686565 LDL cholesterol rs3177928 2.0E-15

HLA-DRA 8.07 0.45 6.1E-06 19525953 Multiple sclerosis rs3135388 4.0E-225

HLA-DRA 8.07 0.45 6.1E-06 22190364 Multiple sclerosis rs3129889 1.0E-206

HLA-DRA 8.07 0.45 6.1E-06 17660530 Multiple sclerosis rs3135388 9.0E-81

HLA-DRA 8.07 0.45 6.1E-06 20159113 Multiple sclerosis rs3135338 2.0E-25

HLA-DRA 8.07 0.45 6.1E-06 22086417 Nodular sclerosis Hodgkin lymphoma rs6903608 8.0E-18

HLA-DRA 8.07 0.45 6.1E-06 22451204 Parkinson's disease rs2395163 3.0E-11

HLA-DRA 8.07 0.45 6.1E-06 20711177 Parkinson's disease rs3129882 2.0E-10


HLA-DRA 8.07 0.45 6.1E-06 19458352 Primary biliary cirrhosis rs3135363 1.0E-10

HLA-DRA 8.07 0.45 6.1E-06 19458352 Primary biliary cirrhosis rs3135363 7.0E-10

HLA-DRA 8.07 0.45 6.1E-06 21653640 Rheumatoid arthritis rs9268853 5.0E-109

HLA-DRA 8.07 0.45 6.1E-06 21779181 Systemic sclerosis rs3129882 2.0E-27

HLA-DRA 8.07 0.45 6.1E-06 19430480 Type 1 diabetes rs9268645 1.0E-100

HLA-DRA 8.07 0.45 6.1E-06 21297633 Ulcerative colitis rs9268853 1.0E-55

HLA-DRA 8.07 0.45 6.1E-06 19915572 Ulcerative colitis rs9268877 4.0E-23

HLA-DRA 8.07 0.45 6.1E-06 18836448 Ulcerative colitis rs9268877 6.0E-18

HLA-F 10.36 0.53 6.0E-09 19525953 Multiple sclerosis rs2523393 1.0E-17

HLA-G 5.42 0.35 6.3E-04 21653640 Rheumatoid arthritis rs1610677 4.0E-15

HP 4.30 0.53 1.0E-07 20686565 Cholesterol, total rs2000999 3.0E-24

HP 4.30 0.53 1.0E-07 22403646 Haptoglobin levels rs2000999 8.0E-59

HP 4.30 0.53 1.0E-07 20686565 LDL cholesterol rs2000999 2.0E-22

HSPA6 8.53 0.44 5.5E-07 22081228 Kawasaki disease rs1801274 7.0E-11

HSPA6 8.53 0.44 5.5E-07 21297633 Ulcerative colitis rs1801274 2.0E-20

HSPA6 8.53 0.44 5.5E-07 19915573 Ulcerative colitis rs1801274 2.0E-12

ICAM3 10.45 0.52 3.8E-07 21102463 Crohn's disease rs12720356 1.0E-12

ICAM3 10.45 0.52 3.8E-07 20953190 Psoriasis rs12720356 4.0E-11

IFI30 11.28 0.43 1.5E-05 21833088 Multiple sclerosis rs874628 1.0E-08

IGF2BP2 5.29 0.37 2.0E-04 20881960 Height rs720390 2.0E-14

IGF2BP2 5.29 0.37 2.0E-04 17463248 Type 2 diabetes rs4402960 9.0E-16

IGF2BP2 5.29 0.37 2.0E-04 17463249 Type 2 diabetes rs4402960 9.0E-16

IGF2BP2 5.29 0.37 2.0E-04 18711366 Type 2 diabetes rs6769511 1.0E-09

IGF2BP2 5.29 0.37 2.0E-04 17463246 Type 2 diabetes rs4402960 2.0E-09

IGF2BP2 5.29 0.37 2.0E-04 20581827 Type 2 diabetes rs1470579 2.0E-09

IL18RAP 5.92 0.39 3.3E-05 20190752 Celiac disease rs917997 1.0E-15

IL18RAP 5.92 0.39 3.3E-05 21102463 Crohn's disease rs2058660 2.0E-12

IL1R2 6.75 0.51 7.3E-07 21297633 Ulcerative colitis rs2310173 3.0E-12

IL2RB 6.59 0.30 8.8E-04 20860503 Asthma rs2284033 1.0E-08


IL7R 7.81 0.38 1.4E-04 21297633 Ulcerative colitis rs3194051 4.0E-08

IRF1 7.15 0.33 9.5E-04 21300955 C-reactive protein rs4705952 1.0E-08

IRF1 7.15 0.33 9.5E-04 20031576 Fibrinogen rs2522056 1.0E-15

IRF1 7.15 0.33 9.5E-04 22139419 Platelet counts rs2070729 1.0E-10

IRF5 5.05 0.40 1.5E-05 20639880 Primary biliary cirrhosis rs10488631 3.0E-10

IRF5 5.05 0.40 1.5E-05 20453842 Rheumatoid arthritis rs10488631 4.0E-11

IRF5 5.05 0.40 1.5E-05 19838193 Systemic lupus erythematosus rs4728142 8.0E-19

IRF5 5.05 0.40 1.5E-05 21408207 Systemic lupus erythematosus rs10488631 7.0E-18

IRF5 5.05 0.40 1.5E-05 18204098 Systemic lupus erythematosus rs10488631 2.0E-11

IRF5 5.05 0.40 1.5E-05 20383147 Systemic sclerosis rs10488631 2.0E-13

IRF5 5.05 0.40 1.5E-05 21779181 Systemic sclerosis rs10488631 2.0E-10

IRF5 5.05 0.40 1.5E-05 21779181 Systemic sclerosis rs10488631 1.0E-09

IRF5 5.05 0.40 1.5E-05 21297633 Ulcerative colitis rs4728142 2.0E-08

IRF8 6.31 0.36 3.9E-04 21131588 Chronic lymphocytic leukemia rs391525 3.0E-09

ITGA6 7.39 0.40 8.9E-05 19767753 Prostate cancer rs12621278 9.0E-23

ITPK1 5.94 0.37 2.6E-04 22139419 Platelet counts rs8006385 1.0E-10

KCNJ2 6.00 0.37 3.3E-04 20195514 Primary tooth dev (number) rs8079702 1.0E-14

KCNJ2 6.00 0.37 3.3E-04 20195514 Primary tooth dev (time) rs8079702 4.0E-22

KIAA1109 2.94 0.38 2.7E-04 20190752 Celiac disease rs13151961 2.0E-27

KIAA1109 2.94 0.38 2.7E-04 19430480 Type 1 diabetes rs4505848 5.0E-13

KIAA1267 9.81 0.33 1.7E-04 22504418 Intracranial volume rs9303525 8.0E-15

KLF1 6.55 0.32 7.1E-04 19862010 Mean corpuscular hemoglobin rs11085824 1.0E-11

KREMEN1 5.16 0.35 1.6E-04 20935629 Waist-hip ratio rs4823006 3.0E-11

LDLR 5.68 0.35 5.6E-04 21943158 Cardiovascular risk factors rs6511720 5.0E-11

LDLR 5.68 0.35 5.6E-04 20686565 Cholesterol, total rs6511720 7.0E-97

LDLR 5.68 0.35 5.6E-04 19060911 Cholesterol, total rs2228671 9.0E-24

LDLR 5.68 0.35 5.6E-04 20686565 LDL cholesterol rs6511720 4.0E-117

LDLR 5.68 0.35 5.6E-04 18193044 LDL cholesterol rs6511720 2.0E-51

LDLR 5.68 0.35 5.6E-04 19060906 LDL cholesterol rs6511720 2.0E-26


LDLR 5.68 0.35 5.6E-04 18193043 LDL cholesterol rs6511720 4.0E-26

LDLR 5.68 0.35 5.6E-04 19060911 LDL cholesterol rs2228671 4.0E-14

LDLR 5.68 0.35 5.6E-04 22286219 Lipid metabolism phenotypes rs55791371 8.0E-17

LDLR 5.68 0.35 5.6E-04 22003152 Lipoprotein-associated phospholipase A2 activity/mass rs6511720 3.0E-11

LDLR 5.68 0.35 5.6E-04 21909109 Metabolite levels rs2738446 2.0E-12

LILRA3 5.28 0.74 1.3E-18 20686565 HDL cholesterol rs386000 4.0E-16

LIPA 4.92 0.35 3.3E-04 21378988 Coronary heart disease rs1412444 3.0E-13

LIPA 4.92 0.35 3.3E-04 21606135 Coronary heart disease rs1412444 4.0E-08

LPL 2.38 0.36 4.0E-04 20686565 HDL cholesterol rs12678919 1.0E-97

LPL 2.38 0.36 4.0E-04 19060906 HDL cholesterol rs12678919 2.0E-34

LPL 2.38 0.36 4.0E-04 20864672 HDL cholesterol rs325 8.0E-26

LPL 2.38 0.36 4.0E-04 18193044 HDL cholesterol rs328 9.0E-23

LPL 2.38 0.36 4.0E-04 20031538 HDL cholesterol rs17482753 3.0E-11

LPL 2.38 0.36 4.0E-04 21386085 HDL Cholesterol - Triglycerides rs13702 1.0E-16

LPL 2.38 0.36 4.0E-04 22399527 Metabolic syndrome rs268 2.0E-12

LPL 2.38 0.36 4.0E-04 21386085 Metabolic syndrome rs295 2.0E-09

LPL 2.38 0.36 4.0E-04 21386085 Metabolic syndrome rs301 3.0E-11

LPL 2.38 0.36 4.0E-04 21386085 Metabolic syndrome rs2197089 2.0E-09

LPL 2.38 0.36 4.0E-04 20686565 Triglycerides rs12678919 2.0E-115

LPL 2.38 0.36 4.0E-04 19060906 Triglycerides rs12678919 2.0E-41

LPL 2.38 0.36 4.0E-04 18193044 Triglycerides rs328 2.0E-28

LPL 2.38 0.36 4.0E-04 20864672 Triglycerides rs10105606 4.0E-26

LPL 2.38 0.36 4.0E-04 19060911 Triglycerides rs10096633 2.0E-18

LPL 2.38 0.36 4.0E-04 18193046 Triglycerides rs326 5.0E-12

LPL 2.38 0.36 4.0E-04 22171074 Triglycerides rs328 1.0E-09

LPL 2.38 0.36 4.0E-04 21386085 Triglycerides-Blood Pressure rs15285 1.0E-10

LRRK2 7.40 0.50 2.4E-07 18587394 Crohn's disease rs11175593 3.0E-10

LRRK2 7.40 0.50 2.4E-07 21738487 Parkinson's disease rs34637584 2.0E-28

LRRK2 7.40 0.50 2.4E-07 22438815 Parkinson's disease rs34778348 3.0E-21


LRRK2 7.40 0.50 2.4E-07 22438815 Parkinson's disease rs1491942 6.0E-15

LRRK2 7.40 0.50 2.4E-07 21292315 Parkinson's disease rs1491942 6.0E-14

LSP1 10.41 0.36 1.3E-04 17529967 Breast cancer rs3817198 3.0E-09

LSP1 10.41 0.36 1.3E-04 21297633 Ulcerative colitis rs907611 1.0E-10

LST1 10.42 0.34 7.3E-04 21946350 Pulmonary function rs2857595 2.0E-10

LST1 10.42 0.34 7.3E-04 19079260 Weight rs2844479 2.0E-08

LTBP1 5.30 0.42 1.1E-05 20881960 Height rs6714546 2.0E-09

MARCH2 5.67 0.46 6.0E-06 19060906 HDL cholesterol rs2967605 1.0E-08

NCOA4 8.35 0.47 3.0E-06 20139978 Hematological & biochem traits rs7085433 6.0E-10

NCOA4 8.35 0.47 3.0E-06 18264097 Prostate cancer rs10993994 9.0E-29

NCOA4 8.35 0.47 3.0E-06 18264096 Prostate cancer rs10993994 7.0E-13

NCOA4 8.35 0.47 3.0E-06 20676098 Prostate cancer rs10993994 3.0E-08

NCOA4 8.35 0.47 3.0E-06 21160077 Prostate-specific antigen levels rs10993994 7.0E-13

NINJ2 6.48 0.39 1.2E-04 19369658 Stroke rs12425791 1.0E-09

NKX3-1 3.88 0.63 6.3E-11 19767753 Prostate cancer rs1512268 3.0E-30

NKX3-1 3.88 0.63 6.3E-11 20676098 Prostate cancer rs1512268 4.0E-11

NOD2 4.49 0.42 4.3E-05 21102463 Crohn's disease rs2076756 4.0E-69

NOD2 4.49 0.42 4.3E-05 22412388 Crohn's disease rs2076756 1.0E-37

NOD2 4.49 0.42 4.3E-05 17684544 Crohn's disease rs2076756 1.0E-21

NOD2 4.49 0.42 4.3E-05 17804789 Crohn's disease rs5743289 6.0E-17

NOD2 4.49 0.42 4.3E-05 17435756 Crohn's disease rs2076756 7.0E-14

NOD2 4.49 0.42 4.3E-05 17554300 Crohn's disease rs17221417 4.0E-11

NOD2 4.49 0.42 4.3E-05 18758464 Inflammatory bowel disease rs5743289 4.0E-10

NOD2 4.49 0.42 4.3E-05 17068223 Inflammatory bowel disease rs2076756 5.0E-10

NOD2 4.49 0.42 4.3E-05 20018961 Leprosy rs9302752 4.0E-40

NOV 6.34 0.39 7.3E-05 21909110 Blood pressure rs2071518 4.0E-09

NPR3 2.49 0.40 1.2E-04 21572416 Blood pressure rs1173766 2.0E-08

OAS1 5.99 0.48 2.4E-06 21909109 Gamma glutamyl transpeptidase rs11066453 6.0E-44

OAS2 5.77 0.42 3.6E-05 21270382 Alcohol consumption rs2072134 6.0E-17


OAS3 8.27 0.52 2.1E-08 21270382 Alcohol consumption rs2072134 6.0E-17

OAS3 8.27 0.52 2.1E-08 21909109 Gamma glutamyl transpeptidase rs11066453 6.0E-44

OPTN 7.49 0.37 1.6E-04 21623375 Paget's disease rs1561570 4.0E-38

OPTN 7.49 0.37 1.6E-04 20436471 Paget's disease rs1561570 6.0E-13

OR2W3 6.44 0.40 1.7E-05 20139978 Hematological & biochem traits rs11204538 2.0E-08

PA2G4 10.06 0.34 5.8E-04 19430480 Type 1 diabetes rs2292239 2.0E-25

PA2G4 10.06 0.34 5.8E-04 17554260 Type 1 diabetes rs2292239 2.0E-20

PA2G4 10.06 0.34 5.8E-04 18978792 Type 1 diabetes rs2292239 3.0E-16

PA2G4 10.06 0.34 5.8E-04 21829393 Type 1 diabetes autoantibodies rs2292239 3.0E-27

PADI4 6.50 0.43 1.1E-05 21452313 Rheumatoid arthritis rs2240335 2.0E-08

PADI4 6.50 0.43 1.1E-05 21505073 Rheumatoid arthritis rs2240335 2.0E-08

PLA2G7 4.64 0.40 4.9E-05 22003152 Lipoprotein-associated phospholipase A2 activity/mass rs1805017 2.0E-23

PLA2G7 4.64 0.40 4.9E-05 20442857 Lipoprotein-associated phospholipase A2 activity/mass rs1805017 6.0E-14

PLA2G7 4.64 0.40 4.9E-05 22003152 Lipoprotein-associated phospholipase A2 activity/mass rs7756935 1.0E-10

PLBD1 6.74 0.38 5.9E-05 20543847 Testicular germ cell cancer rs2900333 6.0E-10

PLEK 7.95 0.58 1.0E-13 20190752 Celiac disease rs17035378 8.0E-09

PRDX5 9.21 0.56 1.7E-10 21102463 Crohn's disease rs694739 6.0E-10

PSMB9 6.49 0.41 3.2E-05 21399633 Nephropathy rs9357155 2.0E-12

PTPRC 8.37 0.42 4.6E-05 20139978 Hematological & biochem traits rs12127588 7.0E-10

PVRL2 3.09 0.49 2.0E-07 20460622 Alzheimer's disease rs2075650 1.0E-295

PVRL2 3.09 0.49 2.0E-07 19734902 Alzheimer's disease rs2075650 2.0E-157

PVRL2 3.09 0.49 2.0E-07 21627779 Alzheimer's disease rs157580 8.0E-89

PVRL2 3.09 0.49 2.0E-07 19125160 Alzheimer's disease rs157580 1.0E-40

PVRL2 3.09 0.49 2.0E-07 19734903 Alzheimer's disease rs2075650 2.0E-16

PVRL2 3.09 0.49 2.0E-07 18823527 Alzheimer's disease rs6859 6.0E-14

PVRL2 3.09 0.49 2.0E-07 20061627 Alzheimer's disease rs2075650 3.0E-11

PVRL2 3.09 0.49 2.0E-07 22005931 Alzheimer's disease (onset) rs6857 2.0E-10

PVRL2 3.09 0.49 2.0E-07 20885792 Alzheimer's disease (late onset) rs2075650 5.0E-36


PVRL2 3.09 0.49 2.0E-07 18439548 C-reactive protein rs769449 9.0E-21

PVRL2 3.09 0.49 2.0E-07 21943158 Cardiovascular risk factors rs2075650 2.0E-14

PVRL2 3.09 0.49 2.0E-07 21943158 Cardiovascular risk factors rs2075650 4.0E-08

PVRL2 3.09 0.49 2.0E-07 19060911 Cholesterol, total rs2075650 3.0E-19

PVRL2 3.09 0.49 2.0E-07 21909109 HDL cholesterol rs519113 8.0E-11

PVRL2 3.09 0.49 2.0E-07 19060911 LDL cholesterol rs157580 2.0E-19

PVRL2 3.09 0.49 2.0E-07 22286219 Lipid metabolism phenotypes rs7412 3.0E-58

PVRL2 3.09 0.49 2.0E-07 22399527 Metabolic syndrome rs157582 1.0E-08

PVRL2 3.09 0.49 2.0E-07 22331829 Response to statin therapy rs7412 2.0E-47

RAB37 7.47 0.35 1.2E-04 20031577 Fibrinogen rs10512597 8.0E-11

RHCE 1.88 0.37 3.1E-04 21700265 Erythrocyte sedimentation rate rs3091242 2.0E-13

RHD 4.15 0.58 2.8E-10 21700265 Erythrocyte sedimentation rate rs3091242 2.0E-13

RNASET2 8.38 0.65 5.2E-16 21841780 Graves' disease rs9355610 7.0E-10

RNASET2 8.38 0.65 5.2E-16 20526339 Vitiligo rs2236313 1.0E-16

SELP 5.45 0.44 3.0E-06 20167578 Soluble adhesion molecules rs6136 4.0E-61

SELP 5.45 0.44 3.0E-06 20167578 Soluble adhesion molecules rs2235302 4.0E-16

SIDT2 6.15 0.53 4.4E-08 21943158 Cardiovascular risk factors rs508487 2.0E-10

SIDT2 6.15 0.53 4.4E-08 18464913 Protein quantitative trait loci rs7112513 6.0E-09

SIRPB1 4.18 0.81 5.1E-27 19430480 Type 1 diabetes rs2281808 1.0E-11

SKAP2 8.00 0.51 1.2E-08 19430480 Type 1 diabetes rs7804356 5.0E-09

SLC14A1 6.15 0.37 5.7E-05 21750109 Bladder cancer rs17674580 8.0E-11

SLC14A1 6.15 0.37 5.7E-05 21824976 Bladder cancer rs7238033 9.0E-09

SNCA 9.61 0.32 6.3E-04 22438815 Parkinson's disease rs356219 6.0E-65

SNCA 9.61 0.32 6.3E-04 21292315 Parkinson's disease rs356219 2.0E-47

SNCA 9.61 0.32 6.3E-04 22451204 Parkinson's disease rs356220 8.0E-35

SNCA 9.61 0.32 6.3E-04 21738487 Parkinson's disease rs356220 2.0E-19

SNCA 9.61 0.32 6.3E-04 19915576 Parkinson's disease rs11931074 7.0E-17

SNCA 9.61 0.32 6.3E-04 19915575 Parkinson's disease rs2736990 2.0E-16

SNCA 9.61 0.32 6.3E-04 21044948 Parkinson's disease rs356220 9.0E-16


SNCA 9.61 0.32 6.3E-04 20711177 Parkinson's disease rs356220 3.0E-11

SNCA 9.61 0.32 6.3E-04 21084426 Parkinson's disease rs356220 3.0E-08

SPINT2 7.51 0.47 9.0E-06 19767754 Prostate cancer rs8102476 2.0E-11

STAT6 8.47 0.87 1.8E-28 22075330 IgE levels rs1059513 2.0E-12

SYCP2L 2.30 0.36 3.1E-04 22267201 Menopause (age at onset) rs2153157 8.0E-12

SYCP2L 2.30 0.36 3.1E-04 21829377 Phospholipid levels (plasma) rs3734398 1.0E-43

SYCP2L 2.30 0.36 3.1E-04 21829377 Phospholipid levels (plasma) rs4713103 3.0E-36

SYCP2L 2.30 0.36 3.1E-04 21829377 Phospholipid levels (plasma) rs4713103 8.0E-14

SYCP2L 2.30 0.36 3.1E-04 22359512 Phospholipid levels (plasma) rs17606561 1.0E-11

SYCP2L 2.30 0.36 3.1E-04 21829377 Phospholipid levels (plasma) rs6918936 3.0E-08

TAP2 9.03 0.73 7.4E-26 21399633 Nephropathy rs9357155 2.0E-12

TKT 8.36 0.42 2.2E-05 21076409 Ventricular conduction rs4687718 6.0E-09

TMEM50A 8.27 0.41 3.1E-05 21700265 Erythrocyte sedimentation rate rs3091242 2.0E-13

TNFRSF13B 6.56 0.37 1.5E-04 20139978 Hematological & biochem traits rs4273077 3.0E-10

TNFRSF13B 6.56 0.37 1.5E-04 22558069 Non-albumin protein levels rs4985726 7.0E-24

TNFRSF13B 6.56 0.37 1.5E-04 22558069 Non-albumin protein levels rs4985726 1.0E-14

TPM1 6.92 0.38 8.3E-05 19820697 Mean platelet volume rs11071720 2.0E-08

TRIM58 7.58 0.43 1.6E-05 20139978 Hematological & biochem traits rs11204538 2.0E-08

TUBB1 7.32 0.60 5.6E-12 22423221 Mean platelet volume rs151361 9.0E-09

UST 2.44 0.38 2.9E-04 20360315 Response to antidepressants rs2500535 4.0E-08

VSIG2 5.23 0.39 7.3E-05 19571808 Schizophrenia rs12807809 2.0E-09

WDR1 7.37 0.35 2.7E-04 21943158 Cardiovascular risk factors rs7671266 9.0E-71

WLS 6.67 0.72 3.3E-17 19801982 Bone mineral density (hip) rs2566755 2.0E-12

WLS 6.67 0.72 3.3E-17 19801982 Bone mineral density (spine) rs1430742 3.0E-13

The first four columns indicate genes with significant heritability (standard gene name, mean expression level, a² estimate of heritability, and the heritability P value). The remaining four columns describe a GWAS association from the NHGRI catalog (ref. 3): the PubMed ID of the publication, the disease/trait studied, the SNP ID, and the GWAS association P value. Some disease-SNP associations appear more than once, usually because they were reported in several papers.
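Because the trait names contain spaces, a row of this table is easiest to read by position: five fixed leading fields, a free-text trait, then the SNP ID and the GWAS P value. The minimal Python sketch below assumes exactly that whitespace-separated layout and that rows wrapped across lines in extraction have been rejoined first. The names `parse_row` and `summarize` are hypothetical helpers for illustration, not part of the authors' pipeline. `summarize` collapses repeated gene/trait/SNP entries into one record holding the number of distinct PubMed IDs and the smallest reported GWAS P value.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class GwasRow(NamedTuple):
    gene: str
    mean_expr: float   # mean expression level
    a2: float          # heritability estimate (a²)
    p_a2: float        # heritability P value
    pmid: str          # PubMed ID of the GWAS publication
    trait: str         # disease/trait; may contain spaces
    snp: str           # SNP ID (rs number)
    p_gwas: float      # GWAS association P value


def parse_row(line: str) -> GwasRow:
    """Split one row: 5 leading fields, a free-text trait, then SNP and P."""
    tokens = line.split()
    if len(tokens) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(tokens)}: {line!r}")
    gene, mean_expr, a2, p_a2, pmid = tokens[:5]
    snp, p_gwas = tokens[-2:]
    trait = " ".join(tokens[5:-2])  # everything between PMID and SNP
    return GwasRow(gene, float(mean_expr), float(a2), float(p_a2),
                   pmid, trait, snp, float(p_gwas))


def summarize(rows: List[GwasRow]) -> Dict[Tuple[str, str, str], Tuple[int, float]]:
    """Collapse repeated (gene, trait, SNP) entries.

    Maps each key to (number of distinct PubMed IDs, smallest GWAS P value).
    """
    pmids = defaultdict(set)
    best_p: Dict[Tuple[str, str, str], float] = {}
    for r in rows:
        key = (r.gene, r.trait, r.snp)
        pmids[key].add(r.pmid)
        best_p[key] = min(best_p.get(key, r.p_gwas), r.p_gwas)
    return {key: (len(pmids[key]), best_p[key]) for key in pmids}


# Example: four HCP5 rows copied from the table above.
rows = [parse_row(s) for s in (
    "HCP5 8.36 0.54 4.4E-08 20041166 HIV-1 control rs2395029 5.0E-35",
    "HCP5 8.36 0.54 4.4E-08 21051598 HIV-1 control rs2395029 1.0E-25",
    "HCP5 8.36 0.54 4.4E-08 20041166 HIV-1 control rs2395029 1.0E-11",
    "HCP5 8.36 0.54 4.4E-08 22286212 Hodgkin's lymphoma rs2248462 7.0E-16",
)]
summary = summarize(rows)
for (gene, trait, snp), (n_papers, p_min) in sorted(summary.items()):
    print(f"{gene}\t{trait}\t{snp}\t{n_papers} paper(s)\tmin P = {p_min:.1E}")
```

Rows that share a PubMed ID (here the two rs2395029 entries from 20041166) are counted once, so the count reflects distinct papers rather than table rows.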


Supplementary Table 5: Genes with significant heritability also found in OMIM.

Gene Mean Expr a² P.a² OMIM disease

A2M 5.11 0.50 3.6E-07 Alpha-2-macroglobulin deficiency

A2M 5.11 0.50 3.6E-07 Alzheimer disease, susceptibility to

ACTG1 11.30 0.41 4.0E-05 Baraitser-Winter syndrome 2

ACTG1 11.30 0.41 4.0E-05 Deafness, autosomal dominant 20/26

ADAM17 7.15 0.52 6.6E-08 Inflammatory skin and bowel disease, neonatal

ADAMTSL4 7.41 0.39 1.8E-04 Ectopia lentis, isolated, autosomal recessive

ADCK3 7.25 0.46 3.0E-06 Coenzyme Q10 deficiency, primary, 4

ADRB3 2.55 0.36 4.7E-04 Obesity, susceptibility to

AGA 4.32 0.34 7.9E-04 Aspartylglucosaminuria

AHI1 4.41 0.36 1.5E-04 Joubert syndrome-3

ALAS2 10.66 0.54 3.5E-09 Anemia, sideroblastic, X-linked

ALAS2 10.66 0.54 3.5E-09 Protoporphyria, erythropoietic, X-linked

ALOX5 5.57 0.39 2.0E-04 Asthma, diminished response to antileukotriene treatment in

ALOX5 5.57 0.39 2.0E-04 Atherosclerosis, susceptibility to

ALOX5AP 9.09 0.48 3.7E-07 Stroke, susceptibility to

ALPL 7.04 0.41 1.5E-05 Hypophosphatasia, adult

ALPL 7.04 0.41 1.5E-05 Hypophosphatasia, childhood

ALPL 7.04 0.41 1.5E-05 Hypophosphatasia, infantile

ALPL 7.04 0.41 1.5E-05 Odontohypophosphatasia

ANK1 5.78 0.36 2.3E-04 Chondrocalcinosis 2

ANK1 5.78 0.36 2.3E-04 Craniometaphyseal dysplasia

ANK1 5.78 0.36 2.3E-04 Spherocytosis, type 1

ANXA5 8.70 0.53 4.4E-09 Pregnancy loss, recurrent, susceptibility to, 3

APP 6.82 0.35 4.0E-04 Alzheimer disease 1, familial

APP 6.82 0.35 4.0E-04 Cerebral amyloid angiopathy, Dutch, Italian, Iowa, Flemish, Arctic variants

AQP3 9.55 0.32 6.3E-04 Blood group GIL

ASAH1 7.58 0.38 1.2E-04 Farber lipogranulomatosis


ASAH1 7.58 0.38 1.2E-04 Spinal muscular atrophy with progressive myoclonic epilepsy

ATP6V1A 5.33 0.49 4.8E-07 Renal tubular acidosis, distal, autosomal recessive

BCL6 8.16 0.39 8.0E-05 Lymphoma, B-cell

BCOR 3.75 0.36 4.7E-04 Microphthalmia, syndromic 2

BPGM 5.62 0.34 4.2E-04 Erythrocytosis due to bisphosphoglycerate mutase deficiency

BSG 7.73 0.32 3.5E-04 Blood group, OK

C16orf57 7.18 0.40 1.0E-05 Poikiloderma with neutropenia

CAT 7.12 0.38 7.6E-05 Acatalasemia

CCR5 4.31 0.43 1.8E-05 Diabetes mellitus, insulin-dependent, 22

CCR5 4.31 0.43 1.8E-05 Hepatitis C virus, resistance to

CCR5 4.31 0.43 1.8E-05 HIV infection, susceptibility/resistance to

CCR5 4.31 0.43 1.8E-05 West Nile virus, susceptibility to

CD151 5.22 0.44 9.3E-07 Blood group, Raph

CD151 5.22 0.44 9.3E-07 Nephropathy with pretibial epidermolysis bullosa and deafness

CD36 6.46 0.35 2.4E-04 Bleeding disorder, platelet-type, 11

CD36 6.46 0.35 2.4E-04 Coronary heart disease, susceptibility to, 7

CD36 6.46 0.35 2.4E-04 Macrothrombocytopenia

CD36 6.46 0.35 2.4E-04 Malaria, cerebral, reduced risk of

CD36 6.46 0.35 2.4E-04 Malaria, cerebral, susceptibility to

CD36 6.46 0.35 2.4E-04 Platelet glycoprotein IV deficiency

CD55 8.83 0.56 3.5E-10 Blood group Cromer

CD79A 5.49 0.33 2.7E-04 Agammaglobulinemia 3

CD8A 7.05 0.34 1.6E-04 CD8 deficiency, familial

CDK5RAP2 6.60 0.40 5.1E-05 Microcephaly, primary autosomal recessive, 3

CFD 3.63 0.55 3.7E-12 Complement factor D deficiency

CFD 3.63 0.55 3.7E-12 Properdin deficiency, X-linked

CHI3L1 7.11 0.72 1.5E-23 Asthma-related traits, susceptibility to, 7

CHI3L1 7.11 0.72 1.5E-23 Schizophrenia, susceptibility to

CLC 6.82 0.51 8.7E-08 Cold-induced sweating syndrome 1


CLEC7A 6.85 0.51 1.5E-07 Aspergillosis, susceptibility to

CLEC7A 6.85 0.51 1.5E-07 Candidiasis, familial, 4, autosomal dominant

CNTNAP2 3.81 0.38 1.2E-04 Autism susceptibility 15

CNTNAP2 3.81 0.38 1.2E-04 Cortical dysplasia-focal epilepsy syndrome

CNTNAP2 3.81 0.38 1.2E-04 Pitt-Hopkins like syndrome 1

COL18A1 5.12 0.41 3.5E-05 Knobloch syndrome, type 1

COL9A3 3.91 0.43 1.7E-06 Epiphyseal dysplasia, multiple, 3

COL9A3 3.91 0.43 1.7E-06 Epiphyseal dysplasia, multiple, with myopathy

COL9A3 3.91 0.43 1.7E-06 Intervertebral disc disease, susceptibility to

CRTAP 6.20 0.43 1.9E-05 Osteogenesis imperfecta, type VII

CSF1R 6.90 0.37 1.4E-04 Leukoencephalopathy, diffuse hereditary, with spheroids

CSF2RB 8.95 0.35 5.4E-04 Surfactant metabolism dysfunction, pulmonary, 5

CSTB 8.11 0.39 1.0E-04 Epilepsy, progressive myoclonic 1A (Unverricht and Lundborg)

CTDSP2 7.20 0.35 6.9E-04 Osteoarthritis susceptibility 4

CTDSPL 4.08 0.38 1.8E-04 Pregnancy loss, susceptibility to

CTDSPL 4.08 0.38 1.8E-04 Spermatogenic failure 4

CXCR4 10.59 0.43 2.3E-05 Myelokathexis, isolated

CXCR4 10.59 0.43 2.3E-05 WHIM syndrome

CYBA 5.69 0.39 1.8E-04 Chronic granulomatous disease, autosomal, due to deficiency of CYBA

CYBB 8.58 0.48 8.9E-07 Atypical mycobacteriosis, familial, X-linked 2

CYBB 8.58 0.48 8.9E-07 Chronic granulomatous disease, X-linked

CYP1B1 4.99 0.40 5.5E-05 Glaucoma 3A, primary congenital

CYP1B1 4.99 0.40 5.5E-05 Glaucoma, early-onset, digenic

CYP1B1 4.99 0.40 5.5E-05 Glaucoma, primary open angle, adult-onset

CYP1B1 4.99 0.40 5.5E-05 Glaucoma, primary open angle, juvenile-onset

CYP1B1 4.99 0.40 5.5E-05 Peters anomaly

CYP27A1 6.53 0.75 4.8E-17 Cerebrotendinous xanthomatosis

DISC1 5.19 0.40 9.4E-05 Schizoaffective disorder, susceptibility to

DISC1 5.19 0.40 9.4E-05 Schizophrenia, susceptibility to


DSC2 3.47 0.51 5.0E-08 Arrhythmogenic right ventricular dysplasia 11

DSC2 3.47 0.51 5.0E-08 Arrhythmogenic right ventricular dysplasia 11 with mild palmoplantar keratoderma and woolly hair

DSP 2.34 0.44 9.5E-06 Arrhythmogenic right ventricular dysplasia 8

DSP 2.34 0.44 9.5E-06 Dilated cardiomyopathy with woolly hair and keratoderma

DSP 2.34 0.44 9.5E-06 Epidermolysis bullosa, lethal acantholytic

DSP 2.34 0.44 9.5E-06 Keratosis palmoplantaris striata II

DSP 2.34 0.44 9.5E-06 Skin fragility-woolly hair syndrome

DYSF 7.40 0.33 7.0E-04 Miyoshi muscular dystrophy 1

DYSF 7.40 0.33 7.0E-04 Muscular dystrophy, limb-girdle, type 2B

DYSF 7.40 0.33 7.0E-04 Myopathy, distal, with anterior tibial onset

EPB41 8.28 0.45 9.2E-06 Elliptocytosis-1

F13A1 6.07 0.56 1.8E-10 Factor XIIIA deficiency

F13A1 6.07 0.56 1.8E-10 Myocardial infarction, protection against

F13A1 6.07 0.56 1.8E-10 Venous thrombosis, protection against

F5 5.26 0.45 6.7E-06 Budd-Chiari syndrome

F5 5.26 0.45 6.7E-06 Factor V deficiency

F5 5.26 0.45 6.7E-06 Pregnancy loss, recurrent, susceptibility to, 1

F5 5.26 0.45 6.7E-06 Stroke, ischemic, susceptibility to

F5 5.26 0.45 6.7E-06 Thrombophilia due to activated protein C resistance

F5 5.26 0.45 6.7E-06 Thrombophilia, susceptibility to, due to factor V Leiden

FADD 7.54 0.57 1.2E-08 Infections, recurrent, with encephalopathy, hepatic dysfunction, and cardiovascular malformations

FBXO7 8.72 0.35 2.5E-04 Parkinson disease 15, autosomal recessive

FCGR2A 10.32 0.33 5.1E-04 Lupus nephritis, susceptibility to

FCGR2B 2.83 0.35 3.9E-04 Malaria, resistance to

FCGR2B 2.83 0.35 3.9E-04 Systemic lupus erythematosus, susceptibility to

FCGR3A 7.52 0.51 3.0E-08 Neutropenia, alloimmune neonatal

FCGR3A 7.52 0.51 3.0E-08 Viral infections, recurrent

FECH 5.03 0.33 9.2E-04 Protoporphyria, erythropoietic, autosomal recessive


FHL1 4.46 0.40 1.1E-04 Emery-Dreifuss muscular dystrophy 6, X-linked

FHL1 4.46 0.40 1.1E-04 Hemophagocytic lymphohistiocytosis, familial, 1

FHL1 4.46 0.40 1.1E-04 Myopathy, reducing body, X-linked, childhood-onset

FHL1 4.46 0.40 1.1E-04 Myopathy, reducing body, X-linked, severe early-onset

FHL1 4.46 0.40 1.1E-04 Myopathy, X-linked, with postural muscle atrophy

FHL1 4.46 0.40 1.1E-04 Scapuloperoneal myopathy, X-linked dominant

FHL3 4.49 0.43 3.2E-05 Hemophagocytic lymphohistiocytosis, familial, 3

FLCN 5.44 0.46 7.9E-07 Birt-Hogg-Dube syndrome

FLCN 5.44 0.46 7.9E-07 Colorectal cancer, somatic

FLCN 5.44 0.46 7.9E-07 Hip dysplasia, Beukes type

FLCN 5.44 0.46 7.9E-07 Pneumothorax, primary spontaneous

FLCN 5.44 0.46 7.9E-07 Renal carcinoma, chromophobe, somatic

GATA2 7.02 0.58 1.5E-10 Dendritic cell, monocyte, B lymphocyte, and natural killer lymphocyte deficiency

GATA2 7.02 0.58 1.5E-10 Emberger syndrome

GATA2 7.02 0.58 1.5E-10 Leukemia, acute myeloid, susceptibility to

GATA2 7.02 0.58 1.5E-10 Myelodysplastic syndrome, susceptibility to

GLUL 9.22 0.34 7.5E-04 Glutamine deficiency, congenital

GM2A 5.75 0.56 2.1E-08 GM2-gangliosidosis, AB variant

GNAS 10.77 0.35 4.5E-04 Acromegaly

GNAS 10.77 0.35 4.5E-04 ACTH-independent macronodular adrenal hyperplasia

GNAS 10.77 0.35 4.5E-04 McCune-Albright syndrome

GNAS 10.77 0.35 4.5E-04 Osseous heteroplasia, progressive

GNAS 10.77 0.35 4.5E-04 Prolonged bleeding time, brachydactyly and mental retardation

GNAS 10.77 0.35 4.5E-04 Prolonged bleeding time, brachydactyly, and mental retardation

GNAS 10.77 0.35 4.5E-04 Pseudohypoparathyroidism Ia

GNAS 10.77 0.35 4.5E-04 Pseudohypoparathyroidism Ib

GNAS 10.77 0.35 4.5E-04 Pseudohypoparathyroidism Ic

GNAS 10.77 0.35 4.5E-04 Pseudopseudohypoparathyroidism

GNPTAB 7.39 0.34 9.1E-04 Mucolipidosis II alpha/beta


GNPTAB 7.39 0.34 9.1E-04 Mucolipidosis III alpha/beta

GNS 7.73 0.44 4.7E-06 Mucopolysaccharidosis type IIID

GP1BA 6.06 0.46 1.2E-06 Bernard-Soulier syndrome, type A1 (recessive)

GP1BA 6.06 0.46 1.2E-06 Bernard-Soulier syndrome, type A2 (dominant)

GP1BA 6.06 0.46 1.2E-06 Nonarteritic anterior ischemic optic neuropathy, susceptibility to

GP1BA 6.06 0.46 1.2E-06 von Willebrand disease, platelet-type

GP1BB 6.39 0.42 7.9E-06 Bernard-Soulier syndrome, type B

GP1BB 6.39 0.42 7.9E-06 Giant platelet disorder, isolated

GP6 2.59 0.37 2.2E-04 Bleeding disorder, platelet-type, 11

GSN 6.11 0.40 6.9E-05 Amyloidosis, Finnish type

GYG1 6.17 0.40 9.2E-05 Glycogen storage disease XV

GYPA 5.29 0.34 8.4E-04 Blood group, MN

GYPA 5.29 0.34 8.4E-04 Blood group, Ss

GYPA 5.29 0.34 8.4E-04 Cardiac valvular dysplasia, X-linked

GYPA 5.29 0.34 8.4E-04 FG syndrome 2

GYPA 5.29 0.34 8.4E-04 Frontometaphyseal dysplasia

GYPA 5.29 0.34 8.4E-04 Heterotopia, periventricular

GYPA 5.29 0.34 8.4E-04 Heterotopia, periventricular, ED variant

GYPA 5.29 0.34 8.4E-04 Intestinal pseudoobstruction, neuronal

GYPA 5.29 0.34 8.4E-04 Malaria, resistance to

GYPA 5.29 0.34 8.4E-04 Melnick-Needles syndrome

GYPA 5.29 0.34 8.4E-04 Otopalatodigital syndrome, type I

GYPA 5.29 0.34 8.4E-04 Otopalatodigital syndrome, type II

GYPA 5.29 0.34 8.4E-04 Terminal osseous dysplasia

GYPB 5.13 0.56 1.3E-09 Blood group, Ss

HAGH 8.83 0.32 6.8E-04 Glyoxalase II deficiency

HBD 7.45 0.34 1.7E-04 Thalassemia due to Hb Lepore

HBD 7.45 0.34 1.7E-04 Thalassemia, delta-

HBG1 8.31 0.84 8.0E-39 Fetal hemoglobin quantitative trait locus 1

HCCS 4.70 0.36 4.5E-04 Microphthalmia, syndromic 7

HGD 4.29 0.49 3.5E-07 Alkaptonuria

HIP1 6.26 0.54 4.9E-12 Prostate cancer, progression of

HLA-A 9.53 0.83 3.8E-34 Hypersensitivity syndrome, carbamazepine-induced, susceptibility to

HLA-B 9.96 0.62 4.0E-14 Abacavir hypersensitivity, susceptibility to

HLA-B 9.96 0.62 4.0E-14 Drug-induced liver injury due to flucloxacillin

HLA-B 9.96 0.62 4.0E-14 Spondyloarthropathy, susceptibility to, 1

HLA-B 9.96 0.62 4.0E-14 Stevens-Johnson syndrome, susceptibility to

HLA-B 9.96 0.62 4.0E-14 Synovitis, chronic, susceptibility to

HLA-B 9.96 0.62 4.0E-14 Toxic epidermal necrolysis, susceptibility to

HLA-DPB1 9.14 0.49 1.3E-06 Beryllium disease, chronic, susceptibility to

HLA-DQA1 7.01 0.69 1.6E-23 Celiac disease, susceptibility to

HLA-DQB1 7.35 0.34 4.8E-05 Celiac disease, susceptibility to

HLA-DQB1 7.35 0.34 4.8E-05 Creutzfeldt-Jakob disease, variant, resistance to

HLA-DQB1 7.35 0.34 4.8E-05 Multiple sclerosis, susceptibility to

HLA-G 5.42 0.35 6.3E-04 Asthma, susceptibility to

HP 4.30 0.53 1.0E-07 Anhaptoglobinemia

HP 4.30 0.53 1.0E-07 Hypohaptoglobinemia

IDS 5.63 0.39 2.0E-04 Mucopolysaccharidosis II

IFITM3 8.25 0.70 6.9E-17 Influenza, severe, susceptibility to

IGF2BP2 5.29 0.37 2.0E-04 Diabetes mellitus, noninsulin-dependent, susceptibility to

IL17RA 8.27 0.40 6.1E-05 Candidiasis, familial, 5, autosomal recessive

IL4R 7.20 0.32 7.8E-04 AIDS, slow progression to

IL4R 7.20 0.32 7.8E-04 Atopy, susceptibility to

IL7R 7.81 0.38 1.4E-04 Severe combined immunodeficiency, T-cell negative, B-cell/natural killer cell-positive type

IRAK3 6.38 0.40 7.7E-05 Asthma susceptibility 5

IRF1 7.15 0.33 9.5E-04 Gastric cancer, somatic

IRF1 7.15 0.33 9.5E-04 Myelodysplastic syndrome, preleukemic

IRF1 7.15 0.33 9.5E-04 Myelogenous leukemia, acute

IRF1 7.15 0.33 9.5E-04 Nonsmall cell lung cancer, somatic

IRF5 5.05 0.40 1.5E-05 Inflammatory bowel disease 14

IRF5 5.05 0.40 1.5E-05 Systemic lupus erythematosus, susceptibility to, 10

ISCU 8.14 0.35 7.5E-04 Myopathy with lactic acidosis, hereditary

ITGA2B 4.80 0.57 1.7E-12 Glanzmann thrombasthenia

ITGA2B 4.80 0.57 1.7E-12 Thrombocytopenia, neonatal alloimmune, BAK antigen related

ITGA6 7.39 0.40 8.9E-05 Epidermolysis bullosa, junctional, with pyloric stenosis

ITGB2 9.40 0.33 3.3E-04 Leukocyte adhesion deficiency

ITGB3 5.51 0.64 9.2E-14 Glanzmann thrombasthenia

ITGB3 5.51 0.64 9.2E-14 Myocardial infarction, susceptibility to

ITGB3 5.51 0.64 9.2E-14 Purpura, posttransfusion

ITGB3 5.51 0.64 9.2E-14 Thrombocytopenia, neonatal alloimmune

ITM2B 10.49 0.39 1.0E-04 Dementia, familial British

ITM2B 10.49 0.39 1.0E-04 Dementia, familial Danish

ITPR1 7.38 0.49 2.3E-06 Spinocerebellar ataxia 15

JAM3 5.50 0.31 8.0E-04 Hemorrhagic destruction of the brain, subependymal calcification, and cataracts

KCNJ2 6.00 0.37 3.3E-04 Atrial fibrillation, familial, 9

KCNJ2 6.00 0.37 3.3E-04 Long QT syndrome-7

KCNJ2 6.00 0.37 3.3E-04 Short QT syndrome-3

KIAA1267 9.81 0.33 1.7E-04 Mental retardation, autosomal dominant 17

KLF1 6.55 0.32 7.1E-04 Anemia, dyserythropoietic congenital, type IV

KLF1 6.55 0.32 7.1E-04 Blood group--Lutheran inhibitor

KLF1 6.55 0.32 7.1E-04 Hereditary persistence of fetal hemoglobin

KLF11 5.64 0.35 3.8E-04 Maturity-onset diabetes of the young, type VII

KRT1 6.87 0.74 1.0E-23 Epidermolytic hyperkeratosis

KRT1 6.87 0.74 1.0E-23 Ichthyosis histrix, Curth-Macklin type

KRT1 6.87 0.74 1.0E-23 Ichthyosis, cyclic, with epidermolytic hyperkeratosis

KRT1 6.87 0.74 1.0E-23 Keratosis palmoplantaris striata III

KRT1 6.87 0.74 1.0E-23 Palmoplantar keratoderma, epidermolytic

KRT1 6.87 0.74 1.0E-23 Palmoplantar keratoderma, nonepidermolytic

LDLR 5.68 0.35 5.6E-04 Hypercholesterolemia, familial

LDLR 5.68 0.35 5.6E-04 LDL cholesterol level QTL2

LEF1 8.18 0.42 6.3E-06 Sebaceous tumors, somatic

LGALS2 7.03 0.85 3.0E-30 Myocardial infarction, susceptibility to

LIPA 4.92 0.35 3.3E-04 Cholesteryl ester storage disease

LIPA 4.92 0.35 3.3E-04 Wolman disease

LIPN 4.26 0.33 4.7E-04 Ichthyosis, lamellar, 4

LMNA 6.86 0.37 1.3E-04 Cardiomyopathy, dilated, 1A

LMNA 6.86 0.37 1.3E-04 Charcot-Marie-Tooth disease, type 2B1

LMNA 6.86 0.37 1.3E-04 Emery-Dreifuss muscular dystrophy 2, AD

LMNA 6.86 0.37 1.3E-04 Emery-Dreifuss muscular dystrophy 3, AR

LMNA 6.86 0.37 1.3E-04 Heart-hand syndrome, Slovenian type

LMNA 6.86 0.37 1.3E-04 Hutchinson-Gilford progeria

LMNA 6.86 0.37 1.3E-04 Lipodystrophy, familial partial, 2

LMNA 6.86 0.37 1.3E-04 Malouf syndrome

LMNA 6.86 0.37 1.3E-04 Mandibuloacral dysplasia

LMNA 6.86 0.37 1.3E-04 Muscular dystrophy, congenital

LMNA 6.86 0.37 1.3E-04 Muscular dystrophy, limb-girdle, type 1B

LMNA 6.86 0.37 1.3E-04 Restrictive dermopathy, lethal

LPIN2 7.25 0.32 8.1E-04 Majeed syndrome

LPL 2.38 0.36 4.0E-04 Combined hyperlipidemia, familial

LPL 2.38 0.36 4.0E-04 High density lipoprotein cholesterol level QTL 11

LPL 2.38 0.36 4.0E-04 Lipoprotein lipase deficiency

LRRK2 7.40 0.50 2.4E-07 Parkinson disease 8

LST1 10.42 0.34 7.3E-04 Hyperbilirubinemia, Rotor type, digenic

LYZ 11.71 0.84 7.5E-24 Amyloidosis, renal

MAGT1 2.63 0.37 1.6E-04 Immunodeficiency, X-linked, with magnesium defect, Epstein-Barr virus infection and neoplasia

MAGT1 2.63 0.37 1.6E-04 Mental retardation, X-linked 95

MAK 6.16 0.37 1.3E-04 Retinitis pigmentosa 62

MAL 6.44 0.42 5.7E-05 Megakaryoblastic leukemia, acute

MARCKS 10.13 0.49 1.9E-07 Macrocephaly, alopecia, cutis laxa, and scoliosis

MAX 10.40 0.52 1.6E-07 Pheochromocytoma, susceptibility to

MDM2 8.97 0.39 4.2E-05 Accelerated tumor formation, susceptibility to

MFN2 6.71 0.39 2.6E-05 Charcot-Marie-Tooth disease, type 2A2

MFN2 6.71 0.39 2.6E-05 Hereditary motor and sensory neuropathy VI

MMD2 2.76 0.39 1.9E-04 Miyoshi muscular dystrophy 2

MME 6.92 0.51 3.6E-07 Membranous glomerulonephritis, antenatal

MME 6.92 0.51 3.6E-07 Neutral endopeptidase deficiency

MMP9 7.04 0.35 2.2E-04 Metaphyseal anadysplasia 2

MS4A1 5.36 0.47 1.5E-07 Immunodeficiency, common variable, 5

MYH11 5.91 0.34 4.1E-04 Aortic aneurysm, familial thoracic 4

NAT8L 3.29 0.39 1.7E-04 N-acetylaspartate deficiency

NCOA4 8.35 0.47 3.0E-06 Thyroid carcinoma, papillary

NDUFS2 7.54 0.35 5.9E-04 Mitochondrial complex I deficiency

NFKBIA 8.27 0.64 5.3E-12 Ectodermal dysplasia, anhidrotic, with T-cell immunodeficiency

NHLRC1 2.47 0.41 1.1E-04 Epilepsy, progressive myoclonic 2B (Lafora)

NOD2 4.49 0.42 4.3E-05 Blau syndrome

NOD2 4.49 0.42 4.3E-05 Inflammatory bowel disease 1

NOD2 4.49 0.42 4.3E-05 Psoriatic arthritis, susceptibility to

NOD2 4.49 0.42 4.3E-05 Sarcoidosis, early-onset

OAS1 5.99 0.48 2.4E-06 Diabetes mellitus, type 1, susceptibility to

OAS1 5.99 0.48 2.4E-06 Viral infection, susceptibility to

OPTN 7.49 0.37 1.6E-04 Amyotrophic lateral sclerosis 12

OPTN 7.49 0.37 1.6E-04 Glaucoma 1, open angle, E

OPTN 7.49 0.37 1.6E-04 Glaucoma, normal tension, susceptibility to

PADI4 6.50 0.43 1.1E-05 Rheumatoid arthritis, susceptibility to

PAPPA 3.25 0.41 7.1E-05 Greig cephalopolysyndactyly syndrome

PAPPA 3.25 0.41 7.1E-05 Hypothalamic hamartomas, somatic

PAPPA 3.25 0.41 7.1E-05 Pallister-Hall syndrome

PAPPA 3.25 0.41 7.1E-05 Polydactyly, postaxial, types A1 and B

PAPPA 3.25 0.41 7.1E-05 Polydactyly, preaxial, type IV

PINK1 9.82 0.42 7.5E-06 Parkinson disease 6, early onset

PLA2G7 4.64 0.40 4.9E-05 Asthma, susceptibility to

PLA2G7 4.64 0.40 4.9E-05 Atopy, susceptibility to

PLA2G7 4.64 0.40 4.9E-05 Platelet-activating factor acetylhydrolase deficiency

PLAGL1 5.34 0.35 4.4E-04 Diabetes mellitus, transient neonatal

PNPLA6 4.35 0.34 7.9E-04 Spastic paraplegia 39, autosomal recessive

PNPLA6 4.35 0.34 7.9E-04 Stuve-Wiedemann syndrome/Schwartz-Jampel type 2 syndrome

PPT1 7.66 0.54 1.7E-11 Ceroid lipofuscinosis, neuronal, 1

PRF1 9.33 0.45 1.2E-06 Hemophagocytic lymphohistiocytosis, familial, 2

PRF1 9.33 0.45 1.2E-06 Lymphoma, non-Hodgkin

PROK2 9.59 0.43 1.5E-06 Hypogonadism, hypogonadotropic

PROK2 9.59 0.43 1.5E-06 Kallmann syndrome 4

PRSS33 6.06 0.37 9.9E-05 Eosinophilia, familial

PTPRC 8.37 0.42 4.6E-05 Hepatitis C virus, susceptibility to

PTPRC 8.37 0.42 4.6E-05 Severe combined immunodeficiency, T cell-negative, B-cell/natural killer-cell positive

PTPRJ 7.61 0.37 3.3E-04 Colon cancer, somatic

RAB18 4.77 0.36 4.6E-04 Warburg micro syndrome 3

RGS2 9.85 0.43 2.0E-05 Rieger syndrome, type 2

RHCE 1.88 0.37 3.1E-04 Blood group, Rhesus

RHCE 1.88 0.37 3.1E-04 Rh-null disease, amorph type

RHD 4.15 0.58 2.8E-10 Rh-negative blood type

RNASET2 8.38 0.65 5.2E-16 Leukoencephalopathy, cystic, without megalencephaly

RPL11 10.04 0.34 6.2E-04 Diamond-Blackfan anemia 7

SAMHD1 9.38 0.38 2.3E-04 Aicardi-Goutieres syndrome 5

SAMHD1 9.38 0.38 2.3E-04 Chilblain lupus 2

SELP 5.45 0.44 3.0E-06 Atopy, susceptibility to

SELP 5.45 0.44 3.0E-06 Platelet alpha/delta storage pool deficiency

SERPINA1 9.80 0.43 4.1E-06 Emphysema due to AAT deficiency

SERPINA1 9.80 0.43 4.1E-06 Emphysema-cirrhosis, due to AAT deficiency

SERPINA1 9.80 0.43 4.1E-06 Hemorrhagic diathesis due to 'antithrombin' Pittsburgh

SERPINA1 9.80 0.43 4.1E-06 Pulmonary disease, chronic obstructive, susceptibility to

SERPINB6 6.52 0.48 2.1E-06 Deafness, autosomal recessive 91

SERPING1 4.17 0.35 6.0E-04 Angioedema, hereditary, types I and II

SERPING1 4.17 0.35 6.0E-04 Complement component 4, partial deficiency of

SLC11A1 4.86 0.40 3.0E-05 Buruli ulcer, susceptibility to

SLC11A1 4.86 0.40 3.0E-05 Mycobacterium tuberculosis, susceptibility to infection by

SLC12A1 4.37 0.51 1.9E-10 Bartter syndrome, type 1

SLC14A1 6.15 0.37 5.7E-05 Blood group, Kidd

SLC4A1 7.59 0.57 3.9E-10 Blood group, Diego

SLC4A1 7.59 0.57 3.9E-10 Blood group, Froese

SLC4A1 7.59 0.57 3.9E-10 Blood group, Swann

SLC4A1 7.59 0.57 3.9E-10 Blood group, Waldner

SLC4A1 7.59 0.57 3.9E-10 Blood group, Wright

SLC4A1 7.59 0.57 3.9E-10 Malaria, resistance to

SLC4A1 7.59 0.57 3.9E-10 Ovalocytosis

SLC4A1 7.59 0.57 3.9E-10 Renal tubular acidosis, distal, AD

SLC4A1 7.59 0.57 3.9E-10 Renal tubular acidosis, distal, AR

SLC4A1 7.59 0.57 3.9E-10 Spherocytosis, type 4

SLC6A8 6.71 0.35 2.0E-04 Creatine deficiency syndrome, X-linked

SNCA 9.61 0.32 6.3E-04 Dementia, Lewy body

SNCA 9.61 0.32 6.3E-04 Parkinson disease 1

SNCA 9.61 0.32 6.3E-04 Parkinson disease 4

SNRPN 7.48 0.46 4.4E-07 Prader-Willi syndrome

SORT1 5.08 0.42 4.5E-06 Low density lipoprotein cholesterol level QTL6

SPG20 2.46 0.36 4.7E-04 Troyer syndrome

SPINT2 7.51 0.47 9.0E-06 Diarrhea 3, secretory sodium, congenital, syndromic

STX11 8.52 0.34 9.0E-04 Hemophagocytic lymphohistiocytosis, familial, 4

SYNE2 6.78 0.46 1.7E-06 Emery-Dreifuss muscular dystrophy 5, autosomal dominant

TACSTD2 2.85 0.38 7.5E-05 Corneal dystrophy, gelatinous drop-like

TAP2 9.03 0.73 7.4E-26 Bare lymphocyte syndrome, type I, due to TAP2 deficiency

TAP2 9.03 0.73 7.4E-26 Wegener-like granulomatosis

TCL1A 5.91 0.58 3.6E-11 Leukemia/lymphoma, T-cell

THBD 7.86 0.35 1.8E-04 Hemolytic uremic syndrome, atypical, susceptibility to, 6

THBD 7.86 0.35 1.8E-04 Thrombophilia due to thrombomodulin defect

TKT 8.36 0.42 2.2E-05 Spondylometaepiphyseal dysplasia, short limb-hand type

TLR2 9.55 0.38 1.3E-04 Colorectal cancer, susceptibility to

TLR2 9.55 0.38 1.3E-04 Leprosy, susceptibility to

TLR4 8.12 0.44 1.8E-05 Colorectal cancer, susceptibility to

TLR4 8.12 0.44 1.8E-05 Endotoxin hyporesponsiveness

TLR4 8.12 0.44 1.8E-05 Macular degeneration, age-related, 10

TNFRSF10B 7.35 0.42 3.3E-05 Squamous cell carcinoma, head and neck

TNFRSF13B 6.56 0.37 1.5E-04 Immunodeficiency, common variable, 2

TNFRSF13B 6.56 0.37 1.5E-04 Immunoglobulin A deficiency 2

TNNT1 4.70 0.60 2.2E-11 Nemaline myopathy, Amish type

TPM1 6.92 0.38 8.3E-05 Cardiomyopathy, dilated, 1Y

TPM1 6.92 0.38 8.3E-05 Cardiomyopathy, familial hypertrophic, 3

TPM2 5.48 0.43 2.1E-05 Arthrogryposis multiplex congenita, distal, type 1

TPM2 5.48 0.43 2.1E-05 Arthrogryposis, distal, type 2B

TPM2 5.48 0.43 2.1E-05 Nemaline myopathy

TUBB1 7.32 0.60 5.6E-12 Macrothrombocytopenia, autosomal dominant, TUBB1-related

UBB 11.55 0.55 2.1E-08 Cleft palate, isolated

UCP2 8.14 0.59 9.8E-10 Obesity, susceptibility to, BMIQ4

VAPB 6.70 0.41 7.6E-05 Amyotrophic lateral sclerosis 8

VAPB 6.70 0.41 7.6E-05 Spinal muscular atrophy, late-onset, Finkel type

VCAN 8.25 0.43 2.5E-06 Wagner syndrome 1

VCL 7.10 0.49 1.5E-07 Cardiomyopathy, dilated, 1W

VCL 7.10 0.49 1.5E-07 Cardiomyopathy, familial hypertrophic, 15

VNN1 4.94 0.56 3.5E-09 High density lipoprotein cholesterol level QTL 8

WNK1 8.37 0.39 1.6E-04 Neuropathy, hereditary sensory and autonomic, type II

WNK1 8.37 0.39 1.6E-04 Pseudohypoaldosteronism, type IIC

XIST 7.11 0.38 1.9E-04 X-inactivation, familial skewed

XYLT1 3.75 0.43 4.9E-05 Pseudoxanthoma elasticum, modifier of severity of

The first four columns give genes with significant heritability (standard gene name, mean expression level, a² estimate of heritability, and heritability P value). The last column shows the corresponding Online Mendelian Inheritance in Man (OMIM)⁴ entry; some genes have several entries in OMIM.

Supplementary Table 6. Properties of SNPs in distant eQTLs.

| | Upstream or 5’ UTR | Downstream or 3’ UTR | Intergenic regions | Intronic regions | Exonic regions | Overall |
|---|---|---|---|---|---|---|
| All | 20 | 31 | 53 | 194 | 6 | 304 |
| Regulatory features | 7 [0.05] | 10 [0.04] | 6 [0.1] | 32 [0.16] | 1 [0.69] | 56 |
| Replicated in NESDA | 12 [0.17] | 12 [0.22] | 14 [7e-4] | 100 [0.02] | 5 [0.08] | 143 |

The number of eSNPs (SNPs among significant distant eQTLs, q < 0.001) assigned to different genomic features by the Variant Effect Predictor, together with the number of eSNPs that overlap regulatory features and the number that replicated in NESDA with q < 0.1. Regulatory features are open chromatin regions or transcription factor binding sites. A cell is highlighted in red (blue) if the observed proportion is higher (lower) than the overall proportion. Numbers in brackets are one-sided hypergeometric test P values, not corrected for multiple comparisons.
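The bracketed P values can be approximately recomputed from the counts alone. The sketch below is an illustration, not the authors' code, and rests on assumptions: the population is the 304 eSNPs, the "successes" are the eSNPs in the row category (regulatory overlap or NESDA replication), the draws are the eSNPs in a column category, and the tail is taken in the direction of the observed deviation from the overall proportion. The helper names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) for X ~ Hypergeometric(total, successes, draws)."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def one_sided_hypergeom_p(k: int, total: int, successes: int, draws: int) -> float:
    """One-sided P value in the direction of the observed deviation.

    Assumption: the upper tail P(X >= k) is used when the observed count is at
    least the expected count, otherwise the lower tail P(X <= k).
    """
    expected = draws * successes / total
    lowest = max(0, draws - (total - successes))
    highest = min(draws, successes)
    if k >= expected:
        tail = range(k, highest + 1)
    else:
        tail = range(lowest, k + 1)
    return sum(hypergeom_pmf(j, total, successes, draws) for j in tail)


# Counts from the table: 304 eSNPs in total, of which 56 overlap
# regulatory features and 143 replicate in NESDA.
TOTAL, REGULATORY, REPLICATED = 304, 56, 143

# Upstream/5' UTR eSNPs overlapping regulatory features: 7 of 20 (more than expected).
p_upstream_reg = one_sided_hypergeom_p(7, TOTAL, REGULATORY, 20)
# Intergenic eSNPs replicated in NESDA: 14 of 53 (fewer than expected).
p_intergenic_rep = one_sided_hypergeom_p(14, TOTAL, REPLICATED, 53)
print(f"{p_upstream_reg:.3g} {p_intergenic_rep:.3g}")
```

If this reading of the test is right, the two printed values should be close to the bracketed 0.05 and 7e-4; exact agreement is not guaranteed because the table values are rounded.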

Supplementary Table 7. Multiple regression analyses for heritability and disease.

| Predictor | Twin-based h²: t | P | OMIM (2831): z | P | NHGRI (1626): z | P | NHGRI, immune (590): z | P | NHGRI, non-immune (1036): z | P | OMIM+NHGRI (4024): z | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | -3.07 | 2.18E-03 | -20.22 | 6.32E-91 | -16.03 | 8.37E-58 | -12.50 | 7.02E-36 | -14.23 | 6.20E-46 | -19.72 | 1.52E-86 |
| OMIM | 4.96 | 7.08E-07 | - | - | - | - | - | - | - | - | - | - |
| NHGRI | 3.46 | 5.44E-04 | - | - | - | - | - | - | - | - | - | - |
| Twin-based h² | - | - | 5.38 | 7.25E-08 | 4.11 | 3.90E-05 | 2.67 | 7.67E-03 | 3.01 | 2.64E-03 | 5.84 | 5.16E-09 |
| Best local SNP r² | 8.67 | 4.62E-18 | 0.45 | 6.52E-01 | -1.01 | 3.11E-01 | -0.29 | 7.72E-01 | -1.06 | 2.91E-01 | -0.38 | 7.00E-01 |
| Best distant interchrom. SNP r² | 16.05 | 1.49E-57 | -0.90 | 3.66E-01 | -1.73 | 8.36E-02 | -1.14 | 2.53E-01 | -1.26 | 2.08E-01 | -1.58 | 1.14E-01 |
| IBD-based local h² | 2.80 | 5.06E-03 | -0.74 | 4.58E-01 | 0.26 | 7.96E-01 | -0.27 | 7.87E-01 | 0.49 | 6.21E-01 | -0.41 | 6.85E-01 |
| GCTA local | 13.38 | 1.16E-40 | -0.16 | 8.72E-01 | 0.97 | 3.33E-01 | 0.71 | 4.77E-01 | 0.56 | 5.78E-01 | 0.63 | 5.30E-01 |
| GCTA genomewide | 2.36 | 1.82E-02 | -0.65 | 5.15E-01 | 0.77 | 4.41E-01 | 2.02 | 4.36E-02 | -0.47 | 6.38E-01 | 0.06 | 9.51E-01 |
| Expression mean | 31.91 | 1.45E-217 | -2.44 | 1.47E-02 | 0.29 | 7.73E-01 | 2.49 | 1.28E-02 | -1.15 | 2.51E-01 | -1.22 | 2.24E-01 |
| Expression variance | 15.22 | 5.71E-52 | -0.19 | 8.47E-01 | 0.49 | 6.22E-01 | 1.81 | 7.02E-02 | -0.68 | 4.95E-01 | 0.48 | 6.32E-01 |
| GC content, +5kb of TSS | -1.27 | 2.06E-01 | 4.98 | 6.49E-07 | 3.22 | 1.29E-03 | 2.51 | 1.19E-02 | 2.04 | 4.18E-02 | 5.47 | 4.40E-08 |
| GC content, -5kb of TSS | -0.80 | 4.23E-01 | 2.19 | 2.87E-02 | 2.12 | 3.40E-02 | 0.27 | 7.85E-01 | 2.28 | 2.26E-02 | 2.62 | 8.83E-03 |
| DHS near TSS, blood | 4.49 | 7.17E-06 | 1.29 | 1.98E-01 | 0.10 | 9.20E-01 | 0.33 | 7.43E-01 | -0.01 | 9.92E-01 | 0.43 | 6.65E-01 |
| DHS near TSS | 2.66 | 7.79E-03 | 1.98 | 4.81E-02 | 1.54 | 1.24E-01 | 1.16 | 2.48E-01 | 1.01 | 3.13E-01 | 2.28 | 2.29E-02 |
| Gene density | -4.86 | 1.18E-06 | -5.25 | 1.51E-07 | -17.47 | 2.55E-68 | -8.77 | 1.71E-18 | -14.62 | 2.21E-48 | -13.44 | 3.62E-41 |
| Gene size | 7.46 | 9.01E-14 | 3.71 | 2.07E-04 | 2.98 | 2.88E-03 | 0.67 | 5.02E-01 | 2.88 | 4.00E-03 | 4.75 | 2.05E-06 |
| Local recombination rate | 2.22 | 2.62E-02 | 1.99 | 4.65E-02 | 1.32 | 1.87E-01 | 1.00 | 3.18E-01 | 0.85 | 3.94E-01 | 2.79 | 5.25E-03 |
| Gene conservation score | 5.04 | 4.64E-07 | 11.50 | 1.35E-30 | 0.75 | 4.51E-01 | 0.81 | 4.16E-01 | 0.39 | 6.97E-01 | 8.60 | 7.77E-18 |
| Genes under selection | -0.33 | 7.45E-01 | 1.82 | 6.91E-02 | 1.28 | 1.99E-01 | -1.44 | 1.51E-01 | 2.46 | 1.41E-02 | 1.43 | 1.53E-01 |
| Positive selection | 0.14 | 8.90E-01 | 0.23 | 8.15E-01 | -0.30 | 7.65E-01 | 0.47 | 6.39E-01 | -0.65 | 5.15E-01 | 0.26 | 7.93E-01 |
| Negative selection | 2.07 | 3.84E-02 | 1.52 | 1.28E-01 | 2.31 | 2.10E-02 | -0.10 | 9.24E-01 | 2.73 | 6.40E-03 | 2.94 | 3.23E-03 |
| Human accel. genes | -0.13 | 8.95E-01 | 0.87 | 3.85E-01 | 1.23 | 2.19E-01 | 0.38 | 7.04E-01 | 1.14 | 2.53E-01 | 2.13 | 3.29E-02 |
| Primate accel. genes | 1.13 | 2.60E-01 | -0.52 | 6.05E-01 | 0.61 | 5.41E-01 | 0.03 | 9.76E-01 | 0.65 | 5.18E-01 | 0.22 | 8.27E-01 |
| Adaptive | 0.54 | 5.88E-01 | 1.46 | 1.46E-01 | -0.98 | 3.29E-01 | 0.00 | 9.99E-01 | -1.09 | 2.75E-01 | 0.29 | 7.71E-01 |
| Gene on chr 6 | 1.03 | 3.04E-01 | -0.15 | 8.82E-01 | 6.39 | 1.70E-10 | 4.43 | 9.30E-06 | 4.36 | 1.31E-05 | 4.14 | 3.47E-05 |
| Gene on chr 19 | 2.25 | 2.43E-02 | -2.50 | 1.25E-02 | -2.21 | 2.73E-02 | -0.41 | 6.80E-01 | -2.35 | 1.87E-02 | -3.29 | 1.02E-03 |
| Gene on chr X | -2.20 | 2.76E-02 | 7.14 | 9.55E-13 | -7.36 | 1.87E-13 | -4.08 | 4.59E-05 | -5.61 | 2.08E-08 | 0.90 | 3.66E-01 |
| (Blood DHS near TSS) x (conservation score) | -6.40 | 1.59E-10 | -4.03 | 5.68E-05 | -2.19 | 2.88E-02 | -1.44 | 1.49E-01 | -1.62 | 1.05E-01 | -3.41 | 6.54E-04 |

Results are based on linear regression for h² as the response and logistic regression for the remaining binary responses, where disease status is coded 1=yes, 0=no (e.g. for OMIM); the number of genes with each disease designation is shown in parentheses. All 18,392 genes were used as observations. The predictors are as defined in Table 2, with the addition of the best local and distant SNP r² values and chromosome. Results with P < 0.0017 meet the Bonferroni-corrected standard of 0.05 for 29 tests; a significant negative statistic indicates a negative relationship between predictor and response. DHS = DNase I hypersensitivity sites.
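The significance rule in the caption can be checked from the test statistics alone. This is a minimal sketch, assuming the reported P values are two-sided and that, with 18,392 observations, the t statistics in the h² column can be treated as standard normal (a close approximation, not the exact t reference distribution).

```python
import math

N_TESTS = 29
ALPHA = 0.05
BONFERRONI = ALPHA / N_TESTS  # about 0.0017, the threshold quoted in the caption


def two_sided_p(stat: float) -> float:
    """Two-sided P value for a statistic referred to a standard normal."""
    return math.erfc(abs(stat) / math.sqrt(2.0))


def is_significant(stat: float) -> bool:
    """True if the two-sided P value passes the Bonferroni threshold."""
    return two_sided_p(stat) < BONFERRONI


# Predictors of twin-based h2 taken from the first column of the table.
for name, t in [("OMIM", 4.96), ("NHGRI", 3.46), ("IBD-based local h2", 2.80)]:
    p = two_sided_p(t)
    print(f"{name}: P = {p:.3g}, significant = {is_significant(t)}")
```

Under these assumptions the OMIM and NHGRI statistics clear the threshold, while IBD-based local h² (P of about 5E-03) does not.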

References

1. Rossin, E.J. et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet. 7, e1001273 (2011).

2. Huang, D.W. et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35, W169-75 (2007).

3. Hindorff, L.A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362-7 (2009).

4. McKusick, V.A. Online Mendelian Inheritance in Man (NCBI/NIH, 2004). URL: http://www3.ncbi.nlm.nih.gov/Omim/searchomim.html

Supplementary Note (SN) to “Heritability and Genomics of Gene Expression”

Wright, Sullivan et al.

The Affymetrix U219 chip

The Affymetrix Human Genome U219 Array (“U219”, Affymetrix, Santa Clara, CA) contains 530,467 probes for 49,293 transcripts (excluding “AFFX” control probes). All probes are 25 bases in length and are designed to be “perfect match” complements to a designated transcript. Most transcripts have 11 probes (proportion 0.881) or 9 probes (0.115), with the remainder having 8 or 10 probes (0.004 in total). Sequences used to design the U219 array were selected from UniGene 219 (30 March 2009), RefSeq 36 (13 July 2009), and GenBank (12 May 2009).
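As a quick consistency check, the reported probe-count proportions should roughly reproduce the total probe count. The split of the 0.004 remainder between 8 and 10 probes is not reported, so this sketch tries both extremes; because the proportions are rounded to three decimals, only approximate agreement is expected.

```python
# Reported U219 design figures (from the text above).
N_TRANSCRIPTS = 49_293
N_PROBES_REPORTED = 530_467

# Proportions of transcripts by number of probes per transcript.
P11, P9, P_REST = 0.881, 0.115, 0.004


def expected_probes(probes_for_rest: int) -> float:
    """Total probes implied by the proportions, if every 'remainder'
    transcript had `probes_for_rest` probes (an assumed extreme)."""
    mean_probes = 11 * P11 + 9 * P9 + probes_for_rest * P_REST
    return mean_probes * N_TRANSCRIPTS


low, high = expected_probes(8), expected_probes(10)
print(round(low), round(high), N_PROBES_REPORTED)
```

Both bounds fall within a few hundred probes of 530,467, which is within the rounding error of the proportions.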

The spatial design of the U219 chip is depicted in the Figure below (upper left), and the number of transcripts per Mb in the upper right panel (light grey reference circles represent 25 counts).

[Figure panels. Upper left: spatial structure of the U219 chip. Upper right: transcripts assessed by U219 per Mb. Lower left: comparing Affymetrix and RNAseq (n=83). Lower right: RNAseq-Affymetrix correlations in gene expression (n=83).]

Gene expression assays

In a separate study, we compared gene expression measured by Affymetrix 1.1ST arrays with RNAseq on an Illumina HiSeq2000. The study used 83 RNA samples extracted from whole brain of mice from a 3x3 diallel cross of the wild-derived inbred strains CAST, PWK, and WSB. The 1.1ST chip uses the newer GeneTitan 96-well “peg” format.

In the previous figure, the lower left panel shows a scatterplot of RNAseq and Affymetrix expression measures for the same transcripts. As anticipated, RNAseq has a greater dynamic range, but the two methods have a nearly linear relationship for RNAseq values between 5 and 15. The lower right panel shows a histogram of the 83 correlations between Affymetrix microarray and RNAseq expression measures (median 0.85).
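A minimal sketch of how such a summary could be computed, assuming (this is not stated above) that each of the 83 correlations is computed for one sample across the transcripts measured by both platforms. The input layout, a dictionary from a hypothetical sample ID to expression values in a shared transcript order, is illustrative only.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_sample_correlation(
    array_expr: Dict[str, Sequence[float]],
    rnaseq_expr: Dict[str, Sequence[float]],
) -> Tuple[float, List[float]]:
    """Median over samples of the across-transcript platform correlation."""
    per_sample = [pearson(array_expr[s], rnaseq_expr[s]) for s in array_expr]
    return statistics.median(per_sample), per_sample


# Toy data with three samples and three transcripts (not real measurements).
toy_array = {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 3.0], "c": [1.0, 2.0, 3.0]}
toy_rnaseq = {"a": [2.0, 4.0, 6.0], "b": [3.0, 2.0, 1.0], "c": [1.0, 3.0, 2.0]}
med, per_sample = median_sample_correlation(toy_array, toy_rnaseq)
```

With real data each vector would hold thousands of transcripts, and the 83 per-sample values would form the histogram.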

Thus, although RNAseq has advantages in its greater dynamic range and capacity to identify novel transcripts, splice variants, and allele-specific expression, Affymetrix GeneTitan arrays remain a useful alternative method to evaluate gene expression.

Gene expression methods and quality control (QC)

Randomization

Before hybridization, NTR samples were randomly assigned to plates (96-well format, described below). NESDA samples were randomized independently. The randomization scheme was simple random sampling, with a post hoc check to ensure that no plate deviated significantly in sex or zygosity composition. Seven plates contained samples from both studies to better inform array QC and study comparability.
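The randomization step can be sketched as a shuffle followed by a per-plate balance check. This is a hypothetical illustration, not the study's procedure: the helper names are invented, every well is assumed to receive a study sample (real plates also held controls and re-hybridized arrays), and the binomial z-score with |z| > 3 is an assumed stand-in for the unspecified significance check.

```python
import random
from collections import Counter
from typing import Dict, Sequence


def assign_plates(sample_ids: Sequence[str], plate_size: int = 96, seed: int = 1) -> Dict[str, int]:
    """Simple random assignment of samples to plates (hypothetical helper)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    return {sid: i // plate_size for i, sid in enumerate(ids)}


def plate_sex_z_scores(plates: Dict[str, int], sex: Dict[str, str]) -> Dict[int, float]:
    """Binomial z-score of each plate's female count against the overall rate."""
    p = sum(1 for s in plates if sex[s] == "F") / len(plates)
    sizes = Counter(plates.values())
    females = Counter(plate for s, plate in plates.items() if sex[s] == "F")
    z = {}
    for plate, n in sizes.items():
        sd = (n * p * (1 - p)) ** 0.5
        z[plate] = (females[plate] - n * p) / sd if sd > 0 else 0.0
    return z


# Toy cohort: 960 samples, about two thirds female.
samples = [f"S{i:04d}" for i in range(960)]
sex = {s: ("F" if i % 3 else "M") for i, s in enumerate(samples)}
plates = assign_plates(samples)
z = plate_sex_z_scores(plates, sex)
flagged = [plate for plate, value in z.items() if abs(value) > 3]  # would trigger a re-draw
```

The same check could be repeated for zygosity by swapping the sex labels for zygosity labels.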

Initial processing & normalization

Gene expression assays were conducted at the Rutgers University Cell and DNA Repository (RUCDR, http://www.rucdr.org). Total RNA was extracted from PAXgene tubes using the PAXgene Blood RNA MDx Kit protocol. RNA extraction was done in 96-well format using the BioRobot Universal System (Qiagen). RNA samples were then eluted into 2-D barcode tubes for sample storage and subsequent target preparation. The quality and quantity of RNA extracted from each sample were assessed with a Caliper AMS90 (HT DNA 5K/RNA LabChip) and by spectrophotometric analysis. For cDNA synthesis, 50 ng of RNA was reverse-transcribed and amplified in plate format on a Biomek FX liquid handling robot (Beckman Coulter) using Ovation Pico WTA reagents per the manufacturer’s protocol (NuGEN). Products purified from single primer isothermal amplification (SPIA) were then fragmented and labeled with biotin using the Encore Biotin Module (NuGEN). Prior to hybridization, the labeled cDNA was analyzed by electrophoresis to verify the appropriate size distribution (Caliper AMS90 with a HT DNA 5K/RNA LabChip).

Affymetrix U219 arrays were processed in 96-array plates: array hybridization, washing, staining, and scanning were carried out on an Affymetrix GeneTitan System per the manufacturer’s protocol. Gene expression data were required to pass standard Affymetrix QC metrics (Affymetrix Expression Console) before further analysis.

Estimation of expression values was performed simultaneously on a superset of samples: the NTR twins who are the primary focus of the current study, a number of non-twin NTR family members, and samples from the Netherlands Study of Depression and Anxiety (NESDA), a subset of which was intended to serve as a replication set for NTR eQTL findings.

The array superset consisted of 6526 U219 arrays (3516 NTR, 2783 NESDA, 227 controls), hybridized on 69 plates. Expression values were obtained using RMA normalization implemented in Affymetrix Power Tools (APT, v 1.12.0). Among the arrays on the first 64 plates, 417 were identified as having reduced quality (using the D statistic described below), and were re-hybridized on the remaining plates.

Two expression files were created, using (i) all probes and (ii) the probes remaining after using the APT --kill-list option to mask out probes with uncertain locations or SNPs. Probes were removed if their location was uncertain (the probe oligonucleotide sequence did not map uniquely to hg19) or if they contained a polymorphic SNP based on HapMap3¹ and 1000 Genomes Project² data. Array inclusion criteria were based on the masked dataset, although broad conclusions and heritability estimates were highly similar for the unmasked set.
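The two masking rules can be sketched as a filter that produces a list of probe IDs to exclude, the kind of list the text describes passing to APT's --kill-list option (the file format APT expects is not shown here). The input layout, with each probe's hg19 alignments as (chromosome, start, end) tuples and SNPs as (chromosome, position) pairs, and the helper name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Alignment = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def probes_to_mask(
    probe_hits: Dict[str, List[Alignment]],
    snp_positions: Set[Tuple[str, int]],
) -> List[str]:
    """Return IDs of probes that do not map uniquely or that cover a SNP."""
    masked = []
    for probe_id, hits in probe_hits.items():
        if len(hits) != 1:  # unmapped or multi-mapping: location uncertain
            masked.append(probe_id)
            continue
        chrom, start, end = hits[0]
        if any((chrom, pos) in snp_positions for pos in range(start, end + 1)):
            masked.append(probe_id)  # probe sequence contains a polymorphic SNP
    return sorted(masked)


# Toy example with 25-base probes (not real U219 probes).
hits = {
    "p1": [("chr1", 100, 124)],                      # unique, no SNP: kept
    "p2": [("chr1", 200, 224), ("chr5", 10, 34)],    # multi-mapping: masked
    "p3": [("chr2", 50, 74)],                        # covers chr2:60: masked
    "p4": [],                                        # unmapped: masked
}
snps = {("chr2", 60)}
masked_ids = probes_to_mask(hits, snps)
```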

QC

The vast majority of arrays were judged acceptable based on standard APT expression QC metrics, but the large sample size enabled additional quality control metrics based on inter-sample comparisons. A multi-step iterative QC process was used to identify samples of lower quality.

Initially, 12 NTR twin samples whose recorded sex in the clinical database was inconsistent with the expression data, as judged by the 10 probe sets on chromosomes X and Y best able to discriminate sex, were dropped. After this initial QC, and confining our focus to NTR twins (whether or not both co-twins remained in the sample set), 2819 NTR arrays remained. At this stage, PLINK analysis of Affymetrix 6.0 genotyping arrays was used to confirm zygosity (see below), and pairs with irreconcilable differences were removed.

We used the pairwise correlation matrix of expression profiles across all arrays for additional QC. We use $r_{ij}$ to denote the correlation between arrays $i$ and $j$. For a single array $i$ and number of arrays $n$, we use $\bar{r}_i = \sum_j r_{ij}/n$ to denote the average correlation of array $i$ with all other arrays. This value tends to be lower for poorly performing arrays. To provide a sense of scale, these quantities were expressed in terms of median absolute deviations, $D_i = (\bar{r}_i - \bar{r}) / \mathrm{median}_k\,|\bar{r}_k - \bar{r}|$, where $\bar{r}$ is the grand mean of all correlations. Large negative D values corresponded to poor quality. Values of D < -5.0 were considered to be outliers of questionable quality when deciding which samples should be re-hybridized. For samples hybridized twice, the array with the more favorable D was retained. Finally, an initial comparison of expression profiles to Affymetrix 6.0 genotypes was performed, with the goal of further quality-ranking of samples on a criterion aligned with the expression-QTL mapping activity. Briefly, the 500 most significant locally acting SNP-transcript pairs were used to estimate the posterior probability that the expression array is a “good match” to the genotype profile, versus the probability that the expression and genotype profiles are independent of each other, with evidence accruing across multiple SNP-transcript pairs. This “sample-match” approach was highly effective at confirming the few known sex-mismatched samples, and was used to identify additional samples of poor quality. Moreover, D and the posterior probability of “mismatch” were highly correlated.

[Figure panels. First panel: significant probe sets after dropping samples, per ICC heritability estimates (y-axis: number of genes with ICC-based q < 0.10; x-axis: number of samples dropped based on D value). Second panel: D vs. log10(posterior ratio) and the corresponding “best” thresholds.]

Dropping 19 samples (D < -9.5) maximized the number of significant probe sets. A similar investigation using the log-posterior RNA-DNA match proportion maximized the number of significant genes after dropping 26 samples from n=2616 matched co-twins.

We reasoned that D and the sample-match posterior probability might prove effective for further quality control, but the “best” threshold for dropping samples was unknown. For this purpose, we successively dropped individual samples according to D (dropping the worst samples first) and recomputed the simple intraclass correlation coefficient (ICC)-based estimate of heritability as described in Heritability Methods. We handled covariates by first residualizing the expression data on plate, age, sex, well position, and white blood cell count. For the first 100 dropped samples, the ICC estimates were recomputed after dropping each individual sample, and in sets of 10 dropped samples thereafter. For each set of dropped samples, the simple ICC h2 was divided by its standard error, and a one-sided P value was computed for high h2. Benjamini-Hochberg false discovery rate q-values were computed using p.adjust in R v.2.14 for the ~47,000 probe sets. Using the D criterion, dropping 19 samples resulted in the largest number of significant probe sets with q < 0.10 (see the Figure above). This choice was robust to the specific q threshold in the range q=0.05 to q=0.20. Further quantile normalization of the residualized expression data led to essentially the same conclusion and identified the same samples as being of lower quality.
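The per-probe-set scoring used at each dropping step can be sketched as follows: a one-sided P value from z = h²/SE, then Benjamini-Hochberg q-values computed the same way as R's `p.adjust(p, method = "BH")`. Referring z to a standard normal is an assumption (the text says only "one-sided"), and the surrounding loop that drops samples and re-estimates the ICC is omitted; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def one_sided_p(h2: float, se: float) -> float:
    """Upper-tail normal P value for z = h2 / se (testing for high h2)."""
    return 0.5 * math.erfc((h2 / se) / math.sqrt(2.0))


def bh_qvalues(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q-values are monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def count_significant(h2s: Sequence[float], ses: Sequence[float], threshold: float = 0.10) -> int:
    """Number of probe sets with BH q below the threshold."""
    q = bh_qvalues([one_sided_p(h, s) for h, s in zip(h2s, ses)])
    return sum(qi < threshold for qi in q)


q_example = bh_qvalues([0.01, 0.04, 0.03, 0.005])
n_sig = count_significant([0.5, 0.01, 0.4], [0.05, 0.1, 0.05])
```

Repeating `count_significant` after each batch of dropped samples and picking the maximum mirrors the selection of 19 dropped samples described above.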

Selection of covariates for NTR gene expression analyses

Our investigation of expression covariates aimed to identify a minimal set of covariates that would increase power for expression heritability estimation and improve eQTL mapping. A more complete consideration of these and other potential covariates, as well as their biological underpinnings, will be undertaken elsewhere. The covariates can be roughly divided into (i) covariates related to technical variation, (ii) clinical covariates that are subject-specific, and (iii) covariates related to blood counts, which, if not properly accounted for, might produce spurious “eQTL” relationships.

Technical covariates included sample processing dates, hybridization plate (which is nested within, and therefore fully accounts for, hybridization batch runs), and well position within the 96-well plate. Examination of well effects indicated that the 96 wells could be grouped into a simple grid of 24 sets of four wells each, with almost the same adjusted R² as the full 96-well coding in the testing described below.

Subject-specific covariates included sex, age in years at blood draw, body mass index, blood count measurements, and the date and time of day of blood sampling. Additional quality indices (the D statistic and the posterior probability of RNA-DNA mismatch; see the expression QC section above) were also investigated for correlation with expression. These measures would ordinarily be expected to affect mainly error variation but, after scaling/normalization, can also be correlated with average expression levels.

A small proportion of clinical covariate values (2.1% of the entire clinical covariate set) were missing and were imputed. The covariates with missing data were body mass index (BMI, 11 missing values), total white blood cell count (47 missing), and red blood cell count (40 missing). BMI was imputed by regressing BMI on age among subjects with complete data, separately for each twin zygosity group (average multiple R² ~ 0.10). White and red blood cell counts were imputed by regressing these values on BMI and BMI² (multiple R² ~ 0.03 for each). Although the imputation accuracy was not high, the imputed proportion was very small, with minimal impact on downstream analysis.
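The two imputation regressions can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions (missing values coded as NaN, ordinary least squares with an intercept, BMI imputed before blood counts), not the code used for the study; the function names are hypothetical.

```python
import numpy as np

def ols_predict(X_obs, y_obs, X_new):
    """Fit OLS with an intercept on the observed rows and predict for the new rows."""
    A = np.column_stack([np.ones(len(X_obs)), X_obs])
    coef, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def impute_bmi(bmi, age, zyg_group):
    """Replace missing BMI (NaN) using a BMI-on-age regression fit within each zygosity group."""
    bmi = np.array(bmi, dtype=float)
    age = np.asarray(age, dtype=float)
    zyg_group = np.asarray(zyg_group)
    for g in np.unique(zyg_group):
        miss = (zyg_group == g) & np.isnan(bmi)
        obs = (zyg_group == g) & ~np.isnan(bmi)
        if miss.any():
            bmi[miss] = ols_predict(age[obs].reshape(-1, 1), bmi[obs], age[miss].reshape(-1, 1))
    return bmi

def impute_count(count, bmi):
    """Replace missing blood counts using a regression on BMI and BMI^2 (BMI already complete)."""
    count = np.array(count, dtype=float)
    bmi = np.asarray(bmi, dtype=float)
    design = np.column_stack([bmi, bmi ** 2])
    miss = np.isnan(count)
    if miss.any():
        count[miss] = ols_predict(design[~miss], count[~miss], design[miss])
    return count
```

Imputing BMI first keeps the blood-count design matrix complete, which matches the order in which the text describes the two steps.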

Covariates were selected by type III sum of squares testing in SAS (v9.2), in which the simultaneous effects of the covariates were determined for each probe set (“transcript” in the terminology of the main manuscript). In a type III test, the partial model sum of squares for each covariate is measured by comparing a full model containing the entire set of covariates to a reduced model that excludes the covariate under consideration. The influence of each covariate was quantified by recording the number of significant probe sets (5% Benjamini-Hochberg FDR threshold) among the ~47,000 probe sets.
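Comparing a full model with a reduced model that drops one covariate is a partial F-test, sketched below in Python with NumPy for a single probe set. The sketch is a stand-in for the SAS type III procedure and matches it only when the dropped covariate is not involved in other terms (the sex-by-count interactions in the table below would need care); the function name is hypothetical. The P-value can be taken from the upper tail of the F distribution, for instance with `scipy.stats.f.sf(F, df_num, df_den)`, and then adjusted across probe sets with Benjamini-Hochberg.

```python
import numpy as np

def partial_f(y, X_full, drop_cols):
    """Partial F statistic for removing `drop_cols` from a full OLS model.

    y: (n,) expression of one probe set; X_full: (n, p) design including the intercept.
    For a factor covariate (e.g. plate), pass all of its dummy columns together.
    Returns (F, df_num, df_den).
    """
    def rss_and_rank(X):
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return float(resid @ resid), int(np.linalg.matrix_rank(X))

    drop = set(drop_cols)
    keep = [j for j in range(X_full.shape[1]) if j not in drop]
    rss_full, rank_full = rss_and_rank(X_full)
    rss_red, rank_red = rss_and_rank(X_full[:, keep])
    df_num = rank_full - rank_red          # degrees of freedom of the dropped covariate
    df_den = len(y) - rank_full            # residual degrees of freedom of the full model
    F = ((rss_red - rss_full) / df_num) / (rss_full / df_den)
    return F, df_num, df_den
```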

Complete white blood cell counts, consisting of lymphocytes, neutrophils, basophils, monocytes, and eosinophils, were available for ~93% of the samples (the total white blood cell count was available for all but 47 of the twin samples, as described above).

In addition, 122 potentially confounding SNPs were identified by (i) GWAS analysis among NTR individuals (ignoring twin status) of all of the complete red and white blood count phenotypes, or (ii) their appearance in the NHGRI GWAS catalog as associated (P < 5×10⁻⁸) with any red cell, white cell, or platelet count. A small number of SNPs (fewer than 10) were not among the imputed set of available SNPs, and the SNAP tool (http://www.broadinstitute.org/mpg/snap/) was used to identify imputed proxies with r² exceeding 0.8. Some pairs of SNPs had pairwise |r| greater than 0.9; these were thinned until the final set of 109 contained no such pairs, which could otherwise produce problematic collinearity.
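The thinning step can be sketched as a greedy filter: walk through the SNPs in priority order and keep a SNP only if its correlation with every SNP already kept is at most 0.9 in absolute value. The ordering rule, the use of allele dosages, and the function name are assumptions for illustration; the threshold is a parameter, so either the |r| criterion stated here or the r² criterion given in the covariate table can be plugged in.

```python
import numpy as np

def thin_by_ld(dosages, max_abs_r=0.9):
    """Greedy LD thinning over columns of `dosages` (individuals x SNPs, priority order).

    A SNP is kept only if |r| <= max_abs_r with every previously kept SNP.
    Returns the indices of the retained columns.
    """
    kept = []
    for j in range(dosages.shape[1]):
        too_close = any(
            abs(np.corrcoef(dosages[:, j], dosages[:, k])[0, 1]) > max_abs_r
            for k in kept
        )
        if not too_close:
            kept.append(j)
    return kept
```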

Finally, to address the possibility of additional latent technical or subject variation, principal components analysis was performed on the residualized expression values, after correcting for the above-mentioned covariates. This approach is similar to surrogate variable analysis,3 but more appropriate for the eQTL setting, in which no single “phenotype” is of interest. The first 7 principal components of these residuals were chosen based on eigenvalue examination, each individually accounting for no more than a proportion 0.0005 of the otherwise unmodeled variance. Comparison of heritability estimates with and without the PC covariates indicated that the remaining PC correction had a small effect on overall significance, and perhaps improved it.
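The residual-PC step can be sketched with NumPy as below: regress out the covariates, center each probe set, and take the singular value decomposition, whose squared singular values give each PC's share of the remaining variance. The function name and the choice of SVD (rather than an explicit eigendecomposition) are ours; in the text, the number of PCs was chosen by examining these eigenvalues.

```python
import numpy as np

def residual_pcs(expr, covariates, n_pcs=7):
    """PCs of expression residuals after regressing out covariates.

    expr: (samples x probe sets); covariates: (samples x q) design including an intercept.
    Returns (scores of the top n_pcs PCs, proportion of residual variance for each).
    """
    coef, *_ = np.linalg.lstsq(covariates, expr, rcond=None)
    resid = expr - covariates @ coef
    resid -= resid.mean(axis=0)                        # center each probe set
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    var_prop = s ** 2 / np.sum(s ** 2)                 # eigenvalue proportions
    return u[:, :n_pcs] * s[:n_pcs], var_prop[:n_pcs]
```

The returned scores are orthogonal to the covariates already in the model, so adding them as extra covariates captures only variation the other covariates left unexplained.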

The final set of empirically chosen covariates, accounting for ~200 degrees of freedom, is shown in the table below in decreasing order of importance. Other covariates that were evaluated but not included because of insufficient effects were alcohol use, education, and fasting status.

Empirically-selected gene expression covariates and SNP covariates

| Covariate | Type | d.f. | Probe sets significant at B-H q < 0.05 |
| Affymetrix GeneTitan plate | Factor | 41 | 47,175 |
| D statistic (defined above) | Numeric | 1 | 37,878 |
| Well location on plate (as 24 sets of well quadruples) | Factor | 23 | 14,577 |
| Days from blood draw to RNA hybridization | Numeric | 1 | 14,319 |
| Days from RNA amplification to fragmentation | Numeric | 1 | 7,739 |
| Calendar month of blood draw | Factor | 11 | 4,024 |
| Monocyte cell count | Numeric | 1 | 2,173 |
| Interaction of sex & monocyte count | Numeric | 1 | 2,159 |
| Interaction of sex & lymphocyte cell count | Numeric | 1 | 2,145 |
| Interaction of sex & eosinophil count | Numeric | 1 | 2,130 |
| Interaction of sex & neutrophil count | Numeric | 1 | 2,126 |
| Interaction of sex & basophil count | Numeric | 1 | 2,103 |
| Lymphocyte cell count | Numeric | 1 | 2,102 |
| Eosinophil count | Numeric | 1 | 2,092 |
| Basophil count | Numeric | 1 | 2,084 |
| Neutrophil count | Numeric | 1 | 2,069 |
| Body mass index | Numeric | 1 | 1,901 |
| Days from RNA extraction to amplification | Numeric | 1 | 1,827 |
| Hour of day of blood draw | Numeric | 1 | 1,469 |
| Age at blood draw | Numeric | 1 | 1,278 |
| Smoking status | Factor | 1 | 684 |
| Sex | Factor | 1 | 619 |
| Interaction of sex & red blood cell count | Numeric | 1 | 482 |
| Red blood cell count | Numeric | 1 | 476 |
| Principal components 1-3 (estimated on genotype data) | Numeric | 3×1 | NA |
| 109 SNPs previously associated with blood counts or with P < 10⁻⁸ in NTR (treating 2464 individuals as if unrelated, ignoring twin status); SNPs with r² > 0.9 with previously included SNPs were dropped | Numeric | 109×1 | NA |
| Principal components 1-7 (estimated on expression residuals) | Numeric | 7×1 | > 0.5% of residual expression variation explained |


The 109 SNPs associated with blood counts are listed below:

SNP Chr Position A1 A2

rs505404 chr1 243267 T G

rs7189020 chr1 304802 A T

rs1122794 chr1 309154 C A

rs6065 chr1 4836380 C T

rs7342306 chr1 6291092 G A

rs150785571 chr1 7336645 C A

rs2336384 chr1 12046062 G T

rs7255045 chr1 12932268 G A

rs11085824 chr1 13001546 A G

rs2108978 chr1 19861457 C T

rs11082304 chr1 20720972 G T

rs10147992 chr1 25503798 A G

rs2138852 chr1 27703348 C T

rs10512472 chr1 33884803 T C

rs17609240 chr1 38110688 T G

rs3859192 chr1 38128647 C T

rs4794822 chr1 38156711 C T

rs708382 chr1 42442343 T C

rs11181569 chr1 42980488 T C

rs17356664 chr1 45740770 C T

rs11239550 chr1 46024728 A G

rs941207 chr1 57023283 C G

rs11071720 chr1 63341995 T C

rs10761731 chr1 65027609 A T

rs2393967 chr1 65133155 A C

rs1719271 chr1 65183800 A G

rs8022206 chr1 68520905 G A

rs1417436 chr1 70160177 T C

rs16926246 chr1 71093391 C T

rs76650398 chr1 78947595 A G

rs11018874 chr1 89875436 G A

rs8006385 chr1 93501025 A G

rs4148441 chr1 95898206 A G

rs7149242 chr1 101159415 T G

rs11628318 chr1 103040086 T A

rs2297067 chr1 103566784 C T

rs653178 chr1 112007755 C T

rs17696736 chr1 112486817 A G

rs17824620 chr1 113100993 C A

rs4148170 chr1 119027544 G A


rs7961894 chr1 122365582 C T

rs12075 chr1 159175353 G A

rs10914144 chr1 171949749 T C

rs1668873 chr1 205235989 G A

rs12748961 chr1 205676262 T C

rs3811444 chr1 248039450 C T

rs6136489 chr2 1923733 T G

rs1034566 chr2 19984276 C T

rs1260326 chr2 27730939 T C

rs647316 chr2 31464828 A G

rs625132 chr2 31482299 A G

rs9609565 chr2 32867527 G A

rs5756506 chr2 37467391 G C

rs2413450 chr2 37470223 T C

rs6072085 chr2 39271480 C G

rs7275212 chr2 39852550 A T

rs17030845 chr2 43687878 C T

rs10495928 chr2 46353165 A G

rs131794 chr2 50971751 A C

rs6013509 chr2 51318350 G A

rs2540917 chr2 60608758 T C

rs12988934 chr2 182323664 C T

2:216058171:C:CA chr2 216058171 C A

rs7641175 chr3 18311411 G A

rs1354034 chr3 56849748 T C

rs12485738 chr3 56865775 A G

rs3792366 chr3 122839875 G A

rs4328821 chr3 128316434 A G

rs9859260 chr3 195800546 C T

rs11915082 chr3 195809138 G A

rs170117 chr4 55390379 C T

rs1371799 chr4 74977836 T C

rs7694379 chr4 88186508 G A

rs4643969 chr5 45119123 C T

rs17568628 chr5 76046938 T C

rs700585 chr5 88152116 C T

rs2070729 chr5 131819920 C A

rs115908071 chr6 19455146 A G

rs441460 chr6 25548287 G A

rs1800562 chr6 26093140 G A

rs198846 chr6 26107462 A G

rs3819299 chr6 31322366 T G

rs2071591 chr6 31515798 G A


rs399604 chr6 32975013 T C

rs210134 chr6 33540208 A G

rs9349205 chr6 41925158 G A

rs11970772 chr6 41925289 T A

rs9374080 chr6 109616419 T C

rs35899685 chr6 123694362 G C

rs7776054 chr6 135418915 A G

rs9483788 chr6 135435500 T C

rs628751 chr6 139838418 C A

rs4719997 chr7 30244912 T C

rs12718730 chr7 50436827 T A

rs445 chr7 92408369 C T

rs7786877 chr7 100214014 A G

rs2075672 chr7 100240295 A G

rs342275 chr7 106359215 C T

rs4731120 chr7 123411222 A C

rs10224002 chr7 151415040 A G

rs6993770 chr8 106581527 A T

rs10956483 chr8 130572109 G C

rs6995402 chr8 145005560 T C

rs409801 chr9 4744742 T C

rs385893 chr9 4763175 T C

rs13300663 chr9 4814947 G C

rs10758658 chr9 4856876 G A

rs3731211 chr9 21986846 T A

rs11789898 chr9 136925662 G T

Additional normalization

As a final step, expression values for each probe set were further transformed using an inverse quantile normal transformation of the ranked values, yielding a symmetric and nearly exactly normal distribution for each probe set. This robust approach greatly reduces the effect of outliers in eQTL analysis while retaining more power than rank-based procedures.4
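A rank-based inverse normal transformation can be written in a few lines of standard-library Python. The rank offset is an assumption here (Blom's c = 3/8 by default, giving (r - 3/8)/(n + 1/4)) because the text does not say which offset was used; ties receive average ranks.

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=3.0 / 8):
    """Map values to normal quantiles of their ranks: Phi^-1((r - c) / (n - 2c + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1                                  # extend the block of tied values
        avg_rank = (i + j) / 2 + 1                  # average 1-based rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

Because the output depends only on ranks, a single extreme array value can shift a probe set's transformed values by at most one rank position, which is the source of the robustness to outliers.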

The Affymetrix 6.0 chip

The Affymetrix Genome-Wide Human SNP Array 6.0 (“affy6”, Affymetrix, Santa Clara, CA) contains 931,946 SNPs and 945,826 CNV probes. The spatial design of the affy6 chip is shown in the figure below (left panel). This false-color image shows the four quadrants containing SNPs and the darker cross-shaped region containing invariant CNV probes. The numbers of SNPs and CNV probes per Mb on the affy6 chip are shown in the figure below (right panel; the light grey reference circles represent 250 counts).

Each affy6 SNP is interrogated by two oligonucleotide probes, each a “perfect match” complement to the genomic sequence for one of the two alleles of the SNP. About 88% of SNPs were assessed in triplicate (randomly allocated to three quadrants), and the rest were assessed in quadruplicate (one probe set in each quadrant). Each probe is 25 bases long, with the SNP base located between positions 9 and 17. CNV probes contained no known variants and were selected to cover CNVs in the Database of Genomic Variants, plus tiling across the genome.

[Figure: left panel, spatial structure of the affy6 chip; right panel, affy6 SNP (green) and CNV (blue) probe density per Mb.]

Genotype quality control (QC)

General methods

Software used included PLINK,5 the EigenSoft “smartpca” module,6 SAS (for database management),7 R,8 CIRCOS,9 and custom scripts. QC conformed to typical standards used in the field,10-12 and was conducted in an iterative fashion.

Although affy6 genotyping is conducted in individual cartridges, samples were processed in batches of 96. Subjects were randomized to genotyping plates, on which multiple pre-hybridization steps were conducted in tandem. Each plate contained a common CEU sample. An initial evaluation of genotype performance flagged 94 samples, which were then genotyped a second time.

Randomization

Simple randomization was used to assign samples to batches and wells, with a post hoc check that no significant deviations in sex or zygosity proportions occurred in any batch.

Genotype calling

We conducted preliminary QC on the Affymetrix intensity files (.CEL), computing the mean and standard deviation across all SNP probes in each .CEL file to identify sample and plate outliers. All .CEL files were called in one batch using Affymetrix Power Tools (apt-1.14.2). The Affymetrix annotation file (GenomeWideSNP_6.na31.annot.csv) was used for the initial SNP annotation, and SNPs without location information were discarded. Genotypes were converted into transposed-format files using a custom Python script, and PLINK was then used to create the standard .bed, .bim, and .fam PLINK file sets.

Genotype QC

Quality control for the affy6 chip – in terms of the filters used for SNPs and subjects – is summarized in the table below.

| QC for SNPs: SNPs were dropped for | QC for subjects: subjects were dropped for |
| bad probe mapping to the genome | affy6 “contrast QC” < 0.40 |
| MAF < 0.005 | missingness > 0.05 |
| HWE P < 1×10⁻⁸ | outlying genome-wide homozygosity |
| missingness > 0.05 | sex discrepancy |
| MAF deviation from HapMap3 CEU founders | incorrect zygosity |
|  | ancestry outlier |
|  | excessive relatedness |

SNPs. We obtained all SNP probe sequences from the Affymetrix website and used bowtie13 to map the probes against NCBI Build 37/UCSC hg19. The probes for 8,352 SNPs mapped badly (to a “random” region, to >1 region, or to 0 regions) and were removed.

The minor allele frequency (MAF) threshold was selected empirically by capitalizing on the availability of a large number of identical twin pairs. We assumed that the dominant process leading to genotype disagreement was genotyping error. We used 800 pairs of genetically verified monozygotic twins and classified autosomal SNPs into 50 MAF bins, each containing ~15,000 SNPs. The crude agreement of MZ pairs within each MAF bin is shown in the figure below (left panel). The crude agreement begins high, declines, and then recovers and stabilizes. The initially high agreement does not account for chance: when the MAF is low, much of the “agreement” arises from coincident calls of the major allele. After correction for this chance coincidence, the data support an empirical MAF threshold of 0.005 (red point) and above as achieving an estimated r² > 0.9 between the true and observed genotypes. Thus the expression-QTL analyses use 0.005 as the lower MAF threshold, with additional MAF-specific and imputation QC thresholds as described below and in Online Methods.
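One way to correct crude MZ concordance for chance agreement is a kappa-type statistic, in which chance agreement is the probability that two unrelated people carry the same genotype under Hardy-Weinberg proportions at the bin's MAF. The Python sketch below illustrates that idea only; it is a stand-in, not necessarily the correction behind the r² > 0.9 criterion above, and the function names are hypothetical.

```python
def chance_agreement(maf):
    """P(two unrelated people share a genotype) under Hardy-Weinberg proportions."""
    q = 1.0 - maf
    genotype_freqs = (q * q, 2.0 * maf * q, maf * maf)
    return sum(f * f for f in genotype_freqs)

def chance_corrected_agreement(observed, maf):
    """Kappa-type correction: 0 means chance-level agreement, 1 means perfect agreement."""
    expected = chance_agreement(maf)
    return (observed - expected) / (1.0 - expected)
```

For example, at MAF = 0.001 the chance agreement is about 0.996, so a crude agreement of 0.9995 corresponds to a corrected value of only about 0.87. This is why crude agreement looks deceptively high in the lowest MAF bins.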

[Figure: left panel, intra-pair MZ SNP crude agreement by MAF bin, based on n = 690 MZ pairs; right panel, MAF in HapMap3 CEU founders vs. present sample.]


The figure above (right panel) is a scatterplot of SNP MAF in HapMap3 CEU founders1 versus one randomly selected person per family in the present sample. The CEU founders are of northwestern European ancestry, and HapMap3 was genotyped with the affy6 and Illumina 1M chips. Each point represents a single SNP, and the contours show point density. The red dots are 3,225 SNPs with significantly different MAF (P < 5×10⁻⁸) between the CEU founders and these subjects from the Netherlands. With few exceptions, these SNPs were rare or absent in the ~60 CEU founders and uncommon in the present sample. SNPs with MAF > 0.05 that differed significantly were removed.
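The text does not state which test flagged the 3,225 SNPs; one common choice is a 1-d.f. allelic chi-square test on a 2×2 table of minor and major allele counts, sketched below in standard-library Python. Allele counts would come from the ~60 CEU founders and from one person per family (2 alleles per person); the function name is hypothetical.

```python
import math

def allelic_chisq(minor_a, alleles_a, minor_b, alleles_b):
    """1-d.f. Pearson chi-square comparing minor-allele frequency in two samples.

    minor_*: minor-allele counts; alleles_*: total allele counts (2 x individuals).
    Returns (chi2, P).
    """
    table = [[minor_a, alleles_a - minor_a], [minor_b, alleles_b - minor_b]]
    row = [alleles_a, alleles_b]
    n = alleles_a + alleles_b
    col = [minor_a + minor_b, n - minor_a - minor_b]
    if 0 in col:                 # monomorphic in both samples: no evidence of a difference
        return 0.0, 1.0
    chi2 = sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(2) for j in range(2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square with 1 d.f.
```

With only ~120 founder alleles, rare SNPs have very small expected counts, so an exact test would be a more conservative alternative for the rarest variants.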

The thresholds for departure from Hardy-Weinberg equilibrium (P_HWE < 1×10⁻⁸) and for missingness (> 0.05) were evaluated in one randomly selected twin per pair.
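For reference, the exact Hardy-Weinberg test of Wigginton, Cutler and Abecasis (2005), the test behind PLINK's `--hwe` filter, can be ported to Python as below. Whether this exact test or an asymptotic one produced the P_HWE values here is not stated, so treat this as an illustrative sketch; the function name is ours.

```python
def hwe_exact_p(obs_hets, obs_hom1, obs_hom2):
    """Two-sided exact Hardy-Weinberg P-value (Wigginton et al. 2005 algorithm)."""
    obs_homr, obs_homc = min(obs_hom1, obs_hom2), max(obs_hom1, obs_hom2)
    rare = 2 * obs_homr + obs_hets              # copies of the rarer allele
    n = obs_hets + obs_homc + obs_homr          # number of genotyped individuals
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)      # start near the most likely het count
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0
    hets, homr = mid, (rare - mid) // 2         # walk downward in heterozygote count
    homc = n - hets - homr
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    hets, homr = mid, (rare - mid) // 2         # walk upward in heterozygote count
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    p_obs = probs[obs_hets] / total
    return min(1.0, sum(pr / total for pr in probs if pr / total <= p_obs))
```

Unlike a chi-square approximation, the exact test remains valid for SNPs with very few minor-allele homozygotes, which matters when the MAF floor is as low as 0.005.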

Subjects. The contrast QC threshold (0.4) is the value recommended by Affymetrix. The missingness threshold (0.05) is typical. Unusually high or low genome-wide homozygosity can reflect genetic issues (relative inbreeding) or technical issues (poor chip performance, sample contamination), and subjects more than 6 standard deviations from the sample mean were removed. Subjects with an unresolvable mismatch between phenotypic sex and chrX/chrY genotypes were removed, as the link between their phenotype and genotype records was questionable. Similarly, we confirmed the expected MZ and DZ relationships using genome-wide data and the IBD proportion estimates described below. We evaluated empirical ancestry using an LD-pruned set of autosomal SNPs. Marked ancestry outliers were removed, as were subjects related to more than 6 individuals (as excessive relatedness can indicate poor genotyping performance).

Zygosity

We used an LD-pruned set of autosomal SNPs to confirm zygosity by evaluating the proportions of alleles shared identical by descent (IBD) between two putative twins. At a single locus, a twin pair can share 0, 1, or 2 alleles IBD; over a genome-wide set of SNPs, these proportions are z0, z1, and z2, respectively. Further, “pi-hat” (π̂) summarizes the relationship between two individuals (expectations are 1 for MZ twins and ½ for first-degree relatives such as DZ twins).

The figure below (left panel) shows that all MZ twins have π̂ values close to 1.0 and that π̂ for DZ twins centers on 0.5. The right panel indicates that all MZ twin pairs are centered on z1 = 0 and z2 = 1 (as expected) and that all DZ twins with π̂ ~ ½ are sibling pairs centered on z1 = 0.5 and z2 = 0.25. The empirically estimated π̂ among DZ twins followed the expected distribution, as described in the main manuscript.
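The summary statistic follows from the IBD-sharing proportions as π̂ = P(IBD = 2) + ½ P(IBD = 1) = z2 + z1/2, which is how PLINK defines PI_HAT. The small Python sketch below computes it and applies illustrative zygosity bands; the cutoffs (0.9 for MZ, and the 0.35 to 0.65 DZ range noted later in this supplement) are hypothetical choices, not the thresholds used in the study.

```python
def pi_hat(z1, z2):
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1

def classify_pair(z1, z2, mz_min=0.9, dz_range=(0.35, 0.65)):
    """Illustrative zygosity check; the cutoffs are hypothetical."""
    ph = pi_hat(z1, z2)
    if ph >= mz_min:
        return "MZ"
    if dz_range[0] <= ph <= dz_range[1]:
        return "DZ/full sib"
    return "unexpected"
```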


Ancestry

We applied smartpca to an LD-pruned SNP subset genotyped in one randomly selected individual per family, and compared the results with HapMap3 samples to place them in context. The figure below depicts sample ancestry. For clarity, the median PC1 and PC2 values of the ten HapMap3 population groups are shown with a cross. The three HapMap3 samples of African ancestry cluster at the upper right (ASW, LWK, YRI: African-Americans from the Southwest USA; Luhya from Webuye, Kenya; and Yoruba from Ibadan, Nigeria). The three samples with East Asian ancestry cluster at the lower right (CHB, CHD, and JPT: Han Chinese from Beijing; Chinese from Denver; and Japanese from Tokyo). MEX refers to Mexican-Americans from Los Angeles, and GIH to Gujarati Indians in Houston, Texas.

The present samples are colored light sea green with open circles showing potential outliers. The vast majority of NTR samples cluster tightly with the CEU group (Utah residents with Northern and Western European ancestry) and near the other European group (TSI, Tuscan Italians). There are 45 twin pairs who are potential outliers, generally with East Asian ancestry, perhaps consistent with the historical Dutch colonial presence in Indonesia and other East Asian countries.

[Figure: left panel, pi-hat by zygosity; right panel, plot of z2 by z1. MZM = monozygotic male, MZF = monozygotic female, DZM = dizygotic male, DZF = dizygotic female, and DOS = dizygotic opposite-sex twin pairs.]


[Figure: Ancestry plot, PC1 × PC2.]

SNPs and subjects after QC

Agreement. Some of these samples had prior genome-wide genotyping with a legacy Perlegen four-chip platform as part of the GAIN14 major depressive disorder GWAS.15 Comparing genotypes for 2219 subjects and 110,588 SNPs that passed QC in the current as well as the prior project, the agreement was 0.99956.

SNPs and subjects. Because genotypes are correlated within MZ and DZ twin pairs, we selected one twin per pair at random to evaluate the performance of 686,895 affy6 SNPs. The figure below displays the characteristics of an independent subset of the analytic data set.

Note that a few values exceed the thresholds used overall in this article, as these criteria were applied and evaluated in a different portion of the data (e.g., minimum MAF of 0.002 instead of 0.005).

In general, there are multiple indications that these data are of high quality; in particular, subject and SNP missingness are very low compared with other affy6 datasets, with medians under 0.005 for both.


[Figure panels: SNP minor allele frequencies (MAF); SNP missingness proportions; SNP P_HWE; subject missingness proportions.]


Statistical analysis

Transcript heritability

The primary heritability testing was based on (i) comparison of the correlation among MZ twins with that among DZ twins, following the classical twin design. We refer to this as the “twin-based” approach, to distinguish it from two other approaches: (ii) using DZ twins alone, comparing expression phenotype correlation with the empirical IBD proportions, as described for full sibs (equivalent for this purpose to DZ twins) in reference 16; and (iii) using one member of each twinship, i.e., a set of unrelated individuals, with the GCTA approach based on the small variation in IBD among all such pairs.17 These three approaches use entirely different portions of the data.

Twin-based transcript heritability

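To handle covariates effectively, and to provide a seamless connection between the twin heritability and eQTL analyses, we developed our own algorithms and software (R v2.15) for twin-based heritability analysis. We estimated the twin-based ACE model heritability of each transcript by fitting the regression model

$$y_i = \mu + x_i^\top \beta + a_i + c_i + e_i,$$

where $y_i$ is the expression value of individual $i$ ($i = 1, \ldots, n$), $\mu$ is the overall mean, $x_i$ are covariates as described earlier with coefficients $\beta$, and $a_i$, $c_i$, and $e_i$ denote the (random) additive genetic, shared environmental, and residual effects. We further assume that the three random terms are mutually independent and normally distributed with mean 0 and variances $\sigma_a^2$, $\sigma_c^2$, and $\sigma_e^2$.

According to genetic theory,18 for a pair of subjects $i$ and $j$, we have

$$\operatorname{Cov}(a_i, a_j) = \begin{cases} \sigma_a^2 & \text{if } i = j \text{ or } i, j \text{ are MZ twins},\\ \tfrac{1}{2}\sigma_a^2 & \text{if } i, j \text{ are DZ twins},\\ 0 & \text{otherwise}, \end{cases}$$

and

$$\operatorname{Cov}(c_i, c_j) = \begin{cases} \sigma_c^2 & \text{if } i = j \text{ or } i, j \text{ are twins of the same pair},\\ 0 & \text{otherwise}. \end{cases}$$

Let $A = \{A_{ij}\}$ with $A_{ij} = \operatorname{Cov}(a_i, a_j)/\sigma_a^2$, $C = \{C_{ij}\}$ with $C_{ij} = \operatorname{Cov}(c_i, c_j)/\sigma_c^2$, $y = (y_1, \ldots, y_n)^\top$, $X$ the $n \times p$ design matrix whose $i$-th row is $(1, x_i^\top)$, $\tilde\beta = (\mu, \beta^\top)^\top$, and $I$ the $n \times n$ identity matrix. Then $y \sim N(X\tilde\beta, V)$, where

$$V = \sigma_a^2 A + \sigma_c^2 C + \sigma_e^2 I.$$

We can re-express $V$ as $V = \sigma^2 \Sigma$, where $\sigma^2 = \sigma_a^2 + \sigma_c^2 + \sigma_e^2$ and $\Sigma = h^2 A + c^2 C + (1 - h^2 - c^2) I$, with $h^2 = \sigma_a^2/\sigma^2$ and $c^2 = \sigma_c^2/\sigma^2$.

For parameter estimation, we maximize the logarithm of the profile restricted maximum likelihood (REML) function:19

$$\ell_R(\sigma^2, h^2, c^2) = -\frac{1}{2}\left[(n - r)\log(2\pi) + \log|V| + \log\left|X^\top V^{-1} X\right| + y^\top P y\right],$$

where $P = V^{-1} - V^{-1} X \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1}$ and $r$ is the rank of $X$. REML accounts for the loss in degrees of freedom associated with the fixed-effect estimates; therefore, REML estimates tend to be less biased than the corresponding maximum likelihood estimates and control type I error better. The profile function has only three parameters regardless of the number of fixed effects, so maximizing the profile REML function is computationally more efficient than maximizing the full REML function. For each transcript, the twin-based heritability and shared environmental effects are estimated as

$$\hat h^2 = \frac{\hat\sigma_a^2}{\hat\sigma_a^2 + \hat\sigma_c^2 + \hat\sigma_e^2} \quad\text{and}\quad \hat c^2 = \frac{\hat\sigma_c^2}{\hat\sigma_a^2 + \hat\sigma_c^2 + \hat\sigma_e^2}.$$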
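The model above can be sketched in Python with NumPy for the special case in which every family is a complete twin pair, so that Σ is block diagonal with 2×2 blocks [[1, ρ], [ρ, 1]], where ρ = k·h² + c² and k = 1 (MZ) or ½ (DZ). This is not the authors' R software: it profiles σ² out analytically (leaving two parameters rather than three), uses a coarse grid instead of a numerical optimizer, and restricts h² and c² to be non-negative, whereas the analyses below use an unconstrained REML fit. Function names are hypothetical, and X is assumed to have full column rank.

```python
import numpy as np

def _pair_cross(U1, U2, V1, V2, rho):
    """Sum over pairs f of U_f^T Sigma_f^{-1} V_f, with Sigma_f = [[1, rho_f], [rho_f, 1]]."""
    w = 1.0 / (1.0 - rho ** 2)                     # scale of the 2x2 inverse
    U1w, U2w = U1 * w[:, None], U2 * w[:, None]
    return (U1w.T @ V1 + U2w.T @ V2
            - (U1w * rho[:, None]).T @ V2 - (U2w * rho[:, None]).T @ V1)

def profile_reml(h2, c2, y1, y2, X1, X2, kin):
    """REML log-likelihood up to a constant, with sigma^2 profiled out.

    y1, y2: (pairs,) expression of twin 1 / twin 2; X1, X2: (pairs, p) fixed-effect
    design rows (intercept included); kin: (pairs,) 1.0 for MZ and 0.5 for DZ pairs.
    """
    rho = kin * h2 + c2                            # within-pair entry of Sigma
    n, p = 2 * len(y1), X1.shape[1]
    Y1, Y2 = y1[:, None], y2[:, None]
    xsx = _pair_cross(X1, X2, X1, X2, rho)         # X^T Sigma^{-1} X
    xsy = _pair_cross(X1, X2, Y1, Y2, rho)         # X^T Sigma^{-1} y
    ysy = _pair_cross(Y1, Y2, Y1, Y2, rho)[0, 0]   # y^T Sigma^{-1} y
    ypy = ysy - (xsy.T @ np.linalg.solve(xsx, xsy))[0, 0]   # y^T P y for Sigma
    logdet_sigma = np.sum(np.log(1.0 - rho ** 2))
    logdet_xsx = np.linalg.slogdet(xsx)[1]
    return -0.5 * ((n - p) * np.log(ypy / (n - p)) + logdet_sigma + logdet_xsx)

def fit_ace_grid(y1, y2, X1, X2, kin, step=0.02):
    """Grid-search REML over h2, c2 >= 0 with h2 + c2 <= 0.99 (a constrained simplification)."""
    best_ll, best_h2, best_c2 = -np.inf, 0.0, 0.0
    grid = np.arange(0.0, 1.0, step)
    for h2 in grid:
        for c2 in grid:
            if h2 + c2 > 0.99:
                continue                           # keep e2 > 0 so MZ blocks stay invertible
            ll = profile_reml(h2, c2, y1, y2, X1, X2, kin)
            if ll > best_ll:
                best_ll, best_h2, best_c2 = ll, h2, c2
    return best_h2, best_c2
```

Exploiting the 2×2 block structure keeps each likelihood evaluation linear in the number of pairs; twins with a missing co-twin would need 1×1 blocks, which this sketch does not handle.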

Testing of the REML code proceeded by implementing both REML and standard maximum likelihood estimation (MLE). Simulations were conducted with sample sizes similar to NTR under the null hypothesis ($h^2 = 0$), for 500 simulations with independent random normal covariates. The figure below shows observed vs. expected P-values from these simulations for both MLE and REML. The MLE was noticeably anticonservative, especially for large numbers of covariates, whereas the REML code provided approximately uniform P-values even with up to 200 covariates.

QQ plots of P-values for ML vs. REML model fitting of twin-based heritability; 500 null simulations based on 700 MZ pairs and 650 DZ pairs.

Genomewide heritability analysis using DZ twins

Traditional heritability analysis based on MZ vs. DZ contrasts essentially assumes that the IBD proportion among DZ twins is 0.5. In fact, just as with non-twin full sibs, this proportion varies over a wide range, from about 0.35 to 0.65 in our dataset, as estimated by PLINK. The variation in this proportion, together with the varying correlation in transcript levels, provides another estimate of heritability that is relatively free of assumptions about the nature of family effects.16 Here we used PLINK to estimate the IBD proportion for each DZ pair with both genotype and expression data. These proportions had mean 0.501 and standard deviation 0.038, an extremely close match to theoretical expectations. We fit the ACE model $y = X\beta + a + c + e$, where $X$ contains the fixed-effect covariates with coefficients $\beta$, $a \sim N(0, \sigma_a^2 A)$, where $A$ is the estimated IBD proportion matrix among all DZ individuals (assuming IBD = 0 between unrelated individuals), $c \sim N(0, \sigma_c^2 C)$ and $e \sim N(0, \sigma_e^2 I)$, and $C$ is as defined for twin-based heritability.

Maximization was performed using REML, and P-values were obtained by likelihood ratio tests.
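A likelihood-ratio P-value for h² = 0 can be sketched as below with the standard library. Because h² = 0 lies on the boundary when heritability is constrained to be non-negative, a 50:50 mixture of a point mass at zero and a χ² distribution with 1 d.f. is a common reference distribution; the text does not say which reference distribution was used, so the `boundary` switch is an assumption, as is the function name.

```python
import math

def lrt_pvalue(loglik_alt, loglik_null, boundary=True):
    """Likelihood-ratio P-value for H0: h2 = 0.

    boundary=True uses the 50:50 mixture of chi-square(0) and chi-square(1);
    boundary=False uses a plain chi-square(1) reference distribution.
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        return 1.0
    p_chi1 = math.erfc(math.sqrt(stat / 2.0))   # P(chi-square_1 > stat)
    return 0.5 * p_chi1 if boundary else p_chi1
```

Halving the χ²₁ tail is also what a one-sided test gives when an unconstrained fit returns a positive estimate, so the two conventions usually agree for heritable transcripts.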

GCTA heritability analysis

For each unrelated NTR twin set (twin set 1 and twin set 2), SNP-based heritability of transcripts was estimated with GCTA (Genome-wide Complex Trait Analysis, version 0.93.8),17 which was originally designed to estimate the proportion of phenotypic variance explained by genome-wide SNPs for complex traits. Instead of disease or complex-trait phenotypes, we used the expression level of each transcript across individuals as the phenotype, with the SNPs as genotype input. Local heritability was additionally investigated by identifying, for each transcript, the SNPs within 1 Mb upstream of the TSS and 1 Mb downstream of the TES as local SNPs (the same definitions used for local eQTLs). We also investigated distant heritability by considering the trans-SNPs, i.e., the complement of the local SNPs. All gene expression covariates described earlier were included in the analysis.

Heritability analysis with GCTA involves three steps: (1) preparing the GCTA genotype input files for each set of SNPs; (2) estimating the genetic relationship matrix (GRM) among all pairs of individuals using genome-wide SNPs, cis-SNPs, or trans-SNPs; and (3) estimating the heritability explained by SNPs using the GRM pre-computed in step 2. For heritability explained by genome-wide SNPs, a single genome-wide GRM can be used for all transcripts in each NTR set. However, for heritability explained by local or distant SNPs, transcript-specific GRMs must be computed. PLINK was used to extract local and distant SNPs and to prepare the genotype input files for GCTA, and GCTA was then run for each transcript and each group of SNPs (genome-wide, local, or distant). The standard GCTA REML output was used to estimate heritability among these unrelated individuals. As might be expected, the distant GCTA heritability estimates were almost identical to the genome-wide estimates, as the distant GRM is nearly identical to the genome-wide GRM.
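Step (2) can be sketched with NumPy as below, using the GRM estimator published with GCTA (Yang et al., 2011), together with a helper that selects a transcript's local SNPs. This is an illustration rather than GCTA itself: it assumes complete 0/1/2 genotype counts, and the function names and the chromosome/position inputs are hypothetical. The distant set is simply the complement of the local mask.

```python
import numpy as np

def grm(genotypes):
    """Genetic relationship matrix (GCTA-style estimator).

    genotypes: (individuals x SNPs) counts of one allele (0/1/2), no missing data.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                       # allele frequency per SNP
    keep = (p > 0) & (p < 1)                       # drop monomorphic SNPs
    x, p = x[:, keep], p[keep]
    denom = 2.0 * p * (1.0 - p)
    z = (x - 2.0 * p) / np.sqrt(denom)
    A = z @ z.T / x.shape[1]                       # off-diagonal entries
    diag = np.mean((x ** 2 - (1.0 + 2.0 * p) * x + 2.0 * p ** 2) / denom, axis=1)
    np.fill_diagonal(A, 1.0 + diag)                # GCTA's diagonal (1 + inbreeding estimate)
    return A

def local_snp_mask(snp_chr, snp_pos, tx_chr, tss, tes, window=1_000_000):
    """SNPs within `window` bp upstream of the TSS, downstream of the TES, or in between.

    Taking min/max of TSS and TES makes the window correct for either strand.
    """
    lo, hi = min(tss, tes) - window, max(tss, tes) + window
    pos = np.asarray(snp_pos)
    return (np.asarray(snp_chr) == tx_chr) & (pos >= lo) & (pos <= hi)
```

Because each transcript has its own local window, the local and distant GRMs change from transcript to transcript, whereas the genome-wide GRM is computed once per twin set.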

Estimation of “true” underlying heritability, and comparison with published reports

The gamma model for true heritability and local IBD proportion shown in Figure 3 was fit using a simple moment method, as follows. The estimated standard error (SE) of the REML h² estimates was examined and did not vary markedly across the range of estimated h². Thus the median standard error was used as the SE for all genes in the simple model

$$\text{estimated } h^2 = \text{true } h^2 + \text{error},$$

where the errors were assumed to follow $N(0, \mathrm{SE}^2)$. Note that our use of an unconstrained REML model (allowing negative estimated heritability) ensures that the overall average of the estimated heritability is the same as that of the true heritability.

The sample mean and variance of the estimated h² across the 18,392 genes then determined the shape and scale parameters $(k, \theta)$ of the fitted gamma density. Specifically,

$$\operatorname{mean}(\text{estimated } h^2) = k\theta \quad\text{and}\quad \operatorname{var}(\text{estimated } h^2) - \mathrm{SE}^2 = k\theta^2.$$

A similar maximum likelihood model that included a population spike at true h² = 0 did not produce a significantly better fit to the observed data. Model fit was assessed by simulating 1 million random draws from the estimated true distribution, with added error of variance SE². Essentially the same approach was used to fit the underlying gamma density for the true proportion of h² explained by local IBD.
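The moment fit and the simulation check can be sketched with NumPy as below. Solving the two moment equations gives θ = (var - SE²)/mean and k = mean/θ; the function names, the random seed, and the error check are our additions.

```python
import numpy as np

def fit_gamma_moments(h2_estimates, se):
    """Method-of-moments gamma fit to true h2, given estimates = true + N(0, se^2) error."""
    m = float(np.mean(h2_estimates))
    v_true = float(np.var(h2_estimates, ddof=1)) - se ** 2   # remove sampling variance
    if m <= 0 or v_true <= 0:
        raise ValueError("moments do not support a gamma fit")
    theta = v_true / m          # scale
    k = m / theta               # shape (mean = k * theta, variance = k * theta^2)
    return k, theta

def simulate_estimates(k, theta, se, n_draws=1_000_000, seed=0):
    """Draw true h2 from the fitted gamma and add N(0, se^2) estimation error."""
    rng = np.random.default_rng(seed)
    return rng.gamma(shape=k, scale=theta, size=n_draws) + rng.normal(0.0, se, size=n_draws)
```

For the MuTHER-like comparison described next, the same simulation would be rerun with the SE inflated by the square root of the sample size ratio, e.g. `simulate_estimates(k, theta, se * np.sqrt(n_ntr / n_muther))`, where `n_ntr` and `n_muther` are placeholders for the two studies' sample sizes.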

The MuTHER report 20 used the same MZ vs. DZ twin design, with MZ:DZ ratios close enough to those of NTR that we chose a simple approximation to mimic the MuTHER results. First, we fit the gamma model for h2 as described above. We then assumed that the standard errors for MuTHER estimation were proportionally greater according to the sample-size ratio, i.e. greater by a factor $\sqrt{n_{NTR}/n_{MuTHER}}$, where the $n$ values are the sample sizes of the two studies. One million random draws from the gamma model plus error were performed, resulting in the MuTHER-like inflated estimates shown in Figures 3c and 3d of the main article. The MuTHER definitions of expressed genes resulted in nearly the same proportion of declared expressed genes as in the current report.
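Assuming standard errors scale as $1/\sqrt{n}$, the MuTHER-like simulation amounts to inflating the reference standard error by the square root of the sample-size ratio before adding noise to draws from the fitted gamma. A minimal sketch follows; the function names are hypothetical, and the sample sizes and gamma parameters in the usage line are placeholders rather than the actual study values.

```python
import numpy as np


def scaled_se(se_ref, n_ref, n_other):
    """SE expected in a study of size n_other, assuming SE is proportional to 1/sqrt(n)."""
    return se_ref * np.sqrt(n_ref / n_other)


def study_like_estimates(k, theta, se_ref, n_ref, n_other, n_draws=1_000_000, seed=0):
    """Gamma-distributed true h2 plus N(0, SE^2) noise at the other study's precision."""
    rng = np.random.default_rng(seed)
    se = scaled_se(se_ref, n_ref, n_other)
    true_h2 = rng.gamma(shape=k, scale=theta, size=n_draws)
    return true_h2 + rng.normal(0.0, se, size=n_draws)


# Placeholder values: a study 4x smaller than the reference has twice the SE.
est = study_like_estimates(k=0.5, theta=0.2, se_ref=0.05, n_ref=2000, n_other=500)
```

The variance of the simulated estimates is the true-h2 variance $k\theta^2$ plus the inflated $\mathrm{SE}^2$, which is what widens the distribution relative to the deconvolved one.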

Meaningful comparison with the Brisbane Systems Genetics Study (BSGS) 21 required additional steps to make the results interpretable. The BSGS reported results from an AE model (i.e., no c2 term) because the authors judged the sample size insufficient for an ACE model. The omission was justified on the basis that a separate model fit for c2 identified few significant genes, which is consistent with our report. When the omission of c2 can be justified, fitting the AE model gives more power to detect heritability than the ACE model, as it eliminates a “soft” confounding between h2 and c2 effects. To approximate the variability that should be expected for the variance-components BSGS estimation, we note that under the AE model, $h^2 = \rho_{MZ} = 2\rho_{DZ}$, where the $\rho$ terms are the phenotype (expression) correlations within each twin type. Thus a nearly efficient heritability estimate is

$$\hat{h}^2 = w\, r_{MZ} + (1 - w)\, 2 r_{DZ},$$

where $r_{MZ}$ and $r_{DZ}$ are the sample intra-class correlations and $w$ is the optimal (inverse-variance) weight, as used in a meta-analysis, determined by the standard errors of the intra-class correlations (see below) for each of the twin types (MZ and DZ):

$$w = \frac{\mathrm{SE}(2r_{DZ})^2}{\mathrm{SE}(r_{MZ})^2 + \mathrm{SE}(2r_{DZ})^2}.$$

Following standard results for correlations computed from bivariate normal data, we have $\mathrm{SE}(r) \approx (1 - \rho^2)/\sqrt{n}$, where $n$ is the number of pairs within the respective twin type.

Using this model, we randomly drew 1 million heritability values from the gamma model, computed the standard error of the weighted heritability estimate for each draw according to the model, assuming nDZ = 206 and nMZ = 78 as in the BSGS, and added random error with standard deviation equal to that standard error. This was the basis of the “BSGS-like” estimated h2 distribution shown in Supplementary Figure 4a.
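The BSGS-like simulation can be sketched as follows, under the AE-model reading above ($\rho_{MZ} = h^2$, $\rho_{DZ} = h^2/2$, $\mathrm{SE}(r) \approx (1-\rho^2)/\sqrt{n}$) and inverse-variance weighting of $r_{MZ}$ and $2r_{DZ}$. This is a hedged illustration, not the authors' code; the function names are hypothetical, and clipping the true h2 below 1 is our own assumption to keep the correlation formula well defined.

```python
import numpy as np


def se_corr(rho, n_pairs):
    """Large-sample SE of a correlation from bivariate normal data: (1 - rho^2) / sqrt(n)."""
    return (1.0 - rho**2) / np.sqrt(n_pairs)


def ae_weight_and_se(h2, n_mz, n_dz):
    """Inverse-variance weight w and SE for h2_hat = w * r_MZ + (1 - w) * 2 * r_DZ.

    Under the AE model the expected correlations are rho_MZ = h2 and rho_DZ = h2 / 2.
    """
    var_mz = se_corr(h2, n_mz) ** 2
    var_2dz = 4.0 * se_corr(h2 / 2.0, n_dz) ** 2  # Var(2 r_DZ) = 4 Var(r_DZ)
    w = var_2dz / (var_mz + var_2dz)
    se = 1.0 / np.sqrt(1.0 / var_mz + 1.0 / var_2dz)
    return w, se


def bsgs_like_estimates(k, theta, n_mz=78, n_dz=206, n_draws=1_000_000, seed=0):
    """True h2 from Gamma(k, theta) plus noise with the AE-model SE at each draw."""
    rng = np.random.default_rng(seed)
    true_h2 = rng.gamma(shape=k, scale=theta, size=n_draws)
    # Assumption: clip so the correlation formula stays well defined (rho < 1).
    _, se = ae_weight_and_se(np.clip(true_h2, 0.0, 0.99), n_mz, n_dz)
    return true_h2 + rng.normal(0.0, se)  # scale is an array: one SD per draw
```

Because the SE depends on the true h2 through $(1-\rho^2)$, strongly heritable genes are estimated more precisely than weakly heritable ones, so the added noise is not homogeneous across draws.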

In summary, we (1) deconvolved the effect of sampling variation on heritability in our study, uncovering the true underlying heritability distribution for our data, and then (2) added sampling variation to the true heritability, consistent with the sample sizes and designs/analyses of the MuTHER and BSGS studies. The distributions resulting from step (2) were strikingly similar to the reported distributions for MuTHER and BSGS, respectively.

Intraclass correlation (ICC) and pathway analyses

It has been repeatedly established 22 that expression pathway analysis methods which do not account for gene-gene correlation can vastly overstate statistical significance and incur very high false positive rates. 23 Alternative methods based on permutation of arrays versus clinical variables 24,25 automatically control false positives under the complete null hypothesis that no gene is significant, and are otherwise somewhat conservative. 22

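For a single gene, we begin by considering the classical intra-class correlation estimate for the $n_{MZ}$ pairs of MZ twins, with $\{y_{MZ1,j}, y_{MZ2,j}\}$ signifying the paired expression values for the arbitrarily ordered pair $j$:

$$ICC_{MZ} = \frac{1}{n_{MZ}\, s^2_{MZ}} \sum_{j=1}^{n_{MZ}} (y_{MZ1,j} - \bar{y}_{MZ})(y_{MZ2,j} - \bar{y}_{MZ}),$$

where $\bar{y}_{MZ}$ is the mean of all $2n_{MZ}$ values and

$$s^2_{MZ} = \frac{1}{2n_{MZ}} \left[ \sum_{j=1}^{n_{MZ}} (y_{MZ1,j} - \bar{y}_{MZ})^2 + \sum_{j=1}^{n_{MZ}} (y_{MZ2,j} - \bar{y}_{MZ})^2 \right].$$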


A similar calculation is performed for $ICC_{DZ}$, and the final heritability estimate is $\hat{h}^2_{ICC} = 2(ICC_{MZ} - ICC_{DZ})$.
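A direct NumPy transcription of the ICC and Falconer-type formulas above is given here as a sketch. The function names are our own; in the analysis each y would be a covariate-adjusted residual, a step that is not shown.

```python
import numpy as np


def icc_pairs(y1, y2):
    """Classical intra-class correlation for n arbitrarily ordered pairs (y1[j], y2[j])."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n = y1.size
    ybar = (y1.sum() + y2.sum()) / (2 * n)  # grand mean of all 2n values
    d1, d2 = y1 - ybar, y2 - ybar
    s2 = ((d1**2).sum() + (d2**2).sum()) / (2 * n)  # pooled variance
    return (d1 * d2).sum() / (n * s2)


def h2_icc(mz1, mz2, dz1, dz2):
    """Falconer-type heritability estimate 2 * (ICC_MZ - ICC_DZ)."""
    return 2.0 * (icc_pairs(mz1, mz2) - icc_pairs(dz1, dz2))
```

Because the grand mean and pooled variance treat both members of a pair alike, the estimator does not depend on which twin is labelled first, which is why an arbitrary ordering is acceptable.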

To be most comparable to the REML-based analyses (see Heritability Methods), we first computed each y value as a covariate-adjusted residual. For the observed data, the REML-based and ICC-based estimates were highly comparable, with correlation in excess of 0.99 (see figure below, left panel). Minor differences arise from the precise handling of covariates and from the fact that singletons provide a modest amount of information to the REML approach but are not used in the ICC-based approach.

Figure (two panels). Left: REML-based & ICC-based estimates are highly comparable (n=2616, r=0.996). Right: Comparison of ICC methods (n=2616).

For the 2,616 individuals in complete pairs, 1,000 permutations of twin status (MZ vs. DZ) were performed, and ICC-based heritability estimates were computed for all 19,296 unique genes in the “best-h2” set. By permuting twin type only, the familial correlations, average gene expression levels and correlations among genes were all maintained. As described in the main Methods section, for dichotomous predictors (genes belonging/not belonging to a pathway), for each permutation the mean heritability within the pathway was compared with the mean heritability in the complementary set of genes. Applying a normal approximation to this mean difference, relative to its permutation distribution, yielded the “enrichment z” as reported, with P-values based on departure from zero. A similar approach was used for continuous predictors, except that the correlation coefficient between h2 and the predictor was used instead of the mean difference. Correction for mean expression was performed by computing residuals of h2 on mean expression, in the actual data as well as in all permutations. After obtaining P-values for each pathway in KEGG, GO Biological Process, GO Molecular Function and GO Cellular Component, Benjamini-Hochberg false discovery rate q-values and Bonferroni-corrected P-values were computed using p.adjust in R v2.14.
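The permutation scheme for a dichotomous predictor can be sketched as follows. This is an illustration under our own assumptions: we read the normal approximation as standardizing the observed pathway-minus-complement difference in mean h2 by the mean and standard deviation of its permutation distribution, and we reuse the pairwise ICC estimator. All function names are hypothetical.

```python
import math

import numpy as np


def icc_by_gene(y1, y2):
    """Classical ICC for every gene; y1, y2 have shape (n_pairs, n_genes)."""
    n = y1.shape[0]
    ybar = (y1.sum(axis=0) + y2.sum(axis=0)) / (2 * n)
    d1, d2 = y1 - ybar, y2 - ybar
    s2 = ((d1**2).sum(axis=0) + (d2**2).sum(axis=0)) / (2 * n)
    return (d1 * d2).sum(axis=0) / (n * s2)


def h2_by_gene(y1, y2, is_mz):
    """ICC-based h2 = 2 * (ICC_MZ - ICC_DZ) for every gene, given pair-level MZ labels."""
    return 2.0 * (icc_by_gene(y1[is_mz], y2[is_mz]) - icc_by_gene(y1[~is_mz], y2[~is_mz]))


def enrichment_z(y1, y2, is_mz, in_set, n_perm=1000, seed=0):
    """Enrichment z for a gene set: observed mean-h2 difference (set minus complement),
    standardized by its distribution under permutation of twin type across pairs."""
    rng = np.random.default_rng(seed)

    def mean_diff(labels):
        h2 = h2_by_gene(y1, y2, labels)
        return h2[in_set].mean() - h2[~in_set].mean()

    observed = mean_diff(is_mz)
    # Shuffling only the MZ/DZ labels keeps each pair intact, so within-family
    # correlation, mean expression and gene-gene correlation are all preserved.
    null = np.array([mean_diff(rng.permutation(is_mz)) for _ in range(n_perm)])
    z = (observed - null.mean()) / null.std(ddof=1)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided
```

The per-pathway P-values produced this way could then be passed to a Benjamini-Hochberg or Bonferroni adjustment, as described above.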


Additional heritability quality control

Although our expression and SNP genotype QC measures were already extensive, as a final heritability quality-control step we also repeated the analysis after excluding an additional set of individuals flagged for reduced-quality genotypes or deviation from expected IBD proportions. A total of 77 individuals were identified across several categories, including lower-quality genotypes (within the already high-quality set) and moderately extreme DZ IBD proportions (although the results already matched theoretical expectations, as shown in Figure 3). Using the ICC-based estimates described above with a reduced set of covariates (not including the blood count covariates or related SNPs), the results after excluding these 77 individuals appear in the figure above, right panel. The results for the full and reduced sets were highly concordant, with only the modest differences expected from the reduced sample size.

Additional examination of age as a covariate

Comparison with other studies, such as the MuTHER twin heritability report (Ref. 20), is potentially complicated by differences in the age distribution of participants across studies. We note that the NTR participants in this study had a wide range of ages (18-78 years), so that age could be used effectively as a covariate (see figure below, left panel).

Figure (two panels). Left: Histogram of NTR participant ages (n=2752). Right: REML heritability estimates (n=2752), with or without age as a covariate (r=0.999).

We performed an additional REML analysis of heritability using all of the covariates described above for the primary analysis except age. The figure above (right panel) shows that including or excluding age as a covariate had little effect on the REML heritability estimates (r=0.999), suggesting that differences in age range are not a major source of variation in h2 estimates for the platform and tissue described in this study. Results for c2 are similar (r=0.999 for estimates with/without age as a covariate; not shown).


eQTL analysis

eQTL analysis of all NTR sets was first performed in screening steps with Matrix eQTL 26 (http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL), which treats local (“cis”) and distant (“trans”) eQTLs separately for multiple comparisons. Local eQTLs are defined as SNP-transcript associations in which the SNP lies between 1 Mb upstream of the transcript start site (TSS) and 1 Mb downstream of the transcript end site (TES); distant eQTLs involve the complementary set of SNPs. In the screening step, the entire dataset (2486 individuals with genotypes) was screened using Matrix eQTL as if the individuals were all unrelated, and Benjamini-Hochberg q-values were estimated separately for local and distant eQTLs. For the desired q-value threshold (e.g., q < 0.01 as shown in Figure 5), all SNP-transcript pairs achieving significance at that threshold in the screen were then re-analyzed using the full REML mixed-model code described for twin-contrast heritability, including all covariates and the current SNP of interest. In addition, 3 genotype principal components were included to control for population stratification. The Wald statistic for the current SNP reflects full adjustment for covariates and for the twin heritability structure, which for eQTL analysis might be described as polygenic effects relative to the current SNP. Because the correct mixed model tended to be less significant than the Matrix eQTL screening result, the approach is expected to capture essentially all results that are truly significant at the threshold.
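The local/distant split and the separate Benjamini-Hochberg correction can be sketched as below. This is not Matrix eQTL's implementation: the BH function mirrors the step-up rule of R's `p.adjust(method = "BH")`, while the inclusive window boundaries and the strand-agnostic min/max handling of TSS and TES are our own assumptions. Function names are hypothetical.

```python
import numpy as np


def is_local(snp_chr, snp_pos, gene_chr, tss, tes, window=1_000_000):
    """True if the SNP lies on the gene's chromosome within 1 Mb of the transcript.

    min/max make the window strand-agnostic: [start - 1 Mb, end + 1 Mb].
    """
    start = np.minimum(tss, tes) - window
    end = np.maximum(tss, tes) + window
    return (snp_chr == gene_chr) & (snp_pos >= start) & (snp_pos <= end)


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (same step-up rule as R's p.adjust BH)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)[::-1]  # indices from largest to smallest p
    ranks = np.arange(m, 0, -1)  # rank of each of those p-values (m, ..., 1)
    q = np.minimum.accumulate(p[order] * m / ranks)
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out


def bh_by_class(p, local_mask):
    """Apply BH separately to local and to distant tests, as in the screening step."""
    p = np.asarray(p, dtype=float)
    local_mask = np.asarray(local_mask, dtype=bool)
    q = np.empty_like(p)
    q[local_mask] = bh_adjust(p[local_mask])
    q[~local_mask] = bh_adjust(p[~local_mask])
    return q
```

Correcting the two classes separately reflects their very different numbers of tests: distant tests vastly outnumber local ones, so pooling them would make the local threshold needlessly strict.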

We initially identified 601 distant eQTLs (q-value < 0.001) involving 581 genes and 538 SNPs. Of these 601 distant eQTLs, 266 (44%) replicated in the NESDA study with q-value < 0.1. We applied four criteria to remove likely false positives. Each criterion was coded as 0 or 1, indicating that a distant eQTL passed or failed that QC criterion:

(1) failld=1 if the eQTL SNP and eQTL gene are located within the same LD bin, or their distance is barely above 1 Mb (< 1.06 Mb). LD bins contained contiguous SNPs with r2 > 0.5, as determined by evaluating all pairwise associations for genotyped Affymetrix 6.0 SNPs.

(2) failbuddy=1 if the distant eQTL SNP was the lone significant association in its genomic region (i.e., all other 1000 Genomes imputed SNPs within ±50 kb of the index SNP had q-values ≥ 0.01). Given the high SNP density after 1000 Genomes imputation, failbuddy=1 implies that the distant eQTL SNP is poorly correlated with nearby SNPs, which most likely reflects a genotype quality problem.

(3) failcrosshyb=1 if the gene expression array probe had in silico evidence of potential cross-hybridization with the DNA sequence around the SNP probe. Specifically, following previous studies 27, we mapped the gene expression probe sequences to the human reference genome using bowtie2 (parameters --local -a -N 1 -L 18 -i S,1,0.20). These parameters were chosen to balance sensitivity and specificity for detecting possible cross-hybridization at reasonable computational cost. We declared cross-hybridization if any probe sequence of an eQTL gene mapped to a location within 2 Mb of the corresponding distant eQTL SNP.

(4) faillocalsnp=0 if the distant eQTL p-value after adjustment for a local SNP of the eQTL gene is < 1e-10 or < 10 × the unadjusted p-value, and faillocalsnp=1 otherwise. The motivation for this criterion is that if, for some reason (e.g., LD, annotation inaccuracy, repetitive regions), a distant eQTL SNP is correlated with a local eQTL SNP, then the distant eQTL is a shadow of the local eQTL and should be removed.
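The four QC flags can be restated as simple predicates. The sketch below is our own restatement of the rules in the text, taking hypothetical precomputed inputs (LD-bin membership, SNP-gene distance, q-values of neighbouring SNPs, probe alignment coordinates, and the two p-values); it does not perform the LD binning, the imputation-based neighbour search or the bowtie2 alignment itself. Treating "within 2 Mb" as inclusive is our assumption.

```python
def fail_ld(same_ld_bin, distance_bp):
    """failld = 1 if SNP and gene share an LD bin, or are only just beyond 1 Mb apart (< 1.06 Mb).

    Pass float("inf") as distance_bp for SNP-gene pairs on different chromosomes.
    """
    return int(same_ld_bin or distance_bp < 1_060_000)


def fail_buddy(neighbor_qvalues, threshold=0.01):
    """failbuddy = 1 if no other imputed SNP within +/-50 kb has q < threshold."""
    return int(all(q >= threshold for q in neighbor_qvalues))


def fail_crosshyb(probe_hits, snp_chr, snp_pos, window=2_000_000):
    """failcrosshyb = 1 if any probe alignment (chrom, pos) lies within 2 Mb of the SNP."""
    return int(any(c == snp_chr and abs(pos - snp_pos) <= window for c, pos in probe_hits))


def fail_local_snp(p_adjusted, p_unadjusted):
    """faillocalsnp = 0 if the local-SNP-adjusted p < 1e-10 or < 10 x the unadjusted p, else 1."""
    survives_adjustment = p_adjusted < 1e-10 or p_adjusted < 10 * p_unadjusted
    return 0 if survives_adjustment else 1
```

A distant eQTL passes QC only when all four functions return 0.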

The following table shows the number of distant eQTLs removed by each criterion and the number (percentage) that replicated in the NESDA study. Only 7.7% of the distant eQTLs excluded by the failbuddy criterion replicated, supporting the efficacy of this criterion. The eQTLs excluded by the other three criteria replicated well in NESDA, which is expected because the NTR and NESDA studies employ the same array platform and their participants come from the same population and thus likely share similar LD structure; artifacts arising from LD, cross-hybridization or local-eQTL shadowing would therefore be expected to replicate as well.


| Criterion | failld | failbuddy | failcrosshyb | faillocalsnp |
|---|---|---|---|---|
| # of distant eQTLs removed | 46 | 104 | 108 | 63 |
| # replicated in NESDA (%) | 31 (67.4%) | 8 (7.7%) | 69 (63.9%) | 37 (58.7%) |

The following table shows the number of eQTLs for each observed combination of the four QC criteria; 353 of the 601 eQTLs pass all four criteria.

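| failld | failbuddy | failcrosshyb | faillocalsnp | Frequency |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 353 |
| 0 | 0 | 0 | 1 | 13 |
| 0 | 0 | 1 | 0 | 84 |
| 0 | 0 | 1 | 1 | 1 |
| 0 | 1 | 0 | 0 | 89 |
| 0 | 1 | 0 | 1 | 5 |
| 0 | 1 | 1 | 0 | 10 |
| 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 32 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 12 |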
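As a consistency check, the per-criterion counts of removed eQTLs (46, 104, 108 and 63 for failld, failbuddy, failcrosshyb and faillocalsnp) and the 353 eQTLs passing all criteria can be recovered from the combination table above. A small Python sketch, with the table transcribed by hand:

```python
from collections import Counter

# (failld, failbuddy, failcrosshyb, faillocalsnp) -> number of distant eQTLs
COMBOS = {
    (0, 0, 0, 0): 353, (0, 0, 0, 1): 13, (0, 0, 1, 0): 84, (0, 0, 1, 1): 1,
    (0, 1, 0, 0): 89, (0, 1, 0, 1): 5, (0, 1, 1, 0): 10, (1, 0, 0, 0): 1,
    (1, 0, 0, 1): 32, (1, 0, 1, 0): 1, (1, 0, 1, 1): 12,
}
NAMES = ("failld", "failbuddy", "failcrosshyb", "faillocalsnp")


def marginal_failures(combos):
    """Number of eQTLs failing each criterion (an eQTL may fail several)."""
    totals = Counter()
    for flags, count in combos.items():
        for name, flag in zip(NAMES, flags):
            totals[name] += flag * count
    return dict(totals)


def n_pass_all(combos):
    """Number of eQTLs with every flag equal to 0."""
    return sum(count for flags, count in combos.items() if not any(flags))
```

Because an eQTL can fail more than one criterion, the per-criterion counts sum to more than the 248 eQTLs that fail at least one criterion.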

We further examined scatter plots of the 353 distant eQTLs (gene expression vs. SNP genotype, before and after adjusting for all covariates) and removed 5 problematic eQTLs for which there was no association before covariate adjustment and the association after adjustment was driven entirely by a single outlier sample. Therefore, 348 distant eQTLs passed all QC criteria and manual examination of scatter plots. Among them, 165 (47.4%) were replicated in the NESDA study.

We also checked whether any of the 538 SNPs involved in the 601 distant eQTLs are located in repetitive regions. Among these 538 SNPs, 321 (60%) overlap at least one of the following repetitive and potentially problematic regions: segmental duplications (UCSC table genomicSuperDups), repeat masker regions and nested repeats (nestedRepeats, rmsk), simple repeats (simpleRepeat), and short tandem repeats (stsMap). Among 10,000 randomly selected genome-wide SNPs matched to the 538 SNPs by MAF and imputation R2, 6079 (61%) overlap at least one of these repetitive regions. Thus, there was no significant enrichment. Among the 348 eQTLs that pass QC, 207 (59%) involve SNPs located in repetitive regions, and among the 253 eQTLs that do not pass QC, 147 (58%) involve SNPs located in repetitive regions.
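As a rough check of the "no significant enrichment" statement, one can compute a one-sided exact binomial tail probability for the observed overlap, treating the matched-background rate as fixed. This is our illustration only; the text does not name the test used. It takes 321 overlapping SNPs out of the 538 distinct distant-eQTL SNPs given where the 601 distant eQTLs are first introduced, and the background rate 6079/10,000 from the matched random SNPs.

```python
from math import exp, lgamma, log, log1p


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    log_norm = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_norm - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)


background_rate = 6079 / 10_000  # matched random SNPs overlapping repeats
observed, n_snps = 321, 538      # distant-eQTL SNPs overlapping repeats
p_enrichment = binom_upper_tail(observed, n_snps, background_rate)
```

Since the observed overlap is below the background expectation, the upper-tail probability is large, consistent with no enrichment. Ignoring the sampling uncertainty in the background rate is a simplification; with 10,000 matched SNPs that uncertainty is small.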

Variant effect predictions were performed using the Variant Effect Predictor (version 2.8) 28. Network analyses were performed using partial correlations estimated by a penalized estimation method 29.


Comparison of number of genes identified in eQTL studies

The table below lists the values and the source publications described in Figure 4a and Figure 4c.

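| Study (first author_population) | Sample size | Local eQTL genes | Distant eQTL genes | Reference |
|---|---|---|---|---|
| Choy_CEU | 53 | 53 | 22 | 30 |
| Choy_CHB_JPT | 66 | 21 | 10 | 30 |
| Choy_YRI | 51 | 9 | 7 | 30 |
| Fehrmann_PBL | 1469 | 5695 | 138 | 27 |
| Montgomery2_CEU | 107 | 281 | 0 | 31 |
| Price_CEU | 56 | 24 | 8 | 32 |
| Price_YRI | 56 | 31 | 10 | 32 |
| Spielman_CHB_JPT | 75 | 93 | 15 | 33 |
| Stranger_CEU | 56 | 167 | 19 | 34 |
| Stranger_CHB_JPT | 85 | 731 | 33 | 34 |
| Stranger_YRI | 57 | 231 | 23 | 34 |
| StrangerPG_ASN | 156 | 1131 | 18 | 35 |
| StrangerPG_GIH | 82 | 283 | 0 | 35 |
| StrangerPG_LWK | 83 | 262 | 1 | 35 |
| StrangerPG_YRI | 108 | 315 | 2 | 35 |
| Zeller | 1490 | 4512 | 602 | 36 |
| Grundberg_CEU | 856 | 4625 | 121 | 20 |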

Comparison with the eQTL results from Westra et al. (2013)

The local and distant eQTL results of Westra et al. (Nature Genetics 45, 1238-43 (2013)) at FDR threshold 0.5 were extracted from the files “2012-12-21-CisAssociationsProbeLevelFDR0.5.zip” and “2012-12-21-TransEQTLsFDR0.5.zip”, respectively, downloaded from http://genenetwork.nl/bloodeqtlbrowser/ on Dec. 22, 2013. As we used the downloaded results rather than the original data, we note the following differences between NTR/NESDA and Westra et al.:

1. Our results used genotypes imputed with 1000 Genomes as the reference population, while Westra et al. used genotype data imputed using the HapMap reference. The SNPs used in our study cover most of the SNPs used in Westra et al.

2. We used Affymetrix U219 arrays, while Westra et al. used several Illumina platforms.

3. Our eQTL results are “non-redundant”, in the sense that one probe has at most one local eQTL and at most one distant eQTL per chromosome. In contrast, Westra et al. kept all significant associations between all SNPs and all probes, so one eQTL may be represented several times by SNPs in strong LD.

4. For distant eQTL mapping, we used genome-wide SNPs, while Westra et al. only examined 4542 SNPs implicated in GWAS findings (Catalog of Published GWAS, 16 July 2011, http://www.genome.gov/gwastudies/).


We compared our eQTL results (from NTR) with those of Westra et al. using two approaches. First, we asked whether eSNPs identified in our study are close to the eSNPs reported by Westra et al. Among the 9543 local eSNPs (q < 0.01) identified in our study, 6581 (69%) are within 5 kb of a local eSNP (FDR < 0.1) reported by Westra et al.; the following table shows overlaps for larger and smaller flanking windows.

Replication of local eSNPs identified by this study in Westra et al.

| Flanking window size | 2x100bp | 2x500bp | 2x1kb | 2x5kb | 2x10kb |
|---|---|---|---|---|---|
| # of replicated eQTL | 1951 | 3368 | 4441 | 6581 | 6998 |
| Proportion of replicated eQTL | 0.20 | 0.35 | 0.47 | 0.69 | 0.73 |
| Proportion of the genome covered by the union of the flanking windows | 6.4E-04 | 3.2E-03 | 6.3E-03 | 3.0E-02 | 5.8E-02 |
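The flanking-window replication counts and genome-coverage proportions in the local-eSNP table above can be sketched as follows. Assumptions (ours): positions are integer base-pair coordinates on the same genome build; a query eSNP counts as replicated if any reference eSNP lies within ±w bp; and coverage is the union of half-open windows [pos - w, pos + w) around this study's eSNPs, which would then be divided by an assumed genome length to give a proportion. All function names are hypothetical.

```python
import bisect
from collections import defaultdict


def index_positions(snps):
    """Group (chrom, pos) pairs into chrom -> sorted list of positions."""
    by_chr = defaultdict(list)
    for chrom, pos in snps:
        by_chr[chrom].append(pos)
    for positions in by_chr.values():
        positions.sort()
    return by_chr


def has_neighbor(index, chrom, pos, half_window):
    """True if some indexed SNP on chrom lies in [pos - half_window, pos + half_window]."""
    positions = index.get(chrom, [])
    i = bisect.bisect_left(positions, pos - half_window)
    return i < len(positions) and positions[i] <= pos + half_window


def replicated(query, reference, half_window):
    """Count query eSNPs with a reference eSNP within +/- half_window bp."""
    idx = index_positions(reference)
    hits = sum(has_neighbor(idx, chrom, pos, half_window) for chrom, pos in query)
    return hits, hits / len(query)


def union_length(snps, half_window):
    """Base pairs covered by the union of half-open windows [pos - w, pos + w)."""
    total = 0
    for positions in index_positions(snps).values():
        cur_start = cur_end = None
        for pos in positions:
            start, end = pos - half_window, pos + half_window
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    total += cur_end - cur_start  # close the previous merged interval
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)  # overlapping or adjacent: extend
        if cur_end is not None:
            total += cur_end - cur_start
    return total
```

The coverage proportion is a rough chance baseline: if eSNPs were placed at random, the fraction replicated would be roughly the fraction of the genome covered by the reference windows, far below the observed replication rates.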

Among the 348 distant eSNPs (q < 0.001) identified in our study, 17 are within 5 kb of any of the 4542 GWAS SNPs, and 5 (29%) of these are replicated; the following table shows overlaps for larger and smaller flanking windows.

Replication of distant eSNPs identified by this study in Westra et al.

| Flanking window size | 2x100bp | 2x500bp | 2x1kb | 2x5kb | 2x10kb |
|---|---|---|---|---|---|
| # of replicated eQTL | 1 | 1 | 1 | 5 | 6 |
| # of GWAS SNPs covered by the union of the flanking windows | 1 | 1 | 3 | 17 | 25 |
| Proportion of replicated eQTL | 1 | 1 | 0.33 | 0.29 | 0.24 |
| Proportion of the genome covered by the union of the flanking windows | 2.0E-05 | 1.0E-04 | 2.0E-04 | 9.3E-04 | 1.8E-03 |

Next, we selected eQTLs as unique (SNP, gene) pairs reported by Westra et al. at FDR cutoff 0.05 and checked the eQTL p-values in our data (NTR or NESDA). We matched genes between the two datasets by gene symbol, and used the probe with the largest heritability as the surrogate for each gene in our study. From these p-value distributions, we estimated the proportions of p-values arising from the null (uniform) distribution and from the alternative.

The 664097 local eQTLs reported by Westra et al. correspond to 590273 unique (SNP, gene) pairs, and after excluding entries that have no match in our data, we obtain 444570 and 444479 (SNP, gene) pairs in NTR and NESDA, respectively.


The 1513 distant eQTLs reported by Westra et al. correspond to 1423 unique (SNP, gene) pairs, and after excluding entries that have no match in our data, we obtain 827 and 597 (SNP, gene) pairs in NTR and NESDA, respectively.

The proportions of p-values from the alternative distribution (i.e., 1 − π0, where π0 is the estimated proportion of null p-values) were estimated using the qvalue function in R.

| Westra et al. eQTLs | NTR | NESDA |
|---|---|---|
| local eQTL | 0.596 | 0.597 |
| distant eQTL | 0.231 | 0.230 |
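The idea behind these proportions can be sketched with Storey's π0 estimator at a single tuning value λ: null p-values are uniform, so the density of p-values above λ estimates π0. This is a simplification, not the qvalue package: by default qvalue evaluates λ over a grid (0.05 to 0.95) and smooths the resulting estimates, so values from this sketch may differ slightly. The function name is hypothetical.

```python
import numpy as np


def pi1_storey(p, lam=0.5):
    """Estimate pi1 = 1 - pi0 with Storey's estimator at one lambda.

    pi0_hat = #{p > lam} / (m * (1 - lam)), capped at 1 so pi1_hat >= 0.
    """
    p = np.asarray(p, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return 1.0 - min(pi0, 1.0)
```

Applied to the NTR or NESDA p-values of the matched Westra et al. pairs, a value near 0.6 for local eQTLs would indicate that about 60% of those pairs show some association in the replication data.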

References

1. Altshuler, D.M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52-8 (2010).

2. Durbin, R.M. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061-73 (2010).

3. Leek, J.T. & Storey, J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet 3, 1724-35 (2007).

4. Conover, W.J. Practical Nonparametric Statistics, (Wiley, NY, 1999).

5. Purcell, S. et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics 81, 559-75 (2007).

6. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904-9 (2006).

7. SAS Institute Inc. SAS/STAT® Software: Version 9.2, (SAS Institute, Inc., Cary, NC, 2008).

8. R Development Core Team. R: A Language and Environment for Statistical Computing, (R Foundation for Statistical Computing, Vienna, Austria, 2011).

9. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Research 19, 1639-45 (2009).


10. Schizophrenia Psychiatric Genome-Wide Association Study Consortium. Genome-wide association study identifies five new schizophrenia loci. Nature Genetics 43, 969-76 (2011).

11. Neale, B.M. & Purcell, S. The positives, protocols, and perils of genome-wide association. Am J Med Genet B Neuropsychiatr Genet 147B, 1288-94 (2008).

12. de Bakker, P.I. et al. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet 17, R122-8 (2008).

13. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25 (2009).

14. Manolio, T.A. et al. New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat Genet 39, 1045-1051 (2007).

15. Sullivan, P. et al. Genomewide association for major depressive disorder: a possible role for the presynaptic protein piccolo. Molecular Psychiatry 14, 359-75 (2009).

16. Visscher, P.M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genetics 2, e41 (2006).

17. Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88, 76-82 (2011).

18. Falconer, D.S. & Mackay, T.F.C. Introduction to Quantitative Genetics, (Longman Group Ltd., London, 1996).

19. Gilmour, A., Thompson, R. & Cullis, B. Average information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440-50 (1995).

20. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nature Genetics (2012).

21. Powell, J.E. et al. Congruence of additive and non-additive effects on gene expression estimated from pedigree and SNP data. PLoS Genetics 9, e1003502 (2013).

22. Barry, W.T., Nobel, A.B. & Wright, F.A. A statistical framework for testing functional categories in microarray data. Annals of Applied Statistics 2, 286-315 (2008).

23. Gatti, D.M., Barry, W.T., Nobel, A.B., Rusyn, I. & Wright, F.A. Heading down the wrong pathway: on the influence of correlation within gene sets. BMC Genomics 11, 574 (2010).

24. Barry, W.T., Nobel, A.B. & Wright, F.A. Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21, 1943-9 (2005).


25. Mootha, V.K. et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267-73 (2003).

26. Shabalin, A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353-8 (2012).

27. Fehrmann, R.S. et al. Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA. PLoS Genetics 7, e1002197 (2011).

28. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069-70 (2010).

29. Sun, W., Ibrahim, J.G. & Zou, F. Genomewide multiple-loci mapping in experimental crosses by iterative adaptive penalized regression. Genetics 185, 349-59 (2010).

30. Choy, E. et al. Genetic analysis of human traits in vitro: drug response and gene expression in lymphoblastoid cell lines. PLoS Genet 4, e1000287 (2008).

31. Montgomery, S.B. et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464, 773-7 (2010).

32. Price, A.L. et al. Effects of cis and trans genetic ancestry on gene expression in African Americans. PLoS Genetics 4, e1000294 (2008).

33. Spielman, R.S. et al. Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 39, 226-31 (2007).

34. Stranger, B.E. et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315, 848-53 (2007).

35. Stranger, B.E. et al. Patterns of cis regulatory variation in diverse human populations. PLoS Genetics 8, e1002639 (2012).

36. Zeller, T. et al. Genetics and beyond--the transcriptome of human monocytes and disease susceptibility. PLoS ONE 5, e10693 (2010).
