type 2 diabetes genetic loci informed by multi-trait ... · • in up to 17,365 individuals with...

23
RESEARCH ARTICLE Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis Miriam S. Udler ID 1,2,3,4 , Jaegil Kim 3 , Marcin von Grotthuss 3 , Sı ´lvia Bonàs-Guarch 5 , Joanne B. Cole ID 2,3 , Joshua Chiou ID 6 , Christopher D. Anderson on behalf of METASTROKE and the ISGC 2,3 , Michael Boehnke 7 , Markku Laakso 8 , Gil Atzmon 9,10,11 , Benjamin Glaser 12 , Josep M. Mercader ID 1,2,3,5 , Kyle Gaulton 6 , Jason Flannick 3,13 , Gad Getz 3 , Jose C. Florez ID 1,2,3,4 * 1 Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America, 2 Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America, 3 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America, 4 Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America, 5 Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona, Spain, 6 Department of Pediatrics, University of California San Diego, San Diego, California, United States of America, 7 Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America, 8 Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland, 9 Faculty of Natural Sciences, University of Haifa, Haifa, Israel, 10 Department of Medicine; Albert Einstein College of Medicine, Bronx, New York, United States of America, 11 Department of Genetics, Institute for Aging Research, Albert Einstein College of Medicine, Bronx, New York, United States of America, 12 Endocrinology and Metabolism Service, Hadassah-Hebrew University Medical Center, Jerusalem, Israel, 13 Department of Genetics, Boston Children’s Hospital, Boston, Massachusetts, United States of America * [email protected] Abstract Background Type 2 diabetes (T2D) is a heterogeneous disease for which (1) disease-causing pathways are incompletely understood and (2) subclassification may improve patient management. Unlike other biomarkers, germline genetic markers do not change with disease progression or treatment. In this paper, we test whether a germline genetic approach informed by physi- ology can be used to deconstruct T2D heterogeneity. First, we aimed to categorize genetic loci into groups representing likely disease mechanistic pathways. Second, we asked whether the novel clusters of genetic loci we identified have any broad clinical consequence, as assessed in four separate subsets of individuals with T2D. Methods and findings In an effort to identify mechanistic pathways driven by established T2D genetic loci, we applied Bayesian nonnegative matrix factorization (bNMF) clustering to genome-wide asso- ciation study (GWAS) results for 94 independent T2D genetic variants and 47 diabetes- related traits. We identified five robust clusters of T2D loci and traits, each with distinct tis- sue-specific enhancer enrichment based on analysis of epigenomic data from 28 cell types. PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 1 / 23 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 OPEN ACCESS Citation: Udler MS, Kim J, von Grotthuss M, Bonàs-Guarch S, Cole JB, Chiou J, et al. (2018) Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis. PLoS Med 15 (9): e1002654. https://doi.org/10.1371/journal. pmed.1002654 Academic Editor: Claudia Langenberg, University of Cambridge, UNITED KINGDOM Received: May 8, 2018 Accepted: August 17, 2018 Published: September 21, 2018 Copyright: © 2018 Udler et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability Statement: GWAS summary statistics data are available at the website addresses provided in S2 Table. Data from METSIM (http://www.nationalbiobanks.fi/index. php/studies2/10-metsim), Ashkenazi (http://www. type2diabetesgenetics.org/projects/t2dGenes), Partners Biobank (https://biobank.partners.org/), and UK Biobank (http://www.ukbiobank.ac.uk/) studies are available from each study’s Institutional Data Access/Ethics Committee for researchers who meet the criteria for access to data.

Upload: nguyennguyet

Post on 12-Jul-2019

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

RESEARCH ARTICLE

Type 2 diabetes genetic loci informed by

multi-trait associations point to disease

mechanisms and subtypes: A soft clustering

analysis

Miriam S. UdlerID1,2,3,4, Jaegil Kim3, Marcin von Grotthuss3, Sı́lvia Bonàs-Guarch5, Joanne

B. ColeID2,3, Joshua ChiouID

6, Christopher D. Anderson on behalf of METASTROKE and

the ISGC2,3, Michael Boehnke7, Markku Laakso8, Gil Atzmon9,10,11, Benjamin Glaser12,

Josep M. MercaderID1,2,3,5, Kyle Gaulton6, Jason Flannick3,13, Gad Getz3, Jose

C. FlorezID1,2,3,4*

1 Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America,

2 Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of

America, 3 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America,

4 Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America,

5 Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational

Biology, Barcelona, Spain, 6 Department of Pediatrics, University of California San Diego, San Diego,

California, United States of America, 7 Department of Biostatistics and Center for Statistical Genetics,

University of Michigan, Ann Arbor, Michigan, United States of America, 8 Institute of Clinical Medicine,

Internal Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland, 9 Faculty of

Natural Sciences, University of Haifa, Haifa, Israel, 10 Department of Medicine; Albert Einstein College of

Medicine, Bronx, New York, United States of America, 11 Department of Genetics, Institute for Aging

Research, Albert Einstein College of Medicine, Bronx, New York, United States of America, 12 Endocrinology

and Metabolism Service, Hadassah-Hebrew University Medical Center, Jerusalem, Israel, 13 Department of

Genetics, Boston Children’s Hospital, Boston, Massachusetts, United States of America

* [email protected]

Abstract

Background

Type 2 diabetes (T2D) is a heterogeneous disease for which (1) disease-causing pathways

are incompletely understood and (2) subclassification may improve patient management.

Unlike other biomarkers, germline genetic markers do not change with disease progression

or treatment. In this paper, we test whether a germline genetic approach informed by physi-

ology can be used to deconstruct T2D heterogeneity. First, we aimed to categorize genetic

loci into groups representing likely disease mechanistic pathways. Second, we asked

whether the novel clusters of genetic loci we identified have any broad clinical consequence,

as assessed in four separate subsets of individuals with T2D.

Methods and findings

In an effort to identify mechanistic pathways driven by established T2D genetic loci, we

applied Bayesian nonnegative matrix factorization (bNMF) clustering to genome-wide asso-

ciation study (GWAS) results for 94 independent T2D genetic variants and 47 diabetes-

related traits. We identified five robust clusters of T2D loci and traits, each with distinct tis-

sue-specific enhancer enrichment based on analysis of epigenomic data from 28 cell types.

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 1 / 23

a1111111111

a1111111111

a1111111111

a1111111111

a1111111111

OPENACCESS

Citation: Udler MS, Kim J, von Grotthuss M,

Bonàs-Guarch S, Cole JB, Chiou J, et al. (2018)

Type 2 diabetes genetic loci informed by multi-trait

associations point to disease mechanisms and

subtypes: A soft clustering analysis. PLoS Med 15

(9): e1002654. https://doi.org/10.1371/journal.

pmed.1002654

Academic Editor: Claudia Langenberg, University

of Cambridge, UNITED KINGDOM

Received: May 8, 2018

Accepted: August 17, 2018

Published: September 21, 2018

Copyright: © 2018 Udler et al. This is an open

access article distributed under the terms of the

Creative Commons Attribution License, which

permits unrestricted use, distribution, and

reproduction in any medium, provided the original

author and source are credited.

Data Availability Statement: GWAS summary

statistics data are available at the website

addresses provided in S2 Table. Data from

METSIM (http://www.nationalbiobanks.fi/index.

php/studies2/10-metsim), Ashkenazi (http://www.

type2diabetesgenetics.org/projects/t2dGenes),

Partners Biobank (https://biobank.partners.org/),

and UK Biobank (http://www.ukbiobank.ac.uk/)

studies are available from each study’s Institutional

Data Access/Ethics Committee for researchers who

meet the criteria for access to data.

Page 2: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

Two clusters contained variant-trait associations indicative of reduced beta cell function, dif-

fering from each other by high versus low proinsulin levels. The three other clusters displayed

features of insulin resistance: obesity mediated (high body mass index [BMI] and waist cir-

cumference [WC]), “lipodystrophy-like” fat distribution (low BMI, adiponectin, and high-den-

sity lipoprotein [HDL] cholesterol, and high triglycerides), and disrupted liver lipid metabolism

(low triglycerides). Increased cluster genetic risk scores were associated with distinct clinical

outcomes, including increased blood pressure, coronary artery disease (CAD), and stroke.

We evaluated the potential for clinical impact of these clusters in four studies containing indi-

viduals with T2D (Metabolic Syndrome in Men Study [METSIM], N = 487; Ashkenazi, N =

509; Partners Biobank, N = 2,065; UK Biobank [UKBB], N = 14,813). Individuals with T2D in

the top genetic risk score decile for each cluster reproducibly exhibited the predicted cluster-

associated phenotypes, with approximately 30% of all individuals assigned to just one cluster

top decile. Limitations of this study include that the genetic variants used in the cluster analy-

sis were restricted to those associated with T2D in populations of European ancestry.

Conclusion

Our approach identifies salient T2D genetically anchored and physiologically informed

pathways, and supports the use of genetics to deconstruct T2D heterogeneity. Classifica-

tion of patients by these genetic pathways may offer a step toward genetically informed T2D

patient management.

Author summary

Why was this study done?

• Clinical experience and physiological phenotyping suggest that type 2 diabetes (T2D) is

a heterogeneous disease.

• Recent studies leveraged phenotypic data to identify subtypes of individuals with T2D

but have not included genetic data as a driving component of the clustering process.

• Previous efforts to cluster T2D genetic loci used unsupervised hierarchical clustering, a

“hard clustering” method that restricts the membership of a given genetic locus to a sin-

gle cluster.

What did the researchers do and find?

• We applied a novel “soft clustering” method termed Bayesian nonnegative matrix fac-

torization to cluster variant-trait associations ascertained from publicly available

genome-wide association studies for 94 known T2D variants and 47 diabetes-related

traits.

• We identified five novel robust clusters of T2D loci, two related to insulin deficiency

and three related to insulin resistance, and showed that these clusters are differentially

enriched for relevant tissue-specific enhancers and promoters.

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 2 / 23

Funding: MSU is supported by NIDDK

1K23DK114551-01. CDA is supported by NINDS

1K23NS086873-04. MB is supported by

DK062370. The funders had no role in study

design, data collection and analysis, decision to

publish, or preparation of the manuscript.

Competing interests: The authors have declared

that no competing interests exist.

Abbreviations: BMI, body mass index; bNMF,

Bayesian nonnegative matrix factorization; CAD,

coronary artery disease; CIR, corrected insulin

response; CKD, chronic kidney disease; DBP,

diastolic blood pressure; DI, disposition index;

DIAGRAMv3, DIAGRAM version 3; eGFR,

estimated glomerular filtration rate; EGG, Early

Growth Genetics; EMR, electronic medical record;

FI adj BMI, FI adjusted for BMI; GIANT, Genetic

Investigation of ANthrometric Traits; GRS, genetic

risk score; GWAS, genome-wide association study;

HbA1c, glycated hemoglobin; HC, hip

circumference; HDL, high-density lipoprotein; HLA,

human leukocyte antigen; HOMA-B, homeostatic

model assessment of beta cell function; HOMA-IR,

homeostatic model assessment of insulin

resistance; HRC, Haplotype Reference Consortium;

Incr30, incremental insulin response at 30 minutes

of OGTT; Ins30, insulin at 30 minutes of OGTT;

Ins30 adj BMI, Ins30 adjusted for BMI; ISI, insulin

sensitivity index; LDL, low-density lipoprotein;

MAGIC, Meta-Analyses of Glucose and Insulin-

related traits Consortium; METSIM, Metabolic

Syndrome in Men Study; NAFLD, nonalcoholic fatty

liver disease; NMF, nonnegative matrix

factorization; OGTT, oral glucose tolerance test;

RPDR, Research Patient Data Registry; SBP,

systolic blood pressure; SIDD, severe insulin-

deficient diabetes; SIRD, severe insulin-resistant

diabetes; T2D, type 2 diabetes; UACR, urine

albumin-creatinine ratio; UKBB, UK Biobank; WC,

waist circumference; WHR, waist-hip ratio; 2hrGlu

adj BMI, 2-hour glucose on oral glucose tolerance

test adjusted for body mass index.

Page 3: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

• In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-

als have a genetic risk score in the top decile of uniquely one cluster, and these individu-

als have cluster-specific phenotypic traits that distinguish them from others with T2D.

What do these findings mean?

• Soft clustering of genetic loci associated with T2D produced biologically supported

mechanistic pathways, illuminating how loci might impact T2D risk.

• Classification of individuals by these genetic pathways may offer a step toward geneti-

cally anchored but physiologically informed diagnosis, surveillance, and management of

patients with T2D.

Introduction

Type 2 diabetes (T2D) is a complex disease affecting the world’s population at epidemic rates

and whose pathophysiology remains incompletely understood. Approximately 30.3 million

(9.4%) people in the United States have diabetes, with T2D thought to account for 90%–95%

of all diagnoses [1,2]. Despite recognized heterogeneity in patient phenotypes and responses to

treatment, T2D management strategies remain largely impersonalized.

In an attempt to deconstruct the heterogeneity of T2D, recent studies have performed clus-

ter analysis of individuals using serum biomarkers and clinical data to identify T2D subgroups

[3,4]. These studies offer exciting directions for future research but are also limited by the

nature of the variables included in analyses. For example, Ahlqvist and colleagues [4] clustered

individuals using six variables measured shortly after diabetes diagnosis, including glycated

hemoglobin [HbA1c], glutamate decarboxylase antibodies, and body mass index (BMI); nota-

bly, these variables change with disease progression and treatment, and thus application of this

clustering approach to clinical practice is of uncertain utility when patients are evaluated at a

different time in the disease course or after treatment has been initiated. Additionally, it is not

clear whether clinical biomarkers used in cluster analyses to date are causal, consequential, or

coincidental in the disease process.

In contrast to other serum biomarkers, germline genetic variants associated with T2D are

more likely to point to T2D causal mechanisms and remain constant regardless of develop-

mental stage, disease state, or treatment. Over the past decade, genome-wide association stud-

ies (GWAS) and other large-scale genomic studies have identified over 100 loci associated with

T2D, causing modest increases in disease risk (odds ratios generally <1.2) [5–9]. These genetic

loci offer insight into biological pathways causing T2D, but for most of these loci, the causal

variant(s) and the mechanism by which the locus causes T2D remain unknown, limiting

opportunities for clinical translation.

As GWAS have been conducted across multiple traits, there exists an opportunity to lever-

age multi-variant-trait association patterns to elucidate likely shared disease mechanisms,

based on the assumption that genetic variants that act along a shared pathway will have similar

directional impact on various observed traits. For example, amongst genetic variants impact-

ing insulin resistance, Yaghootkar and colleagues identified a set of 11 variants associated with

a particular directional pattern of traits in GWAS. This set of 11 insulin resistance–increasing

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 3 / 23

Page 4: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

alleles was felt to represent a “lipodystrophy-like” fat distribution subgroup of insulin resis-

tance variants reminiscent of monogenic lipodystrophy, because they were associated with

increased fasting insulin and triglyceride levels but decreased high-density lipoprotein (HDL)

cholesterol, adiponectin, and BMI [10].

As with insulin resistance, T2D-associated genetic variants have been assessed using a simi-

lar multi-variant-trait clustering approach; however, the resultant clusters have had limited

clinical interpretability to date. Three efforts to perform clustering of T2D loci have been pub-

lished by Dimas and colleagues [11], focusing on glycemic traits, and recently by Scott and col-

leagues [6] and Mahajan and colleagues [9], both including BMI and lipid traits in addition to

glycemic traits. In these analyses, unsupervised hierarchical clustering was performed on T2D

variants using their associations with respective traits. While these approaches generated some

biologically suggestive clusters of genetic loci, determining the number and boundaries of clus-

ters using unsupervised hierarchical clustering remains rather subjective. Additionally, in

these analyses, many variants could not be clustered (more than half of all loci included in

[6,11]), including loci with known mechanisms, tissue specificity, and physiological impact

(e.g., those inHNF1A and CILP2/TM6SF2). The unsupervised hierarchical clustering model

applied in these previous efforts requires that a variant be included in only one pathway, so-

called “hard-clustering,” and was limited by the GWAS datasets available for diabetes-related

traits. We were interested in investigating a more flexible model that would allow a variant to

impact more than one biological pathway and hypothesized that this might improve cluster

interpretability, using the most up-to-date GWAS datasets available for metabolic traits.

In this paper, we test whether a germline genetic approach can be used to deconstruct T2D

heterogeneity. First, we ask whether genetic variants can be categorized into groups represent-

ing likely disease mechanistic pathways. We apply the novel clustering method Bayesian non-

negative matrix factorization (bNMF) to enable a "soft clustering," whereby a variant can be

associated to more than one cluster, which has been used previously in cancer genomics [12–

15]. To confirm that these groups of variants represent distinct entities with predicted biologi-

cal relevance, we assess tissue specificity for enhancer or promoter enrichment. Finally, we ask

whether the novel clusters of genetic loci we identified are of any clinical consequence, which

we assess in four independent cohorts of individuals with T2D.

Materials and methods

Variant and trait selection

The study design proceeded as per a prespecified analysis plan, modified to reflect the addi-

tional number of genetic variants associated with T2D over the course of the study as well as

the increased availability of GWAS datasets for metabolic traits. To obtain a comprehensive set

of genetic variants associated with T2D, we started with the set of 88 variants reaching

genome-wide significance aggregated by Mohlke and Boehnke [5], and then added 37 addi-

tional loci that were reported in subsequent T2D large-scale genetic studies [6,7,9]. At some

loci, there have been reports of multiple distinct signals [6,8]; we included 9 additional variants

representing distinct signals at 6 loci (ANKRD55,DGKB,CDKN2A,KCNQ1, CCND2, and

HNF4A). Of the 125 T2D variants considered, we selected a subset of 94 representative variants

based on the condition that either the variant or a proxy of the variant had an association with

T2D in the DIAGRAM version 3 (DIAGRAMv3) Stage 1 meta-analysis [16] with P< 0.05 (S1

Table). Proxies of variants were required to either be in linkage disequilibrium (r2> 0.6) with

an original T2D variant published at genome-wide significance, be the lead SNP at the T2D

locus in DIAGRAMv3, or reach genome-wide significance in DIAGRAMv3. Because DIA-

GRAMv3 contains study populations of mostly European ancestry, focusing on variants with

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 4 / 23

Page 5: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

at least nominal significance in this dataset helped ensure that these variants would also be

associated with other traits in published GWAS, as most of the GWAS populations were also

of predominantly European ancestry. Additionally, variants (other than those representing

distinct signals at a locus) were conservatively excluded if they fell within 500 kb of another

variant on the list. Given that C/G and A/T alleles are ambiguous and can lead to errors in

aligning alleles across GWAS, we avoided inclusion of ambiguous alleles, choosing proxies

instead.

For the 94 T2D-associated variants, the T2D risk–increasing alleles were identified and all

future analyses used the aligned T2D risk–increasing alleles. Summary association statistics for

additional traits whose GWAS meta-analyses were publicly available were aggregated for each

variant (S2 Table). These traits included glycemic traits available through the Meta-Analyses

of Glucose and Insulin-related traits Consortium (MAGIC) (fasting insulin, fasting glucose,

fasting insulin adjusted for BMI, 2-hour glucose on oral glucose tolerance test [OGTT]

adjusted for BMI [2hrGlu adj BMI], glycated hemoglobin [HbA1c], homeostatic model assess-

ments of beta cell function [HOMA-B] and insulin resistance [HOMA-IR], incremental insu-

lin response at 30 minutes on OGTT [Incr30], insulin secretion at 30 minutes on OGTT

[Ins30], fasting proinsulin adjusted for fasting insulin, corrected insulin response [CIR], dispo-

sition index [DI], and insulin sensitivity index [ISI]) [17–23]. Anthroprometric traits were

publicly available through the Genetic Investigation of ANthrometric Traits (GIANT) consor-

tium (BMI, height, waist circumference [WC] with and without adjustment for BMI, and

waist-hip ratio [WHR] with and without adjustment for BMI) [24–26]. Additional birth

weight and length GWAS summary statistics were obtained from the Early Growth Genetics

(EGG) consortium [27,28]. GWAS results for visceral and subcutaneous adipose tissue were

available from VATGen consortium [29], as well as results for percent body fat [30] and heart

rate [31]. Finally, serum laboratory values were available for the following traits: lipid levels

(HDL cholesterol, low-density lipoprotein [LDL] cholesterol, total cholesterol, triglycerides)

[32], leptin with and without BMI adjustment [33], adiponectin adjusted for BMI [34], urate

[35], Omega-3 fatty acids [36], Omega-6-fatty acids [37], plasma phospholipid fatty acids in

the de novo lipogenesis pathway [38], and very long-chain saturated fatty acids [39]. To reduce

noise in the cluster analysis, traits were only included if at least one variant was associated with

the trait at a Bonferroni-corrected threshold of significance P< 5 × 10−4 (0.05/94).

In addition to the above traits used in the clustering process, single-variant association

results with ten clinical outcomes were also aggregated (S2 Table): ischemic stroke and its sub-

types (large vessel disease, small vessel disease, and cardioembolic) [40]; coronary artery dis-

ease (CAD) [41]; renal function as defined by estimated glomerular filtration rate (eGFR);

urine albumin-creatinine ratio (UACR); chronic kidney disease (CKD) [42]; and systolic

blood pressure (SBP) and diastolic blood pressure (DBP) [43].

bNMF clustering

We generated standardized effect sizes for variant trait associations from GWAS by dividing

the estimated regression coefficient beta by the standard error, using the GWAS summary sta-

tistic results. To address the marked differences in sample size across studies and allow for a

more uniform comparison of phenotypes across studies, we scaled the standardized effect sizes

by the square root of the study size, as estimated by mean sample size across all SNPs, forming

the variant-trait association matrix Z (94 by 47).

To enable an inference for latent overlapping modules or clusters embedded in variant-trait

associations, we modified the existing bNMF algorithm to explicitly account for both positive

and negative associations. Each column of Z split into two separate entities: one contained all

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 5 / 23

Page 6: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

positive z-scores, the other all negative z-scores multiplied by −1, and setting zero otherwise,

leading to the association matrix X (94 by 94) comprised of doubled traits with positive and

negative associations to variants. Then, bNMF factorizes X into two matrices, W (94 by K) and

HT (94 by K), with an optimal rank K, as X ~ WH, corresponding to the association matrix of

variants and traits to the determined clusters, respectively. This mathematical framework

enables bNMF to tackle both positive and negative associations with no loss of information,

while keeping its nonnegativity constraint. Determining the proper model order K is a key

aspect in balancing data fidelity and complexity. Conventional nonnegative matrix factoriza-

tion (NMF) requires the model order as an input, or it may be determined post–data process-

ing, but bNMF is designed to suggest an optimal K best explaining X at the balance between an

error measure, ||X−WH||2, and a penalty for model complexity derived from a nonnegative

half-normal prior for W and H [12–14]. In addition, bNMF exploits an automatic relevance

determination technique to iteratively regress out irrelevant components in explaining the

observed data X. More specifically, each column in W and each row in H is strictly tied

together by a common relevance weight corresponding to the variance of half-normal priors,

which is also sampled from the inverse-gamma hyperpriors. The exact objective function opti-

mized by bNMF is a posterior, which has two opposing contributions from the likelihood

(Frobenius norm) and the regularization penalty (L2-norm of W and H coupled by the rele-

vance weights) [12]. During the search for the maximum posterior solution, bNMF iteratively

prunes irrelevant components in explaining X through a shrinkage of relevance weights and

enables an optimal inference for the number of clusters, K, as well as a sparse representation of

W and H at the balance between data fidelity (Frobenius norm or sum of squared error) and

complexity (regularization penalty). The defining features of each cluster are determined by

the most highly associated traits, which is a natural output of the bNMF approach. bNMF algo-

rithm was performed in R Studio for 1,000 iterations with different initial conditions, and the

maximum posterior solution at the most probable K was selected for downstream analysis.

The results of clustering provide cluster-specific weights for each variant (W) and trait (H).

Trait and outcome associations with each cluster

An association of the genetic risk score (GRS) for each cluster with each GWAS trait or out-

come (“GWAS GRS”) was performed using inverse-variance weighted fixed effects meta-anal-

ysis using summary statistics from GWAS, as has been done previously [10]. We tested for

associations between the GWAS GRSs and traits to confirm clustering results because traits

were included in the clustering process; outcomes were not included in the clustering process.

For these analyses, the top set of strongest-weighted variants for each cluster were included

in the model using a cutoff weighting of 0.75, which was determined by two independent

approaches involving modeling of cluster weights. First, the top-weighted trait in each cluster

was assessed with variants in the corresponding cluster in a stepwise approach; a meta-analysis

of the variants with the top-weighted trait was performed, starting with all variants and remov-

ing sequential variants from lowest to highest weighted until the local minimum meta-analysis

P-value was obtained. As a second approach, we assessed the distribution of the variant cluster-

ing weights for each variant across all clusters with the goal of identifying optimal cutoffs to

define the beginning of the “long tail” of cluster weights, representing less informative variants

for each cluster (S3 Fig). We plotted the change in consecutive clustering weight deltas sorted

in descending order and reasoned that the long tail should start just after the last significant

difference in consecutive weights (S4 Fig). Therefore, all clustering weight deltas were ordered

in descending order, and the top 5% were considered to be significant deltas. The last signifi-

cant delta is shown in S4 Fig with an arrow, and it corresponds to clustering weight of 0.75.

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 6 / 23

Page 7: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

Ten outcomes were assessed, which were independent of the traits included in the bNMF

clustering. A conservative significance threshold was set at 1×10−3 using a Bonferroni correc-

tion for 10 traits and 5 clusters (0.05/50).

Functional annotation and enrichment analysis

We calculated Bayes factors for all variants 500 kb upstream and downstream with r2 greater

than 0.1, with the index variant (100% credible set) at each locus from effect size estimates and

standard errors, using the approach of Wakefield [44]. We then calculated a posterior proba-

bility for each variant by dividing the Bayes factor by the sum of all Bayes factors in the credible

set. We obtained previously published 13-state ChromHMM [45] chromatin state calls for 28

cell types, excluding cancer cell lines [46]. For each cell type, we extracted chromatin state

annotations for enhancer (Active Enhancer 1, Active Enhancer 2, Weak Enhancer, Genic

Enhancer) and promoter (Active Promoter) elements.

We assessed enrichment of annotations first within clusters and second across clusters. For

within-cluster analysis, we overlapped cell type annotations with credible set variants for loci

in each cluster. We then calculated a cell type probability for the cluster as the sum of posterior

probabilities of variants in cell type enhancers or promoters divided by the number of loci in

the cluster. For the across-cluster analysis, we overlapped cell type annotations with credible

set variants for all loci. For each cluster, we then calculated a cell type probability as the sum of

posterior probabilities of all variants in cell type enhancers or promoters in the cluster divided

by the total number of loci in the cluster.

We derived significance for cell type probabilities for each cluster using a permutation-

based test. Within each cluster, we permuted locus and cell type labels and then recalculated

cell type probabilities, as above. For across-cluster analysis, we permuted cluster labels for each

locus and then recalculated cell type probabilities for the permuted clusters, as above. For the

five loci (ADCY5,CDC123,HNF4A,HSD17B12, and CCND2) represented in multiple clusters,

we ensured that each locus was only represented once per cluster. We then used the cell type

probabilities derived from 1 million permutations as a background distribution and performed

a one-tailed test to ascertain significance for each cell type.

Study populations

Metabolic Syndrome in Men Study. From this cross-sectional study of Finnish men [47],

we analyzed data from 487 individuals with T2D previously ascertained for genotyping as part

of the T2D-GENES initiative [48]. Genotyping was performed using Illumina HumanExome-

12v1_A Beadchip, and imputation was performed using the Haplotype Consortium Reference

Panel [49] using the Michigan Imputation Server [50]. Written consent was provided by all

study participants.

Diabetes Genes in Founder Populations (Ashkenazi) study. We analyzed data from 509

individuals with T2D from this study previously ascertained for genotyping as part of the

T2D-GENES initiative [48]. Briefly, the study consists of individuals of Ashkenazi Jewish ori-

gin selected from two separate DNA collections: (1) one affected individual selected per family

from a genome-wide, affected-sibling-pair linkage study [51]. Families in which both parents

were known to have diabetes were excluded. (2) Patients ascertained by the Israel Diabetes

Research Group between 2002 and 2004 from 15 diabetes clinics throughout Israel. For this

study, only T2D patients with age of diagnosis between 35 and 60 were selected [52]. Genotyp-

ing was performed using Illumina Cardio-Metabo Chip, and imputation was performed using

the Haplotype Consortium Reference Panel [49] using the Michigan Imputation Server [50].

Written consent was provided by all study participants.

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 7 / 23

Page 8: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

The Partners Biobank. The Partners HealthCare Biobank maintains blood and DNA

samples from more than 60,000 consented patients seen at Partners HealthCare hospitals,

including Massachusetts General Hospital, Brigham and Women’s Hospital, McLean Hospital,

and Spaulding Rehabilitation Hospital, all in the Boston area of Massachusetts [53]. Patients

are recruited in the context of clinical care appointments at more than 40 sites and clinics, and

also electronically through the patient portal at Partners HealthCare. Biobank subjects provide

consent for the use of their samples and data in broad-based research. The Partners Biobank

works closely with the Partners Research Patient Data Registry (RPDR), the Partners’ enter-

prise data repository designed to foster investigator access to a wide variety of phenotypic data

on more than 4 million Partners HealthCare patients. Written consent was provided by all

study participants. Approval for analysis of Biobank data was obtained by Partners IRB, study

2016P001018.

T2D status was defined based on “curated phenotypes” developed by the Biobank Portal

team using both structured and unstructured electronic medical record (EMR) data and clini-

cal, computational, and statistical methods [54]. Cases were selected by this algorithm to have

T2D with PPV of 90% and required to be of at least age 35 to further minimize misclassifica-

tion of T2D diagnosis. Additional phenotypic data were extracted using the most recent value

within the past 5 years. Genomic data for 15,061 individuals were generated with the Illumina

Multi-Ethnic Genotyping Array. The genotyping data were harmonized and quality controlled

with a three-step protocol, including two stages of SNP removal and an intermediate stage of

sample exclusion. The exclusion criteria for variants were (i) missing call rate�0.05, (ii) signif-

icant deviation from Hardy-Weinberg equilibrium (P� 10−6 for controls and P� 10−20 for

the entire cohort), (iii) significant differences in the proportion of missingness between cases

and controls P� 10−6, and (iv) minor allele frequency<0.001. The exclusion criteria for sam-

ples were (i) gender discordance between the reported and genetically predicted sex, (ii) sub-

ject relatedness (pairs with�0.125, from which we removed the individual with the highest

proportion of missingness), (iii) missing call rates per sample�0.02, and (iv) population struc-

ture showing more than four standard deviations within the distribution of the study popula-

tion, according to the first four principal components. Phasing was performed with

SHAPEIT2 [55] and then imputed with the Haplotype Consortium Reference Panel [49] using

the Michigan Imputation Server [50].

The UK Biobank. UK Biobank (UKBB) is a prospective cohort of about 500,000 recruited

individuals from the general population aged 40–69 years in 2006–2010 from across the

United Kingdom, with genotype, phenotype, and linked healthcare record data. Individuals in

UKBB underwent genotyping with one of two closely related custom arrays (UK BiLEVE

Axiom Array or UK Biobank Axiom Array) consisting of over 800,000 genetic markers scat-

tered across the genome. Additional genotypes were imputed centrally using the Haplotype

Reference Consortium (HRC) reference panel [49], 1000G phase 3 [56], and UK10K reference

panel [57], as previously reported [58]. SNPs used for computing the GRS were imputed only

with the HRC reference panel. We restricted the analysis to a subset of unrelated individuals of

white European ancestry, constructed centrally using a combination of self-reported ancestry

and genetically confirmed ancestry, by projecting UKBB genetic principal components on to

1000G phase 3 reference principal component space. We focused on individuals with T2D,

based on a recently developed algorithm of information at the baseline visit that took into

account diabetes diagnosis, diabetes medication use, and age at diagnosis [59]. We expanded

upon this definition to include touchscreen self-reported diagnosis and diagnosis and medica-

tion information provided at repeat visits, and removed individuals with reported “age of dia-

betes diagnosis” less than 40 to further reduce possible contamination with type 1 diabetes

diagnoses. A total of 14,813 individuals determined by the algorithm to have “probable” or

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 8 / 23

Page 9: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

“possible” T2D were included in our analyses. Documented consent was provided by all study

participants.

Individual-level analyses of individuals with T2D. Individual-level analyses were per-

formed using data from Metabolic Syndrome in Men Study (METSIM), the Diabetes Genes in

Founder Populations (Ashkenazi) study, the Partners Biobank, and UKBB. Analyses were

restricted to individuals with T2D and Caucasian ancestry.

All SNPs were genotyped or imputed with high quality (r2 values>0.95). SNPs were included

in genetic risks scores as allele dosages. GRSs were generated for each cluster by multiplying a var-

iant’s genotype dosage by its cluster weight. Only the top-weighted variants falling above the

threshold, as defined above, were included in the GRS for each cluster. Logistic regression and lin-

ear regression were performed in Stata v14.2, adjusting for sex, age, and principal components.

Results from the multiple cohorts were meta-analyzed using an inverse-variance weighted fixed

effect model. Comparison of traits across the five subgroups of individuals with top decile cluster

GRSs were compared using the Kruskal-Wallis non-parametric test for continuous traits, except

for percentage female sex, which was compared with the chi-squared test.

Results

Clustering suggests five dominant pathways driving diabetes risk

Clustering of variant-trait associations was performed for 94 genetic variants and 47 traits

derived from GWAS using bNMF clustering, with identification of five robust clusters present

on 82.3% of iterations (S1 and S2 Figs; S1–S4 Tables). In 17.3% of the iterations, the data con-

verged to four clusters, in which Cluster 2 was subsumed into Cluster 1. As bNMF clusters

both variants and traits, the top-weighted traits can be used to help define the underlying

mechanism in each cluster. The five clusters appeared to represent two mechanisms of beta

cell dysfunction and three mechanisms of insulin resistance (Fig 1A, Table 1, S5 Table).

The most strongly weighted traits for Clusters 1 and 2 relate to insulin production and pro-

cessing in the pancreatic beta cell. In Cluster 1 (Beta Cell cluster), these traits included

decreased CIR, DI, insulin at 30 minutes of OGTT adjusted for BMI (Ins30 adj BMI),

HOMA-B, and increased proinsulin levels adjusted for fasting insulin. Similarly, in Cluster 2

(Proinsulin cluster), the top-weighted traits included decreased Ins30 and HOMA-B but also

decreased proinsulin levels adjusted for fasting insulin, suggesting another mechanism impact-

ing beta cell function. The loci that clustered most strongly into Cluster 1 include many well-

known beta cell–related loci:MTNR1B,HHEX, TCF7L2, SLC30A8,HNF1A, andHNF1B(Table 1, S3 Table) [60,61]. Similarly, the two strongest weighted loci in Cluster 2 are ARAP1,

a locus at which diabetes risk is thought to be mediated by modulation of STARD10 expression

in pancreatic beta cells [62], and SPRY2, a locus at which the closest gene, SPRY2, regulates

insulin transcription [63] (S3 Table). Using GWAS summary statistics, GRSs composed of

risk alleles (“GWAS GRS”) from top-weighted loci in each cluster (N loci Beta Cell = 30, N loci

Proinsulin = 7; see Materials and methods) were associated, as expected, with decreased

HOMA-B (P< 10−10) in both clusters and fasting insulin (P< 5×10−4) in both clusters. The

Beta Cell cluster GWAS GRS was associated with increased proinsulin (P< 10−9), while the

Proinsulin cluster GWAS GRS was associated with decreased proinsulin (P< 10−17) (Table 1).

In contrast, Clusters 3, 4, and 5 all appeared to relate to mechanisms of insulin response.

The traits that clustered most strongly with Cluster 3 (Obesity cluster) include increased WC,

hip circumference (HC), BMI, and percent body fat. The top loci in this cluster include the

well-known obesity-associated loci FTO andMC4R (Table 1, S3 Table). The GWAS GRS for

top-weighted loci in this cluster (N loci = 5) was significantly associated with these same traits

(P< 10−24) as well as increased fasting insulin unadjusted for BMI (P< 10−7), but not fasting

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 9 / 23

Page 10: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

insulin adjusted for BMI (P = 0.57). Thus, based on these association patterns, this cluster

appeared to represent an obesity-mediated form of insulin resistance.

Cluster 4 (Lipodystrophy cluster) appears to represent the same “lipodystrophy-like” insu-

lin resistance cluster previously suggested by Yaghootkar and colleagues [10,64], with all the

variants from that previous set that were also associated with T2D being among those loci

most strongly weighted in this cluster (PPARG, ANKRD55,ARL15,GRB14, IRS1, and

LYPLAL1) (Table 1, S3 Table). Cluster 4’s strongest weighted traits were decreased ISI

adjusted for BMI, adiponectin, HDL, and increased triglycerides; GWAS GRSs of alleles from

the strongest weighted loci in this cluster (N loci = 20) were associated with all of these traits

(P< 10−20). This cluster appears to represent a lipodystrophy or fat distribution–mediated

form of insulin resistance. Interestingly, while an increased GWAS GRS from this cluster was

significantly associated with increased WHR in women (P< 10−31), it was associated with

decreased WHR in men (P< 10−5). These ratios appear to be driven by GWAS GRS associa-

tions with decreased HC in both sexes (P< 10−9) but a significant association with decreased

WC only observed in men (P< 10−14).

Cluster 5 (Liver/Lipid cluster) was notable for having decreased serum triglyceride levels,

palmitoleic acid, urate, and linolenic acid, along with increased gamma-linolenic acid, as the

traits most strongly weighted in this cluster. The GWAS GRS for the highest weighted loci (N

Fig 1. Cluster-defining characteristics. (A) Standardized effect sizes of cluster GRS-trait associations derived from GWAS

summary statistics shown in spider plot. The middle of the three concentric octagons is labeled “0,” representing no

association between the cluster GRS and trait. A subset of discriminatory traits are displayed. Points falling outside the

middle octagon represent positive cluster-trait associations, whereas those inside it represent negative cluster-trait

associations. (B) Associations of GRSs in individuals with T2D with various traits. Results are from four studies (METSIM,

Ashkenazi, Partners Biobank, and UK Biobank) meta-analyzed together. Effect sizes are scaled by the raw trait standard

deviation. (C) Differences in trait effect sizes between individuals with T2D having GRSs in the highest decile of a given

cluster versus all other individuals with T2D. Results are from the same four studies meta-analyzed together. Effect sizes are

scaled by the raw trait standard deviation. BMI, body mass index; Fastins, fasting insulin; GRS, genetic risk score; GWAS,

genome-wide association study; HDL, high-density lipoprotein; HOMA-B, homeostatic model assessment of beta cell

function; HOMA-IR, homeostatic model assessment of insulin resistance; METSIM, Metabolic Syndrome in Men Study;

ProIns, fasting proinsulin adjusted for fasting insulin; TG, serum triglycerides; T2D, type 2 diabetes; WC, waist

circumference; WHR-F, waist-hip ratio in females; WHR-M, waist-hip ratio in males.

https://doi.org/10.1371/journal.pmed.1002654.g001

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 10 / 23

Page 11: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

Table 1. Associations of cluster genetic risk scores and selected GWAS traits.

Beta Cell

N Loci = 30

Proinsulin

N Loci = 7

Obesity

N Loci = 5

Lipodystrophy

N Loci = 20

Liver/Lipid

N Loci = 5

Trait beta P-value beta P-value beta P-value beta P-value beta P-value

Adiponectin −0.0005 0.55 −0.0019 0.37 −0.0007 0.74 −0.0114 3.34E−23 −0.0007 0.77

BMI −0.0026 6.0×10−5 −0.0080 3.1×10−8 0.0396 9.7×10−157 −0.0079 1.81E−21 0.0001 0.94

Bodyfat −0.0016 0.11 −0.0061 4.5×10−3 0.0247 2.1×10−25 −0.0120 9.04E−22 −0.0031 0.26

CIR −0.0584 7.1×10−43 −0.0234 0.014 −0.0010 0.92 0.0087 0.10 −0.0021 0.85

DI −0.0543 6.6×10−37 −0.0080 0.40 −0.0086 0.40 −0.0102 0.05 −0.0115 0.30

2hrGlu adj BMI 0.0288 2.0×10−13 0.0204 0.02 0.0064 0.49 0.0292 2.26E−9 −0.0257 0.01

FI −0.0033 4.4×10−7 −0.0054 2.2×10−4 0.0087 6.1×10−8 0.0068 1.96E−16 0.0071 3.8×10−5

FI adj BMI −0.0026 4.4×10−6 −0.0040 1.3 ×10−3 −0.0008 0.57 0.0082 3.01E−31 0.0082 2.1×10−8

HDL −0.0008 0.51 −0.0031 0.27 −0.0059 0.05 −0.0191 1.96E−33 0.0069 0.038

Height 0.0009 0.12 −0.0058 3.3×10−5 −0.0033 1.9×10−5 0.0061 4.44E−05 −0.0005 0.77

HC −0.0031 3.5×10−5 −0.0113 1.0×10−11 0.0345 9.1×10−79 −0.0116 9.69E−34 −0.0007 0.73

HOMA-B −0.0066 1.9×10−21 −0.0103 2.6×10−11 0.0066 8.0×10−5 0.0019 0.03 0.0019 0.30

HOMA-IR −0.0011 0.21 −0.0041 0.03 0.0108 9.0×10−8 0.0066 2.2×10−10 0.0093 2.6×10−5

Incr30 −0.0398 6.9×10−19 −0.0239 0.02 −0.0053 0.63 0.0198 4.8×10−4 0.0102 0.38

Ins30 adj BMI −0.0503 1.8×10−28 −0.0310 1.8×10−3 0.0027 0.81 0.0163 3.9×10−3 0.0054 0.64

ISI adj BMI −0.0039 0.06 −0.0020 0.67 0.0045 0.37 −0.0213 1.3×10−13 −0.0086 0.12

Leptin 0.0009 0.50 −0.0067 0.03 0.0197 1.0×10−9 −0.0245 8.9×10−21 0.0147 3.1×10−5

Linoleic acid 0.0093 0.29 −0.0232 0.25 0.0027 0.90 −0.0024 0.83 0.1330 1.31x10−8

Palmitoleic 0.0002 0.74 0.0024 0.11 0.0034 0.03 −0.0020 0.02 −0.0104 5.50x10−9

Proinsulin 0.0097 1.2×10−10 −0.0297 1.4×10−18 0.0047 0.18 0.0059 1.3×10−3 0.0059 0.13

Total Chol 0.0023 0.06 −0.0055 0.04 −0.0023 0.45 0.0046 3.2×10−3 −0.0182 3.1×10−8

Triglycerides 0.0022 0.07 −0.0027 0.33 0.0066 0.03 0.0194 1.8×10−34 −0.0416 1.0×10−35

Urate −0.0007 0.51 −0.0045 0.084 0.0165 1.4×10−9 0.0090 2.2×10−10 −0.0260 2.6×10−18

WC −0.0020 5.23×10−3 −0.0096 1.5×10−9 0.0379 1.0×10−102 −0.0058 3.6×10−10 −0.0005 0.80

WC female −0.0010 0.30 −0.0073 3.9×10−4 0.0376 1.4×10−60 −0.0022 0.07 0.0000 0.99

WC male −0.0031 1.6×10−3 −0.0128 5.4×10−9 0.0374 1.1×10−51 −0.0102 2.8×10−15 0.0007 0.80

WHR 0.0014 0.05 −0.0016 0.30 0.0229 3.7×10−39 0.0051 1.6×10−8 0.0016 0.43

WHR female 0.0027 4.0×10−3 0.0010 0.62 0.0221 2.8×10−22 0.0140 5.6×10−32 0.0027 0.31

WHR male 0.0003 0.74 −0.0049 0.03 0.0242 1.4×10−21 −0.0059 7.6×10−6 0.0003 0.92

Loci included in clusters

Beta Cell:MTNR1B,CDKAL1, C2CD4A,HHEX, TCF7L2, SLC30A8,CDKN2A_B, CDC123.CAMK1D,HNF1A,AP3S2, ZHX3,UBE2E2,ACSL1, PRC1, GIPR HNF1B,

KCNJ11, KCNQ1_2, ABO,ANK1,GLIS3,GLP2R, CTRB2,CDKN2A_2 DUSP8, ADCY5,GIP,HNF4A,HSD17B12TLE4.

Proinsulin: ARAP1, SPRY2, DGKB_2, IGF2BP2, CCND2,HNF4A, CDC123.CAMK1D.

Obesity: FTO,MC4R,NRXN3,HSD17B12,RBMS1.

Lipodystrophy: IRS1,GRB14, PPARG, LYPLAL1, ANKRD55,CMIP, KLF14, LPL, ANKRD55_2, ARL15,ADCY5,C17orf58,POU5F1,MACF1, ZBED3,KIF9, ADAMTS9,CCND2, FAF1, MPHOSPH9.

Liver: GCKR, CILP2,HLA.DQA1,PNPLA3, TSPAN8.LGR5.

P-values< 2 × 10−4 and corresponding betas are bolded, representing a Bonferonni correction of 47 traits × 5 clusters.

Abbreviations: BMI, body mass index; Chol, serum total cholesterol; CIR, corrected insulin response; DI, disposition index; FI, fasting insulin; FI adj BMI, FI adjusted

for BMI; GWAS, genome-wide association study; HC, hip circumference; HDL, high-density lipoprotein; HOMA-B, homeostatic model assessment of beta cell

function; HOMA-IR, homeostatic model assessment of insulin resistance; Incr30, incremental insulin response at 30 minutes on OGTT; Ins30 adj BMI, insulin response

at 30 minutes on OGTT adjusted for BMI; ISI adj BMI, insulin sensitivity index adjusted for BMI; OGTT, oral glucose tolerance test; WC, waist circumference; WHR,

waist-hip ratio; 2hrGlu adj BMI, 2-hour glucose on OGTT adjusted for BMI.

https://doi.org/10.1371/journal.pmed.1002654.t001

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 11 / 23

Page 12: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

loci = 5) in this cluster was significantly associated with all the above traits (P< 10−7). Notably,

three of the top four weighted loci, GCKR, CILP2/TM6SF2, and PNPLA3 (Table 1, S3 Table)

have been previously associated with nonalcoholic fatty liver disease (NAFLD) [65], and func-

tional work has implicated these loci in liver lipid metabolism [66–70].

Clusters are distinctly enriched for tissue enhancers or promoters

To gain further support for the suspected mechanistic pathways represented by these clusters

and assess the biological distinctness of the clusters through an independent analysis, we

assessed the top loci in each cluster for enrichment of epigenomic annotations (Fig 2). We

expected each pathway to capture a different disease mechanism and thus localize largely to

specific and distinct tissues; as common variants have been shown to lie typically in noncoding

regions and presumably alter regulatory elements (enhancers, promoters), we assessed

whether variants in the credible sets for loci in each cluster preferentially altered enhancers or

promoters in specific cell types.

Within the Liver/Lipid cluster, there was significant enhancer/promoter enrichment in

liver tissue (P< 0.001), and within the Lipodystrophy cluster, there was significant enrichment

in adipose tissue (P< 0.001), each compared to the 27 other tissues assessed (Fig 2). The adi-

pose tissue enrichment in the Lipodystrophy cluster was also the most significant across the

five clusters (P< 0.05) (S5 Fig). The loci in the Obesity cluster were most strongly enriched

for enhancers/promoters in pre-adipose tissue, both compared to the other tissues and clusters

(P< 0.05). The two clusters suspected to be involved in beta cell function (Beta Cell cluster

and Proinsulin cluster) were most strongly enriched for pancreatic islet cell enhancers and

promoters, with significant within (P< 0.001) and across (P< 0.05) enrichment for the Beta

Cell cluster. Interestingly, the Proinsulin cluster had distinct enrichment in embryonic stem

cells and brain tissue compared to the other clusters (P< 0.05), which may reflect gene regula-

tion related to early development, as beta cells and neurons have been thought to share gene

expression patterns related to development [71]. Thus, in each case, the tissue enrichment sup-

ported the predicted biological mechanisms of each cluster.

Clusters are differentially associated with clinical outcomes from GWAS

To establish translational relevance, we next asked whether GWAS GRSs from the strongest

weighted loci in each cluster were associated with any clinical outcomes. We focused on clini-

cal outcomes related to T2D available through GWAS: CAD, renal function as assessed by

eGFR and UACR, stroke risks, and blood pressure measures (Table 2). GWAS GRSs from the

Beta Cell and Lipodystrophy clusters were most strongly associated with increased risk of

CAD (P< 10−7). Increased Beta Cell cluster GWAS GRS was also significantly associated with

increased risk for ischemic stroke (P = 10−4) as well as stroke subtypes, including large artery

(P = 10−5) and small vessel disease–related strokes (P = 10−4), but not cardioembolic stroke

(P = 0.6). There were also nominally significant trends for these same stroke subtypes with the

Lipodystrophy cluster. Only increased Lipodystrophy cluster GWAS GRS was significantly

associated with increased blood pressure (SBP, P = 6 × 10−6; DBP, P = 5 × 10−9). Renal function

was most significantly associated with the Liver/Lipid cluster GWAS GRS; interestingly, there

was a significant association for increased Liver/Lipid GWAS GRS with reduced eGFR

(P = 10−6) but, surprisingly, also reduced UACR (P = 10−3), whereas typically patients with dia-

betic kidney disease have reduced eGFR with elevated UACR. In contrast, increased Lipody-

strophy cluster GWAS GRS was significantly associated with increased UACR (P = 9 × 10−5).

No cluster GWAS GRS was associated with risk of CKD, defined as eGFR< 60 mL/minute/

1.73 m2.

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 12 / 23

Page 13: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

Fig 2. Enrichment for tissue-specific enhancers in clusters. Heatmap of associations of enrichment for enhancers and promoters from

tissues residing within the top-weighted loci from each cluster. ��P< 0.001, �P< 0.05. Cell line epigenetic data were obtained through the

Epigenomics Roadmap; the authors played no role in the procurement of tissue or generation of the data for the three embryonic stem cell

lines. CD34, Cluster of differentiation 34; ES-HUES6, HUES6 human embryonic stem cell line; ES-HUES64, HUES64 human embryonic

stem cell line; hASC, human Adipose-derived Stromal Cells; HMEC, Human Mammary Epithelial Cell line; HSMM, Human Skeletal

Muscle Myoblast cell line; H1, H1 human embryonic stem cell line; NHEK, Normal Human Epidermal Keratinocyte cell line; NHLF,

Normal Human Lung Fibroblast cell line.

https://doi.org/10.1371/journal.pmed.1002654.g002

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 13 / 23

Page 14: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

Application of clusters to patients with T2D

To determine whether cluster GRS associations observed in large GWAS would have relevance

to patients with T2D, we investigated cohorts of individuals with T2D ascertained from two

epidemiological studies (METSIM [N = 487] and the Diabetes Genes in Founder Populations

[Ashkenazi] Study [N = 509]) and two from biobanks (the Partners Biobank [N = 2,065] and

the UKBB [N = 14,813]).

We first validated associations between cluster GRSs and predicted phenotypes. In the up

to 17,365 combined individuals with T2D from the four cohorts, increased individual-level

GRSs were associated with the expected salient traits (Fig 1B, S6 Table): decreased BMI and

percent body fat (P< 10−20) as well as fasting C-peptide (P = 5 × 10−5) in the Beta Cell cluster;

similarly decreased BMI (P = 3 × 10−8) and fasting C-peptide (P = 0.04) in the Proinsulin clus-

ter; increased BMI, percent body fat, HC, and WC (P< 10−5) in the Obesity cluster; decreased

BMI, percent body fat (P< 10−15), and HDL (P = 5 × 10−5) in the Lipodystrophy cluster; and

decreased triglycerides in the Liver/Lipid cluster (P = 2 × 10−5). Interestingly, as was noted

with the GWAS GRS, a higher individual-level Lipodystrophy GRS had sex-differential associ-

ations with anthropometric traits: increased WHR in women (P = 0.007) but decreased in men

(P = 0.005). A similar discrepant pattern was also seen with BMI adjustment (S6 Table).

We next asked whether this genetic approach could be used to identify individuals with

T2D who had cluster-specific characteristics, in an initial attempt to stratify the population. In

other words, would individuals with the largest GRSs for each cluster differ from each other

and all other individuals with T2D, with regard to any clinical traits?

Consistently across the studies (N T2D = 17,365), we observed about 30% of individuals

with a GRS at the top 10th percentile of just one cluster (S8 Table), which is what is expected

by chance (under a binomial distribution). These individuals with the highest GRSs differed

significantly from all other individuals with T2D (Fig 1C, S7, and S9 Tables); for instance,

compared to all individuals across the three studies with T2D, those with extreme GRS in the

Beta Cell cluster (N = 1,068) had decreased BMI, HC, and WC (P< 10−3) and percent body fat

(P< 0.05), with a trend toward decreased fasting C-peptide (P = 0.19), and those in the Proin-

sulin cluster (N = 1,117) had significantly decreased fasting C-peptide levels (P = 0.003). Those

Table 2. Associations of cluster genetic risk scores and clinical outcomes from GWAS.

Beta Cell

N Loci = 30

Proinsulin

N Loci = 7

Obesity

N Loci = 5

Lipodystrophy

N Loci = 20

Liver/Lipid

N Loci = 5

Loci Combined

N Loci = 62

Outcome beta P-value beta P-value beta P-value beta P-value beta P-value Beta P-value

CAD 0.021 2.08E−12 −0.003 0.67 0.016 0.04 0.021 2.5×10−8 −0.009 0.27 0.017 1.2×10−15

CKD 0.003 0.35 0.009 0.22 0.015 0.04 0.0002 0.97 0.011 0.18 0.004 0.06

eGFR 0.000 0.87 0.0002 0.70 −0.0008 0.06 −0.00003 0.89 −0.002 1.2×10−6 0.000 0.07

UACR 0.001 0.42 0.003 0.27 0.003 0.32 0.006 9.0×10−5 −0.010 3.7×10−3 0.002 0.01

Stroke_IS 0.016 2.0×10−4 0.009 0.37 0.022 0.03 0.014 9.1×10−3 −0.002 0.84 0.013 1.3×10−5

Stroke_CE 0.004 0.59 −0.001 0.94 0.048 0.01 0.005 0.62 0.002 0.93 0.007 0.20

Stroke_LVD 0.032 5.6×10−5 −0.006 0.73 0.020 0.31 0.026 0.01 −0.017 0.46 0.023 5.1×10−5

Stroke_SVD 0.032 2.6×10−4 0.029 0.17 0.027 0.22 0.036 1.5×10−3 −0.007 0.79 0.028 7.3×10−6

SBP 0.035 0.09 −0.014 0.76 −0.088 0.07 0.149 4.9×10−9 −0.041 0.45 0.048 9.3×10−4

DBP 0.014 0.30 −0.027 0.36 −0.046 0.14 0.073 6.4×10−6 0.0005 0.99 0.023 0.01

P-values< 8 × 10−4 and corresponding betas are bolded, representing a Bonferonni correction of 10 outcomes × 6 groups.

Abbreviations: CAD, coronary artery disease; CE, cerebroembolic; CKD, chronic kidney disease; DBP, diastolic blood pressure; eGFR, estimated glomerular filtration

rate; IS, ischemic stroke all subtypes; LVD, large vessel disease; SBP, systolic blood pressure; SVD, small vessel disease; UACR, urine albumin-creatinine ratio.

https://doi.org/10.1371/journal.pmed.1002654.t002

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 14 / 23

Page 15: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

in the Obesity cluster (N = 1,206) had increased BMI, percent body fat, HC, and WC

(P< 0.05); those with extreme GRS in the Lipodystrophy cluster (N = 1,134) had significantly

decreased HDL, percent body fat, and BMI (P< 0.01), and those with extreme GRS in the

Liver/Lipid cluster (N = 924) had significantly decreased triglycerides (P = 0.01). Thus, individ-

uals with T2D and a GRS uniquely at the top 10th of one cluster as a group had representative

trait characteristics distinguishing them from all other individuals with T2D (Fig 1C, S7, and

S9 Tables).

Discussion

T2D, typically defined as hyperglycemia that is not autoimmune or monogenic in origin, is

commonly recognized as a heterogeneous conglomerate of various pathogenic mechanisms

and therefore is unlikely to represent a single disease process. However, understanding of the

biological pathways causing T2D to inform clinical management remains incomplete. Further-

more, despite over 100 T2D loci now identified, the relationship of these loci to disease path-

ways remains largely opaque.

The work described here is the most comprehensive assessment of T2D loci clustering to

date, including variant-trait associations for 94 T2D genetic variants and 47 diabetes-related

metabolic traits in publicly available GWAS datasets. We identify five robust clusters of T2D

variants, which appear to represent biologically meaningful distinct pathways. The first two

clusters (Beta Cell and Proinsulin) relate to pancreatic beta cell function and differ most nota-

bly in the direction of association with proinsulin; both contain loci for which functional work

has implicated beta cell function in the causal mechanisms (e.g., [60,61]). Additionally, the loci

in both these clusters were highly enriched for positional overlap with pancreatic islet tissue

enhancers and promoters. The three other clusters (Obesity, Lipodystrophy, Liver/Lipid)

appear to represent different pathways causing insulin resistance: obesity mediated, lipodystro-

phy (fat-distribution) mediated, and liver-lipid-metabolism mediated. The Liver/Lipid cluster

contains three of the top loci associated with NAFLD [65], and functional work has implicated

these loci in liver lipid metabolism resulting in sequestration of lipid in the liver, resulting in

decreased observed serum triglyceride levels [66–70]. Additionally, these three clusters related

to insulin action (Obesity, Lipodystrophy, Liver/Lipid) are enriched for variants overlaying

enhancers in tissues that biologically support their proposed mechanisms: pre-adipocytes, adi-

pocytes, and liver tissue, respectively. GRSs of top-weighted loci from the five clusters were

also associated with particular clinical outcomes, including increased SBP, risk of CAD, and

stroke, assessed using GWAS summary statistics.

Previous clustering efforts of T2D loci included less than half as many diabetes-related traits

[6,9,11] and focused predominantly on unsupervised hierarchical clustering, a method that

involves “hard clustering,” whereby a locus can be a member of only one cluster. Our analysis

uses a novel clustering method, bNMF, which enables "soft clustering," whereby a variant can

be a member of more than one cluster and also a data-driven method for determining the

number of clusters. With this method, the derived clusters are biologically interpretable and

include mechanistic processes not previously captured by previous hard-clustering efforts,

such as the Liver/Lipid cluster.

Our study additionally asked whether the clusters of variants have relevance to individuals

with T2D. In four cohorts with up to 17,365 individuals with T2D of European ancestry, we

showed that individual-level GRSs for each cluster were associated significantly with predicted

traits. Additionally, individuals with a very high GRS uniquely in the top 10th percentile of

one cluster had clinical features significantly distinguishing themselves from all other individ-

uals with T2D; we observed consistently that this group comprised about 30% of persons with

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 15 / 23

Page 16: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

T2D, consistent with chance expectation and, importantly, representing a sizable proportion

of individuals with T2D.

Thus, these results suggest that genetics can be used to stratify a reasonable proportion of

individuals with T2D who may belong to clinically meaningful subgroups. Such individuals

could be classified based on their genetics and targeted for precision surveillance and therapeu-

tics, should future studies find that these individuals differentially respond to medical interven-

tions or confirm risk of particular clinical outcomes. Of course, the threshold we chose of the

top 10th percentile for each cluster GRS is arbitrary, and further work is needed to determine

clinically relevant thresholds or combinations of GRS from multiple clusters. Using the 10th

percentile cutoff, study individuals with extreme GRSs on aggregate had clinical characteristics

distinguishable from others with T2D; however, at the individual level, such clinical features

might not be recognizable to a clinician because of the subtleties of the phenotypic characteris-

tics, small differences in effects sizes, and/or the potential for environmental influences. Fur-

thermore, as germline genetic variation is constant throughout the lifetime and essentially

unaltered by medications, it may provide a more robust metric than other biomarkers (which

are contingent on environmental changes), on which to anchor an initial stratification step.

Other efforts have tried to identify subtypes of T2D [3,4]. In the most recent assessment of

patients with new adult-onset diabetes in Scandinavia, Ahlqvist and colleagues used pheno-

typic information to define five subgroups of diabetes: an autoimmune form (capturing type 1

diabetes and latent autoimmune diabetes in the adult), two severe forms (severe insulin-defi-

cient [SIDD] and severe insulin-resistant diabetes [SIRD]), and two mild forms (obesity- and

age-related diabetes) [4]. Importantly, in contrast to our clusters of genetic loci, these clusters

are defined using clinical data and biomarkers at the time of diabetes diagnosis, and thus anal-

ysis of the same variables at a different time in the disease course or after treatment could yield

inappropriate results. The T2D subtypes described by Ahlqvist and colleagues seemed to have

different genetic architectures [4], so we were interested to determine whether our clusters of

genetic loci corresponded. Of the ten variants Ahlqvist and colleagues found to be associated

at nominal significance with their SIDD cluster that were also included in our analysis, seven

of these variants (or a proxy) had their strongest weights in our Proinsulin or Beta Cell clusters.

Our Obesity cluster may correspond to the mild obesity-related diabetes of Ahlqvist and col-

leagues; however, none of the top-weighted variants in this cluster were included in their anal-

ysis. To the extent that severity of disease might be correlated with duration of exposure (with

genetic exposure present at conception), our insulin resistance–related clusters might corre-

spond to the SIRD of Ahlqvist and colleagues; there were four variants found by Ahlqvist and

colleagues to be associated with the SIRD cluster at nominal significance, all of which were

included in our analysis and had their strongest weights in clusters we believe to relate to insu-

lin resistance (Liver/Lipid and Lipodystrophy clusters). Interestingly, variants from several of

our clusters were associated with the age-related diabetes cluster from Ahlqvist and colleagues.

Beyond clarifying disease causal mechanisms and offering the potential for patient stratifi-

cation, identification of the biologically motivated clusters of T2D loci in our study may help

also implicate loci with unknown function into pathways. For example, it is interesting that

theHLA.DQA1 locus is most highly weighted in the Liver/Lipid cluster; one might expect it to

have a predominant autoimmune mechanism of action, given its chromosomal location in the

human leukocyte antigen (HLA) region. The function of this locus remains to be discovered;

however, our results suggest that it is more likely to influence insulin resistance than insulin

deficiency. Membership to clusters may thus facilitate characterization of loci, which are gen-

erally challenging to study functionally.

Five variants were top weighted (above the threshold of 0.75) in more than one cluster:

CDC123/CAMKID (Beta Cell and Proinsulin), HSD17B12 (Beta Cell and Obesity), ADCY5

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 16 / 23

Page 17: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

(Beta Cell and Lipodystrophy), CCND2 (Proinsulin and Lipodystrophy), andHNF4A (Beta

Cell and Proinsulin). A strong weighting in more than one cluster suggests that these loci are

involved in more than one mechanistic process. Published functional data support such pleiot-

ropy; for example, the gene product of CCND2 is thought to be involved in pancreatic beta cell

growth [72] (in line with the Proinsulin cluster) and also in early stage adipocyte differentia-

tion [73] (consistent with the Lipodystrophy cluster).

Several loci included in this analysis have evidence of multiple independent signals [6,8].

We included 15 variants from 6 such loci (ANKRD55—two variants, DGKB—two variants,

CDKN2A—three variants, KCNQ1—four variants, CCND2—two variants, and HNF4A—two

variants), offering an opportunity to see whether distinct signals from the same locus would

cluster together (S1 Table). The multiple variants in ANKRD55,CDKN2A,KCNQ1, and

CCND2mapped most strongly to the same cluster for each locus. At the DGKB locus, the sig-

nal represented by rs10276674 was most strongly weighted in the Proinsulin cluster, whereas

the signal represented by rs2191349 was strongly weighted in the Beta Cell and Liver/Lipids

clusters. InHNF4A, rs4812829 was most strongly weighted in the Beta Cell and Proinsulin

clusters, whereas rs1800961 was most strongly weighted in the Lipodystrophy, Proinsulin, and

Liver/Lipid clusters (S3 Table). Interestingly, therefore, forHNF4A, the two signals separate

into predominant insulin deficiency and insulin resistance–related mechanisms. Potentially,

therefore, cross-phenotype analysis can provide additional support for the existence of inde-

pendent signals at these loci, perhaps indicating tissue-specific regulatory mechanisms.

The strengths of this study include the novel application of a Bayesian form of NMF cluster-

ing to complex disease genetics. The clusters resulting from bNMF are more readily interpret-

able than hierarchical clustering, given that bNMF also clusters traits. Furthermore, allowance

of an element to be part of more than one cluster (soft clustering) fits with our biological

understanding of complex disease-causing genetic variants, whereby a particular variant may

impact more than one biological pathway.

Limitations of this study include clustering of only available phenotypes from GWAS. It is

possible that future inclusion of additional traits would impact the clustering results, poten-

tially even creating a new cluster for a mechanistic pathway not currently captured with avail-

able phenotypes. Our analysis considered all known published T2D loci reaching genome-

wide significance to date; we expect that with forthcoming larger GWAS, additional loci will

be identified and could be included in future analyses [74]. Additionally, we have focused on

variants associated with T2D and related traits in populations of European ancestry. With

additional studies from populations of diverse ethnic backgrounds, it would be ideal to include

additional T2D-associated variants that were not included in the current analysis. Further-

more, the impact of the clusters on outcomes such as stroke and CAD was assessed through

GWAS GRS using GWAS summary statistics, which came from studies including individuals

with and without diabetes. It will be important to assess the association of cluster GRS and

these outcomes in future work using individual-level data; such analyses would ideally involve

cohorts with large sample sizes and well-phenotyped outcomes. While NMF provides a very

attractive method for variant-trait clustering, it is currently uncertain whether all weights or a

thresholded approach is ideal for assignment of elements in a cluster. For our analysis, we

developed a framework to determine a reasonable threshold; however, this question could ben-

efit from additional research.

In summary, clustering of genetic variants associated with T2D has identified five robust

clusters with distinct trait associations, which likely represent mechanistic pathways causing

T2D. These clusters have distinct tissue specificity, and patients enriched for alleles in each

cluster exhibit distinct predicted phenotypic features. We observe a substantial fraction (about

30%) of individuals with T2D who have T2D genetic risk factors highly loaded (in top 10th

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 17 / 23

Page 18: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

percentile) from just one of the five clusters. It will be exciting to explore whether such individ-

uals respond differentially to medications based on the pathway predominantly disrupted or

whether they have a differential rate of disease progression and diabetic complications. Classi-

fication of patients using data from designated genetic pathways may offer a step toward genet-

ically informed patient management of T2D in order to individualize and improve care.

Supporting information

S1 Fig. Loci associations to clusters.

(TIF)

S2 Fig. Trait associations to clusters.

(TIF)

S3 Fig. Distribution of cluster weights for variants.

(TIF)

S4 Fig. Deltas of cluster weights.

(TIF)

S5 Fig. Enrichment for tissue-specific enhancers and promoters compared across clusters.

(TIF)

S1 Table. List of SNPs.

(XLSX)

S2 Table. GWAS datasets. GWAS, genome-wide association study.

(XLSX)

S3 Table. Clustering results for T2D loci. T2D, type 2 diabetes.

(XLSX)

S4 Table. Clustering results for traits.

(XLSX)

S5 Table. GWAS GRS associations with traits. GRS, genetic risk score; GWAS, genome-wide

association study.

(XLSX)

S6 Table. Meta-analysis of GRS associations with traits in four studies of patients with

T2D (adjusted for age and sex) plus sex-specific analyses. GRS, genetic risk score; T2D, type

2 diabetes.

(XLSX)

S7 Table. Meta-analysis of comparison of mean trait in individuals with top 10% decile of

one cluster GRS (cluster extreme) versus all subjects with T2D in four studies. GRS, genetic

risk score; T2D, type 2 diabetes.

(XLSX)

S8 Table. Counts of individuals from five studies in uniquely one top 10% decile of cluster

GRSs ("cluster extreme"). GRS, genetic risk score.

(XLSX)

S9 Table. Comparison of median values of traits in "cluster extreme" (top decile GRS) groups

and in all others with T2D in two biobanks. GRS, genetic risk score; T2D, type 2 diabetes.

(XLSX)

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 18 / 23

Page 19: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

Acknowledgments

We wish to thank the thousands of volunteers who participated in the studies included in this

work. Additionally, we acknowledge the T2D-GENES initiative.

Author Contributions

Conceptualization: Miriam S. Udler, Jaegil Kim, Marcin von Grotthuss, Christopher D.

Anderson, Gad Getz, Jose C. Florez.

Data curation: Miriam S. Udler, Marcin von Grotthuss, Sı́lvia Bonàs-Guarch, Joanne B. Cole,

Michael Boehnke, Markku Laakso, Gil Atzmon, Benjamin Glaser, Josep M. Mercader.

Formal analysis: Miriam S. Udler, Jaegil Kim, Marcin von Grotthuss, Sı́lvia Bonàs-Guarch,

Joshua Chiou, Kyle Gaulton.

Funding acquisition: Miriam S. Udler, Jose C. Florez.

Investigation: Miriam S. Udler.

Methodology: Miriam S. Udler, Jaegil Kim, Jason Flannick, Gad Getz.

Project administration: Miriam S. Udler.

Supervision: Josep M. Mercader, Kyle Gaulton, Jason Flannick, Gad Getz, Jose C. Florez.

Validation: Michael Boehnke, Markku Laakso, Gil Atzmon, Benjamin Glaser.

Visualization: Miriam S. Udler, Jason Flannick.

Writing – original draft: Miriam S. Udler, Jaegil Kim, Sı́lvia Bonàs-Guarch, Joanne B. Cole,

Joshua Chiou, Christopher D. Anderson, Josep M. Mercader, Kyle Gaulton, Jason Flannick,

Jose C. Florez.

Writing – review & editing: Miriam S. Udler, Jaegil Kim, Marcin von Grotthuss, Sı́lvia Bonàs-

Guarch, Joanne B. Cole, Joshua Chiou, Christopher D. Anderson, Michael Boehnke,

Markku Laakso, Gil Atzmon, Benjamin Glaser, Josep M. Mercader, Kyle Gaulton, Jason

Flannick, Gad Getz, Jose C. Florez.

References1. National Diabetes Statistics Report. Centers for Disease Control and Prevention, Services UDoHaH;

2017 [cited 2018 May 8]. Available from: https://www.cdc.gov/diabetes/data/statistics/statistics-report.

html.

2. American Diabetes A. 2. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabe-

tes-2018. Diabetes Care. 2018; 41(Suppl 1):S13–S27. https://doi.org/10.2337/dc18-S002 PMID:

29222373.

3. Li L, Cheng WY, Glicksberg BS, Gottesman O, Tamler R, Chen R, et al. Identification of type 2 diabetes

subgroups through topological analysis of patient similarity. Sci Transl Med. 2015; 7(311):311ra174.

https://doi.org/10.1126/scitranslmed.aaa9364 PMID: 26511511; PubMed Central PMCID:

PMC4780757.

4. Ahlqvist E, Storm P, Karajamaki A, Martinell M, Dorkhan M, Carlsson A, et al. Novel subgroups of adult-

onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lan-

cet Diabetes Endocrinol. 2018. https://doi.org/10.1016/S2213-8587(18)30051-2 PMID: 29503172.

5. Mohlke KL, Boehnke M. Recent advances in understanding the genetic architecture of type 2 diabetes.

Hum Mol Genet. 2015; 24(R1):R85–92. https://doi.org/10.1093/hmg/ddv264 PMID: 26160912; PubMed

Central PMCID: PMC4572004.

6. Scott RA, Scott LJ, Magi R, Marullo L, Gaulton KJ, Kaakinen M, et al. An Expanded Genome-Wide

Association Study of Type 2 Diabetes in Europeans. Diabetes. 2017. https://doi.org/10.2337/db16-

1253 PMID: 28566273.

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 19 / 23

Page 20: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

7. Bonas-Guarch S, Guindo-Martinez M, Miguel-Escalada I, Grarup N, Sebastian D, Rodriguez-Fos E,

et al. Re-analysis of public genetic data reveals a rare X-chromosomal variant associated with type 2

diabetes. Nat Commun. 2018; 9(1):321. https://doi.org/10.1038/s41467-017-02380-9 PMID: 29358691;

PubMed Central PMCID: PMC5778074.

8. Gaulton KJ, Ferreira T, Lee Y, Raimondo A, Magi R, Reschen ME, et al. Genetic fine mapping and

genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci. Nat Genet. 2015;

47(12):1415–25. https://doi.org/10.1038/ng.3437 PMID: 26551672; PubMed Central PMCID:

PMC4666734.

9. Mahajan A, Wessel J, Willems SM, Zhao W, Robertson NR, Chu AY, et al. Refining the accuracy of vali-

dated target identification through coding variant fine-mapping in type 2 diabetes. Nat Genet. 2018; 50

(4):559–71. https://doi.org/10.1038/s41588-018-0084-1 PMID: 29632382; PubMed Central PMCID:

PMC5898373.

10. Yaghootkar H, Scott RA, White CC, Zhang W, Speliotes E, Munroe PB, et al. Genetic evidence for a

normal-weight "metabolically obese" phenotype linking insulin resistance, hypertension, coronary artery

disease, and type 2 diabetes. Diabetes. 2014; 63(12):4369–77. https://doi.org/10.2337/db14-0318

PMID: 25048195; PubMed Central PMCID: PMC4392920.

11. Dimas AS, Lagou V, Barker A, Knowles JW, Magi R, Hivert MF, et al. Impact of type 2 diabetes suscep-

tibility variants on quantitative glycemic traits reveals mechanistic heterogeneity. Diabetes. 2014; 63

(6):2158–71. Epub 2013/12/04. https://doi.org/10.2337/db13-0949 PMID: 24296717; PubMed Central

PMCID: PMC4030103.

12. Tan VY, Fevotte C. Automatic relevance determination in nonnegative matrix factorization with the

beta-divergence. IEEE Trans Pattern Anal Mach Intell. 2013; 35(7):1592–605. https://doi.org/10.1109/

TPAMI.2012.240 PMID: 23681989.

13. Kim J, Mouw KW, Polak P, Braunstein LZ, Kamburov A, Tiao G, et al. Somatic ERCC2 mutations are

associated with a distinct genomic signature in urothelial tumors. Nat Genet. 2016; 48(6):600–6. https://

doi.org/10.1038/ng.3557 PMID: 27111033; PubMed Central PMCID: PMC4936490.

14. Kasar S, Kim J, Improgo R, Tiao G, Polak P, Haradhvala N, et al. Whole-genome sequencing reveals

activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolu-

tion. Nat Commun. 2015; 6:8866. https://doi.org/10.1038/ncomms9866 PMID: 26638776; PubMed

Central PMCID: PMC4686820.

15. Robertson AG, Kim J, Al-Ahmadie H, Bellmunt J, Guo G, Cherniack AD, et al. Comprehensive Molecu-

lar Characterization of Muscle-Invasive Bladder Cancer. Cell. 2017; 171(3):540–56 e25. https://doi.org/

10.1016/j.cell.2017.09.007 PMID: 28988769.

16. Morris AP, Voight BF, Teslovich TM, Ferreira T, Segre AV, Steinthorsdottir V, et al. Large-scale associ-

ation analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes.

Nat Genet. 2012; 44(9):981–90. https://doi.org/10.1038/ng.2383 PMID: 22885922; PubMed Central

PMCID: PMC3442244.

17. Manning AK, Hivert MF, Scott RA, Grimsby JL, Bouatia-Naji N, Chen H, et al. A genome-wide approach

accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin

resistance. Nat Genet. 2012; 44(6):659–69. https://doi.org/10.1038/ng.2274 PMID: 22581228; PubMed

Central PMCID: PMC3613127.

18. Walford GA, Gustafsson S, Rybin D, Stancakova A, Chen H, Liu CT, et al. Genome-Wide Association

Study of the Modified Stumvoll Insulin Sensitivity Index Identifies BCL2 and FAM19A2 as Novel Insulin

Sensitivity Loci. Diabetes. 2016; 65(10):3200–11. https://doi.org/10.2337/db16-0199 PMID: 27416945;

PubMed Central PMCID: PMC5033262.

19. Prokopenko I, Poon W, Magi R, Prasad BR, Salehi SA, Almgren P, et al. A central role for GRB10 in reg-

ulation of islet function in man. PLoS Genet. 2014; 10(4):e1004235. https://doi.org/10.1371/journal.

pgen.1004235 PMID: 24699409; PubMed Central PMCID: PMC3974640.

20. Strawbridge RJ, Dupuis J, Prokopenko I, Barker A, Ahlqvist E, Rybin D, et al. Genome-wide association

identifies nine common variants associated with fasting proinsulin levels and provides new insights into

the pathophysiology of type 2 diabetes. Diabetes. 2011; 60(10):2624–34. https://doi.org/10.2337/db11-

0415 PMID: 21873549; PubMed Central PMCID: PMC3178302.

21. Soranzo N, Sanna S, Wheeler E, Gieger C, Radke D, Dupuis J, et al. Common variants at 10 genomic

loci influence hemoglobin A(1)(C) levels via glycemic and nonglycemic pathways. Diabetes. 2010; 59

(12):3229–39. https://doi.org/10.2337/db10-0502 PMID: 20858683; PubMed Central PMCID:

PMC2992787.

22. Saxena R, Hivert MF, Langenberg C, Tanaka T, Pankow JS, Vollenweider P, et al. Genetic variation in

GIPR influences the glucose and insulin responses to an oral glucose challenge. Nat Genet. 2010; 42

(2):142–8. https://doi.org/10.1038/ng.521 PMID: 20081857.

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 20 / 23

Page 21: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

23. Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, et al. New genetic loci

implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet. 2010; 42

(2):105–16. https://doi.org/10.1038/ng.520 PMID: 20081858.

24. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index

yield new insights for obesity biology. Nature. 2015; 518(7538):197–206. https://doi.org/10.1038/

nature14177 PMID: 25673413; PubMed Central PMCID: PMC4382211.

25. Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, Magi R, et al. New genetic loci link

adipose and insulin biology to body fat distribution. Nature. 2015; 518(7538):187–96. https://doi.org/10.

1038/nature14132 PMID: 25673412; PubMed Central PMCID: PMC4338562.

26. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common var-

iation in the genomic and biological architecture of adult human height. Nat Genet. 2014; 46(11):1173–

86. https://doi.org/10.1038/ng.3097 PMID: 25282103; PubMed Central PMCID: PMC4250049.

27. Horikoshi M, Beaumont RN, Day FR, Warrington NM, Kooijman MN, Fernandez-Tajes J, et al. Genome-

wide associations for birth weight and correlations with adult disease. Nature. 2016; 538(7624):248–52.

https://doi.org/10.1038/nature19806 PMID: 27680694; PubMed Central PMCID: PMC5164934.

28. van der Valk RJ, Kreiner-Moller E, Kooijman MN, Guxens M, Stergiakouli E, Saaf A, et al. A novel com-

mon variant in DCST2 is associated with length in early life and height in adulthood. Hum Mol Genet.

2015; 24(4):1155–68. https://doi.org/10.1093/hmg/ddu510 PMID: 25281659; PubMed Central PMCID:

PMC4447786.

29. Chu AY, Deng X, Fisher VA, Drong A, Zhang Y, Feitosa MF, et al. Multiethnic genome-wide meta-analy-

sis of ectopic fat depots identifies loci associated with adipocyte development and differentiation. Nat

Genet. 2017; 49(1):125–30. https://doi.org/10.1038/ng.3738 PMID: 27918534; PubMed Central

PMCID: PMC5451114.

30. Lu Y, Day FR, Gustafsson S, Buchkovich ML, Na J, Bataille V, et al. New loci for body fat percentage

reveal link between adiposity and cardiometabolic disease risk. Nat Commun. 2016; 7:10495. https://

doi.org/10.1038/ncomms10495 PMID: 26833246; PubMed Central PMCID: PMC4740398.

31. den Hoed M, Eijgelsheim M, Esko T, Brundel BJ, Peal DS, Evans DM, et al. Identification of heart rate-

associated loci and their effects on cardiac conduction and rhythm disorders. Nat Genet. 2013; 45

(6):621–31. https://doi.org/10.1038/ng.2610 PMID: 23583979; PubMed Central PMCID: PMC3696959.

32. Surakka I, Horikoshi M, Magi R, Sarin AP, Mahajan A, Lagou V, et al. The impact of low-frequency and

rare variants on lipid levels. Nat Genet. 2015; 47(6):589–97. https://doi.org/10.1038/ng.3300 PMID:

25961943; PubMed Central PMCID: PMC4757735.

33. Kilpelainen TO, Carli JF, Skowronski AA, Sun Q, Kriebel J, Feitosa MF, et al. Genome-wide meta-anal-

ysis uncovers novel loci influencing circulating leptin levels. Nat Commun. 2016; 7:10494. https://doi.

org/10.1038/ncomms10494 PMID: 26833098; PubMed Central PMCID: PMC4740377.

34. Dastani Z, Hivert MF, Timpson N, Perry JR, Yuan X, Scott RA, et al. Novel loci for adiponectin levels

and their influence on type 2 diabetes and metabolic traits: a multi-ethnic meta-analysis of 45,891 indi-

viduals. PLoS Genet. 2012; 8(3):e1002607. https://doi.org/10.1371/journal.pgen.1002607 PMID:

22479202; PubMed Central PMCID: PMC3315470 P Vollenweider received grant money from GlaxoS-

mithKline to fund the CoLaus study. The other authors declare no competing financial interests.

35. Kottgen A, Albrecht E, Teumer A, Vitart V, Krumsiek J, Hundertmark C, et al. Genome-wide association

analyses identify 18 new loci associated with serum urate concentrations. Nat Genet. 2013; 45(2):145–

54. https://doi.org/10.1038/ng.2500 PMID: 23263486; PubMed Central PMCID: PMC3663712.

36. Lemaitre RN, Tanaka T, Tang W, Manichaikul A, Foy M, Kabagambe EK, et al. Genetic loci associated

with plasma phospholipid n-3 fatty acids: a meta-analysis of genome-wide association studies from the

CHARGE Consortium. PLoS Genet. 2011; 7(7):e1002193. https://doi.org/10.1371/journal.pgen.

1002193 PMID: 21829377; PubMed Central PMCID: PMC3145614.

37. Guan W, Steffen BT, Lemaitre RN, Wu JHY, Tanaka T, Manichaikul A, et al. Genome-wide association

study of plasma N6 polyunsaturated fatty acids within the cohorts for heart and aging research in geno-

mic epidemiology consortium. Circulation Cardiovascular genetics. 2014; 7(3):321–31. https://doi.org/

10.1161/CIRCGENETICS.113.000208 PMID: 24823311; PubMed Central PMCID: PMC4123862.

38. Wu JH, Lemaitre RN, Manichaikul A, Guan W, Tanaka T, Foy M, et al. Genome-wide association study

identifies novel loci associated with concentrations of four plasma phospholipid fatty acids in the de

novo lipogenesis pathway: results from the Cohorts for Heart and Aging Research in Genomic Epidemi-

ology (CHARGE) consortium. Circulation Cardiovascular genetics. 2013; 6(2):171–83. https://doi.org/

10.1161/CIRCGENETICS.112.964619 PMID: 23362303; PubMed Central PMCID: PMC3891054.

39. Lemaitre RN, King IB, Kabagambe EK, Wu JH, McKnight B, Manichaikul A, et al. Genetic loci associ-

ated with circulating levels of very long-chain saturated fatty acids. J Lipid Res. 2015; 56(1):176–84.

https://doi.org/10.1194/jlr.M052456 PMID: 25378659; PubMed Central PMCID: PMC4274065.

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 21 / 23

Page 22: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

40. Malik R, Traylor M, Pulit SL, Bevan S, Hopewell JC, Holliday EG, et al. Low-frequency and common

genetic variation in ischemic stroke: The METASTROKE collaboration. Neurology. 2016; 86(13):1217–

26. https://doi.org/10.1212/WNL.0000000000002528 PMID: 26935894; PubMed Central PMCID:

PMC4818561.

41. Schunkert H, Konig IR, Kathiresan S, Reilly MP, Assimes TL, Holm H, et al. Large-scale association

analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 2011; 43(4):333–8.

https://doi.org/10.1038/ng.784 PMID: 21378990; PubMed Central PMCID: PMC3119261.

42. Pattaro C, Teumer A, Gorski M, Chu AY, Li M, Mijatovic V, et al. Genetic associations at 53 loci highlight

cell types and biological pathways relevant for kidney function. Nat Commun. 2016; 7:10023. https://

doi.org/10.1038/ncomms10023 PMID: 26831199; PubMed Central PMCID: PMC4735748.

43. International Consortium for Blood Pressure Genome-Wide Association S, Ehret GB, Munroe PB, Rice

KM, Bochud M, Johnson AD, et al. Genetic variants in novel pathways influence blood pressure and

cardiovascular disease risk. Nature. 2011; 478(7367):103–9. https://doi.org/10.1038/nature10405

PMID: 21909115; PubMed Central PMCID: PMC3340926.

44. Wakefield J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies.

Am J Hum Genet. 2007; 81(2):208–27. https://doi.org/10.1086/519024 PMID: 17668372; PubMed Cen-

tral PMCID: PMC1950810.

45. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Meth-

ods. 2012; 9(3):215–6. https://doi.org/10.1038/nmeth.1906 PMID: 22373907; PubMed Central PMCID:

PMC3577932.

46. Varshney A, Scott LJ, Welch RP, Erdos MR, Chines PS, Narisu N, et al. Genetic regulatory signatures

underlying islet gene expression and type 2 diabetes. Proc Natl Acad Sci U S A. 2017; 114(9):2301–6.

https://doi.org/10.1073/pnas.1621192114 PMID: 28193859; PubMed Central PMCID: PMC5338551.

47. Stancakova A, Javorsky M, Kuulasmaa T, Haffner SM, Kuusisto J, Laakso M. Changes in insulin sensi-

tivity and insulin release in relation to glycemia and glucose tolerance in 6,414 Finnish men. Diabetes.

2009; 58(5):1212–21. https://doi.org/10.2337/db08-1607 PMID: 19223598; PubMed Central PMCID:

PMC2671053.

48. Fuchsberger C, Flannick J, Teslovich TM, Mahajan A, Agarwala V, Gaulton KJ, et al. The genetic archi-

tecture of type 2 diabetes. Nature. 2016; 536(7614):41–7. https://doi.org/10.1038/nature18642 PMID:

27398621.

49. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of

64,976 haplotypes for genotype imputation. Nat Genet. 2016; 48(10):1279–83. https://doi.org/10.1038/

ng.3643 PMID: 27548312; PubMed Central PMCID: PMC5388176.

50. Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation

service and methods. Nat Genet. 2016; 48(10):1284–7. https://doi.org/10.1038/ng.3656 PMID:

27571263; PubMed Central PMCID: PMC5157836.

51. Permutt MA, Wasson JC, Suarez BK, Lin J, Thomas J, Meyer J, et al. A genome scan for type 2 diabe-

tes susceptibility loci in a genetically isolated population. Diabetes. 2001; 50(3):681–5. PMID:

11246891.

52. Blech I, Katzenellenbogen M, Katzenellenbogen A, Wainstein J, Rubinstein A, Harman-Boehm I, et al.

Predicting diabetic nephropathy using a multifactorial genetic model. PLoS ONE. 2011; 6(4):e18743.

https://doi.org/10.1371/journal.pone.0018743 PMID: 21533139; PubMed Central PMCID:

PMC3077408.

53. Smoller JW, Karlson EW, Green RC, Kathiresan S, MacArthur DG, Talkowski ME, et al. An eMERGE

Clinical Center at Partners Personalized Medicine. J Pers Med. 2016; 6(1). https://doi.org/10.3390/

jpm6010005 PMID: 26805891; PubMed Central PMCID: PMC4810384.

54. Yu S, Liao KP, Shaw SY, Gainer VS, Churchill SE, Szolovits P, et al. Toward high-throughput phenotyp-

ing: unbiased automated feature extraction and selection from knowledge sources. J Am Med Inform

Assoc. 2015; 22(5):993–1000. https://doi.org/10.1093/jamia/ocv034 PMID: 25929596; PubMed Central

PMCID: PMC4986664.

55. Delaneau O, Zagury JF, Marchini J. Improved whole-chromosome phasing for disease and population

genetic studies. Nat Methods. 2013; 10(1):5–6. https://doi.org/10.1038/nmeth.2307 PMID: 23269371.

56. Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference

for human genetic variation. Nature. 2015; 526(7571):68–74. https://doi.org/10.1038/nature15393

PMID: 26432245.

57. Consortium UK, Walter K, Min JL, Huang J, Crooks L, Memari Y, et al. The UK10K project identifies

rare variants in health and disease. Nature. 2015; 526(7571):82–90. https://doi.org/10.1038/

nature14962 PMID: 26367797; PubMed Central PMCID: PMC4773891.

58. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. Genome-wide genetic data on

~500,000 UK Biobank participants. bioRxiv. 2017. https://doi.org/10.1101/166298

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 22 / 23

Page 23: Type 2 diabetes genetic loci informed by multi-trait ... · • In up to 17,365 individuals with T2D from four distinct studies, about 30% of individu-als have a genetic risk score

59. Eastwood SV, Mathur R, Atkinson M, Brophy S, Sudlow C, Flaig R, et al. Algorithms for the Capture

and Adjudication of Prevalent and Incident Diabetes in UK Biobank. PLoS ONE. 2016; 11(9):e0162388.

https://doi.org/10.1371/journal.pone.0162388 PMID: 27631769; PubMed Central PMCID:

PMC5025160.

60. Bonnefond A, Froguel P. Disentangling the Role of Melatonin and its Receptor MTNR1B in Type 2 Dia-

betes: Still a Long Way to Go? Current Diabetes Reports. 2017; 17(12):122. https://doi.org/10.1007/

s11892-017-0957-1 PMID: 29063374

61. Rosengren AH, Braun M, Mahdi T, Andersson SA, Travers ME, Shigeto M, et al. Reduced Insulin Exo-

cytosis in Human Pancreatic β-Cells With Gene Variants Linked to Type 2 Diabetes. Diabetes. 2012; 61

(7):1726–33. https://doi.org/10.2337/db11-1516 PMID: 22492527

62. Carrat GR, Hu M, Nguyen-Tu MS, Chabosseau P, Gaulton KJ, van de Bunt M, et al. Decreased

STARD10 Expression Is Associated with Defective Insulin Secretion in Humans and Mice. Am J Hum

Genet. 2017; 100(2):238–56. https://doi.org/10.1016/j.ajhg.2017.01.011 PMID: 28132686; PubMed

Central PMCID: PMC5294761.

63. Pappalardo Z, Gambhir Chopra D, Hennings TG, Richards H, Choe J, Yang K, et al. A Whole-Genome

RNA Interference Screen Reveals a Role for Spry2 in Insulin Transcription and the Unfolded Protein

Response. Diabetes. 2017; 66(6):1703–12. https://doi.org/10.2337/db16-0962 PMID: 28246293

64. Yaghootkar H, Lotta LA, Tyrrell J, Smit RA, Jones SE, Donnelly L, et al. Genetic Evidence for a Link

Between Favorable Adiposity and Lower Risk of Type 2 Diabetes, Hypertension, and Heart Disease.

Diabetes. 2016; 65(8):2448–60. https://doi.org/10.2337/db15-1671 PMID: 27207519; PubMed Central

PMCID: PMC5386140.

65. Speliotes EK, Yerges-Armstrong LM, Wu J, Hernaez R, Kim LJ, Palmer CD, et al. Genome-wide associ-

ation analysis identifies variants associated with nonalcoholic fatty liver disease that have distinct

effects on metabolic traits. PLoS Genet. 2011; 7(3):e1001324. https://doi.org/10.1371/journal.pgen.

1001324 PMID: 21423719; PubMed Central PMCID: PMC3053321.

66. Mahdessian H, Taxiarchis A, Popov S, Silveira A, Franco-Cereceda A, Hamsten A, et al. TM6SF2 is a

regulator of liver fat metabolism influencing triglyceride secretion and hepatic lipid droplet content. Proc

Natl Acad Sci U S A. 2014; 111(24):8913–8. https://doi.org/10.1073/pnas.1323785111 PMID:

24927523; PubMed Central PMCID: PMC4066487.

67. Kozlitina J, Smagris E, Stender S, Nordestgaard BG, Zhou HH, Tybjaerg-Hansen A, et al. Exome-wide

association study identifies a TM6SF2 variant that confers susceptibility to nonalcoholic fatty liver dis-

ease. Nat Genet. 2014; 46(4):352–6. https://doi.org/10.1038/ng.2901 PMID: 24531328; PubMed Cen-

tral PMCID: PMC3969786.

68. Smagris E, Gilyard S, BasuRay S, Cohen JC, Hobbs HH. Inactivation of Tm6sf2, a Gene Defective in

Fatty Liver Disease, Impairs Lipidation but Not Secretion of Very Low Density Lipoproteins. J Biol

Chem. 2016; 291(20):10659–76. https://doi.org/10.1074/jbc.M116.719955 PMID: 27013658; PubMed

Central PMCID: PMC4865914.

69. Raimondo A, Rees MG, Gloyn AL. Glucokinase regulatory protein: complexity at the crossroads of tri-

glyceride and glucose metabolism. Curr Opin Lipidol. 2015; 26(2):88–95. https://doi.org/10.1097/MOL.

0000000000000155 PMID: 25692341; PubMed Central PMCID: PMC4422901.

70. Smagris E, BasuRay S, Li J, Huang Y, Lai KM, Gromada J, et al. Pnpla3I148M knockin mice accumu-

late PNPLA3 on lipid droplets and develop hepatic steatosis. Hepatology. 2015; 61(1):108–18. https://

doi.org/10.1002/hep.27242 PMID: 24917523; PubMed Central PMCID: PMC4262735.

71. Arntfield ME, van der Kooy D. beta-Cell evolution: How the pancreas borrowed from the brain: The

shared toolbox of genes expressed by neural and pancreatic endocrine cells may reflect their evolution-

ary relationship. Bioessays. 2011; 33(8):582–7. https://doi.org/10.1002/bies.201100015 PMID:

21681773.

72. Kushner JA, Ciemerych MA, Sicinska E, Wartschow LM, Teta M, Long SY, et al. Cyclins D2 and D1 are

essential for postnatal pancreatic beta-cell growth. Mol Cell Biol. 2005; 25(9):3752–62. https://doi.org/

10.1128/MCB.25.9.3752-3762.2005 PMID: 15831479; PubMed Central PMCID: PMC1084308.

73. Hishida T, Naito K, Osada S, Nishizuka M, Imagawa M. Crucial roles of D-type cyclins in the early stage

of adipocyte differentiation. Biochem Biophys Res Commun. 2008; 370(2):289–94. https://doi.org/10.

1016/j.bbrc.2008.03.091 PMID: 18374658.

74. Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW, et al. Fine-mapping of an

expanded set of type 2 diabetes loci to single-variant resolution using high-density imputation and islet-

specific epigenome maps. bioRxiv. 2018. https://doi.org/10.1101/245506

Clustering analysis of T2D loci-trait associations points to disease mechanisms and subtypes

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002654 September 21, 2018 23 / 23