University of Groningen

Genetic etiology of type 2 diabetes
Erdos, Mike

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version
Publisher's PDF, also known as Version of record

Publication date:
2015

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):
Erdos, M. (2015). Genetic etiology of type 2 diabetes: from gene identification to functional genomics. [S.l.]: [s.n.].

Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Download date: 24-05-2020

https://www.rug.nl/research/portal/en/publications/genetic-etiology-of-type-2-diabetes(de4bbc90-9d00-45a6-ae0b-00750478ac8b).html

Genetic Etiology of Type 2 Diabetes: From Gene Identification to Functional Genomics

Michael Reynolds Erdos

ISBN

978-90-367-7595-3 (e-book) 978-90-367-7596-0

Cover figure

Graphical representation of the path from gene identification to functional genomics. The background Manhattan plot of a genome-wide association study depicts the identification of the CDKAL1 association with type 2 diabetes, transitioning to functional confirmation that demonstrates intrachromosomal contacts of physically associated chromatin domains between the CDKAL1 and SOX4 genes by chromatin interaction analysis by paired-end tag sequencing (ChIA-PET). The foreground illustrates the predicted model of the locus of transcription involving pancreatic islet specific stretch enhancers and the CDKAL1 and SOX4 genes. Credits: Ernesto del Aguila and Darryl Leja, Intramural Publications Support Office, National Human Genome Research Institute.

Genetic Etiology of Type 2 Diabetes: From Gene Identification to Functional Genomics

PhD thesis

to obtain the degree of PhD at the University of Groningen on the authority of the

Rector Magnificus Prof. E. Sterken and in accordance with

the decision by the College of Deans.

This thesis will be defended in public on

Wednesday 18 March 2015 at 14.30 hours

by

Michael Reynolds Erdos

born on 10 February 1956 in New Jersey, United States of America

Supervisors
Prof. C. Wijmenga
Prof. M.H. Hofker
Prof. F.S. Collins

Assessment committee
Prof. M.G. Netea
Prof. H. Snieder
Prof. B.H.R. Wolffenbuttel

Table of contents

Preface

Introduction

Chapter 1 The PPAR-γ2 Pro12Ala variant: association with type 2 diabetes and trait differences

Chapter 2 High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools

Chapter 3 A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants

Chapter 4 Variations in the G6PC2/ABCB11 genomic region are associated with fasting glucose levels

Chapter 5 Common variant in MTNR1B associated with increased risk of type 2 diabetes and impaired early insulin secretion

Chapter 6 Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci

Discussion, Future Directions and Conclusion

Acknowledgements

References

Summaries

Short Biography and Publications

PREFACE

The aim of this thesis on the etiology of type 2 diabetes (T2D) is to describe efforts to define the genetic basis of the disease as a model for understanding the nature of complex disease, in which many genes contribute to the disease state at varying levels of penetrance alongside a strong environmental impact. This journey began with gene discovery efforts in both familial and population-based studies, and continues with functional approaches to elucidate the biological mechanisms by which identified genetic loci influence disease risk. Future directions aimed toward realizing the goal of effective preventive, diagnostic and treatment modalities are described. The thesis consists of seven chapters. Chapter 1 describes the use of single nucleotide polymorphism (SNP) analysis in candidate gene studies to validate a non-synonymous coding polymorphism (P12A) in the peroxisome proliferator-activated receptor gamma (PPARG2) gene. Chapter 2 presents a unique, high-throughput and cost-effective method of fine mapping analysis developed to rapidly screen candidate genomic regions of association by genotyping pooled DNA samples of case versus control subjects. With the advent of massively parallel, high-throughput genotyping arrays, Chapter 3 introduces genome-wide association study (GWAS) analysis on the Finnish-US Investigation of NIDDM Genetics (FUSION) cohort to identify loci associated with T2D. In Chapters 4 and 5 we present genetic associations with quantitative trait loci (QTL) that influence fasting glucose and insulin secretion to investigate mechanisms underlying the genes associated with T2D, adding a functional approach to support the association.

To elucidate the potential causal genes in regions associated with T2D and T2D-related quantitative traits, Chapter 6 introduces the novel approach of assessing histone methylation states in pancreatic islets, the primary defective tissue in T2D, to identify chromatin structural domains that correlate with functional elements of noncoding regions of the genome, including promoters, enhancers, transcribed genes and repressed genes. Finally, I discuss the current status of genetic association studies for T2D and efforts to refine regions of association by high-density custom genotyping and whole exome/genome sequencing. While these studies have made significant advances in the knowledge of the genetics of T2D, the identification of these largely non-coding risk variants does not readily lead to clear conclusions about which genes are actually affected, and whether the risk alleles lead to overexpression, underexpression, or misexpression in timing or tissue localization. Through the integration of T2D SNP association studies with epigenomic approaches that can delineate functional elements in noncoding genomic regions, and with ongoing gene expression analyses in T2D-relevant tissues from patient samples, I describe an approach to determine the functional consequences of non-coding T2D risk alleles, and thereby ultimately to identify plausible therapeutic targets in genes and molecular pathways.

DEDICATION

There are many influences that contribute to achievement. The most significant are opportunity and support. I greatly appreciate the opportunity and support of many special mentors, colleagues, and friends throughout the years. Most significant is the encouragement and patience of Csilla Szabo without whose support I would never have had this opportunity of achievement.

INTRODUCTION

Type 2 diabetes (T2D) affects over 347 million people worldwide, predominantly in low- and middle-income countries, and accounts for more than 80% of the total deaths due to diabetes. Nearly 1% of people affected by T2D die each year. In 2005, the World Health Organization projected that diabetes-related mortality would double by 2030 (‘WHO | Diabetes programme’, 2014).

In the United States, over 8% of the population above 20 years of age has been diagnosed with T2D, with associated medical care costing over $174 billion annually. Reports of type 2 diabetes in children were previously rare, but have increased worldwide as the prevalence of childhood obesity has climbed. In some countries, it accounts for almost half of newly diagnosed cases in children and adolescents. If these trends continue, over 30% of adults in the United States will be diagnosed with T2D by the year 2050 (Figure 1) (‘CDC - National Diabetes Statistics Report, 2014 - Publications - Diabetes DDT’, 2014).

Figure 1. Prevalence of obesity and diabetes in the United States in 1994 and 2010. The US Centers for Disease Control and Prevention (CDC) presents statistical data for the prevalence of obesity and T2D by US state. The Finnish-US Investigation of NIDDM Genetics (FUSION) study was initiated in 1994.

Type 2 diabetes results from the inability to effectively regulate glucose levels in the blood. T2D primarily affects metabolic tissues in the body and manifests as resistance to insulin action in muscle, liver and adipose tissues. Under normal physiological conditions the pancreatic islets secrete insulin to induce glucose uptake, predominantly in muscle, and to influence glucose disposal by conversion to storage in other peripheral tissues such as liver and adipose. As glucose levels rise in the body, the pancreatic islets compensate by increasing the amount of insulin secreted at a rate constant described as the disposition index. Repeated exposure to high levels of glucose overburdens the secretory response in the pancreatic islets. This leads to progressive stress on pancreatic beta-cells, a failure to compensate for the high glucose and insulin resistance in peripheral tissues, and ultimately beta-cell failure (Figure 2). By the time a person is diagnosed with T2D, they have lost ~80% of their beta-cell function. More recent evidence indicates roles for other tissues in T2D, including incretin deficiency in the gastrointestinal tract, hyperglucagonemia of the pancreatic islet alpha cells, increased glucose resorption in the kidney, and insulin resistance in the brain, indicating that the physiological changes in the development of T2D are much more complicated than previously perceived (DeFronzo, 2009).

Figure 2. Pathophysiology of type 2 diabetes. Increased circulating glucose and free fatty acids usually result in increased secretion of insulin, which regulates glucose production in the liver, increases glucose uptake by skeletal muscle and reduces free fatty acid release in adipose. Genetic predisposition and environmental factors leading to increased hyperglycemia and circulating free fatty acids resist insulin action and lead to beta cell toxicity, decreased insulin production and increased insulin resistance. Reviewed in (Stumvoll, Goldstein, & van Haeften, 2005).

Multiple lines of evidence support a significant hereditary contribution to T2D risk. There is a 3.5-fold increased incidence for first-degree relatives of T2D subjects compared to the general middle-aged population. In the Finnish population, where our studies have primarily been focused, the T2D concordance in monozygotic twins is ~34% compared to ~16% in dizygotic twins (Kaprio et al., 1992). Nevertheless, identifying genetic variants affecting risk for T2D has been a formidable challenge for decades, complicated by lifestyle and environmental factors that play a major role in disease onset and progression (Tuomi et al., 2014). Thus, T2D is a prominent example of a common complex polygenic disease.

Gene Discovery – Linkage Analysis: Initially, complex disease studies were modeled after extremely successful familial genetic linkage studies such as those that identified the genes for Huntington’s disease (Gusella et al., 1983), Cystic Fibrosis (Tsui et al., 1985), and others. The FUSION (Finnish US Investigation of NIDDM) genetics study is an international collaboration with the goal to identify genetic variants contributing to T2D susceptibility. Families were originally selected in 1994 (Valle et al., 1998) based on index cases with age of onset 35-60 years, and with at least one affected sibling. Unaffected spouses and offspring were also ascertained for frequently sampled intravenous glucose tolerance tests (FSIGTs) to allow estimates of glucose- and insulin-related physiological traits. In addition, a control cohort of elderly individuals greater than 65 years of age with normal glucose tolerance was collected (Table 1).

Table 1: FUSION Study population characteristics:

Genome-wide linkage analysis results were reported in 2000 using short tandem repeat (STR; tri- and tetranucleotide) polymorphic microsatellite markers to examine shared genetic regions between affected sibling pairs (ASPs). These identity by descent (IBD) analyses suggested regions linked to T2D on chromosomes 20, 14, 11, and 6 (Ghosh et al., 1999; Silander et al., 2004). While these analyses were instrumental in locating regions potentially linked with T2D, the linked regions were far too large to implicate specific disease genes.
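The idea behind an affected sibling pair analysis can be illustrated with the classical "mean test", which asks whether affected siblings share more alleles IBD at a marker than the 50% expected by chance. The sketch below is only an illustration: it assumes IBD status is known without ambiguity for every pair, the pair counts are hypothetical, and it is not the method used in the FUSION linkage scans, which the text does not specify.

```python
import math

def asp_mean_test(n_ibd0, n_ibd1, n_ibd2):
    """Mean IBD-sharing test for affected sibling pairs at one marker.

    Under no linkage, a sib pair shares 0, 1 or 2 alleles IBD with
    probabilities 1/4, 1/2 and 1/4, so the proportion shared has
    mean 0.5 and variance 1/8 per pair.
    """
    n_pairs = n_ibd0 + n_ibd1 + n_ibd2
    mean_sharing = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n_pairs
    z = (mean_sharing - 0.5) / math.sqrt(0.125 / n_pairs)
    # One-sided p-value: only excess sharing is evidence for linkage.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_sharing, z, p_value

# Hypothetical counts for 400 affected sibling pairs (not FUSION data).
mean_sharing, z, p_value = asp_mean_test(n_ibd0=80, n_ibd1=200, n_ibd2=120)
print(f"mean IBD sharing = {mean_sharing:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

Because the test only detects excess sharing across a chromosomal segment inherited intact within families, a positive signal typically spans a wide interval, which is consistent with the limited resolution noted above.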

The aim of the studies described herein is to refine the investigation of T2D genetics to the resolution of the gene, and to identify gene networks and molecular pathways responsible for T2D that might lead to the potential development of therapeutics for better disease management and prevention of associated complications.

Gene Discovery – Candidate Gene Association Studies: Large scale single nucleotide polymorphism (SNP) discovery in the 1990s enabled higher resolution case-control association studies capable of increasing resolution potentially to the level of the gene (Sachidanandam et al., 2001). By genotyping, in T2D and normal subjects, individual SNPs found in genes selected by specific criteria because they were suspected to predispose to disease, differences in allele frequency between T2D and normal populations could be statistically tested for association with disease. Association at the level of the single nucleotide may suggest that the gene being queried is implicated in the disease process (Schaid & Sommer, 1993). Typical candidate genes include PPARG2, a known target of the thiazolidinedione class of T2D therapeutics (Yen et al., 1997), and genes that were known to cause rare monogenic forms of T2D, as in Maturity Onset Diabetes of the Young (MODY) genes (Bonnycastle, 2006). My own work in the candidate gene phase contributed to discovery of T2D associations with PPARG2 and HNF4A.
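A case-control comparison of allele frequencies of the kind described above can be sketched as a Pearson chi-square test on a 2x2 table of allele counts. This is a minimal illustration, assuming a simple allelic test with one degree of freedom; the counts are hypothetical and the function name is introduced here for illustration only.

```python
import math

def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 1000 cases and 1000 controls, i.e. 2000 alleles each.
# Minor allele frequency is 0.15 in cases and 0.20 in controls.
chi2, p_value = allelic_chi_square(300, 1700, 400, 1600)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

Counting alleles rather than individuals doubles the apparent sample size, which is only appropriate when genotypes are in Hardy-Weinberg equilibrium; a genotype-based test is the more conservative alternative.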

Gene Discovery – Genome-Wide Association Studies: Large collaborative efforts such as the SNP Consortium, formed in 1999, accelerated SNP discovery to the point of enabling an increasing number of biologically plausible candidate gene analyses, comparing allele frequency differences between case and control groups (Thorisson & Stein, 2003). The International HapMap Project, initiated in 2002, successfully catalogued most of the 10 million common SNPs in the human genome shared within and between African, Asian, and European populations (Gibbs et al., 2003). But scanning the whole genome for sites of variation associated with disease risk did not require testing all of those SNPs. Comparing genetic variation between large numbers of different people has identified regions of chromosomes where the variants are shared in a non-random way, referred to as “linkage disequilibrium”. Within these regions, which vary in size from a few kb to hundreds of kb, common SNP alleles tend to travel in lockstep with their neighbors, forming a haplotype (The International HapMap Consortium, 2005; Frazer et al., 2007). Genotyping any of the common variants in the shared segment imparts the same genetic information. This enabled the genotyping of far fewer than 10 million SNPs to assess the genetic association of these haplotypes with common disease. Large scale SNP discovery, along with the advent of the HapMap project and the ability to perform massively parallel SNP genotyping, expanded the ability to perform association studies to cover the whole genome. Genome-wide association studies (GWAS) were undertaken by FUSION (L. J. Scott et al., 2007) and other large T2D collaborations (Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research et al., 2007; Burton et al., 2007; E. Zeggini et al., 2007). The statistical correction required for the increasing number of tests in GWAS made it clear that larger numbers of subjects would be required to undertake association studies at the genome-wide scale.
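The size of this multiple-testing penalty can be sketched with a Bonferroni correction. The example below assumes roughly one million independent common-variant tests across the genome, a widely used approximation that is not stated in the text, and compares it with a hypothetical candidate gene study of 10 SNPs.

```python
from statistics import NormalDist

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def two_sided_z(p_threshold):
    """Absolute z-score a single test must exceed to reach p_threshold (two-sided)."""
    return -NormalDist().inv_cdf(p_threshold / 2)

candidate_gene = bonferroni_threshold(0.05, 10)       # hypothetical: 10 SNPs tested
genome_wide = bonferroni_threshold(0.05, 1_000_000)   # assumed ~1 million independent tests

for label, threshold in [("candidate gene", candidate_gene),
                         ("genome-wide", genome_wide)]:
    print(f"{label}: p < {threshold:.1e}, |z| > {two_sided_z(threshold):.2f}")
```

The required z-score roughly doubles between the two settings. Since the test statistic for a given effect size grows with the square root of the sample size, clearing the higher bar takes several times as many subjects.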

To overcome this penalty of multiple testing, increasing numbers of cases and controls were necessary. To achieve this aim, FUSION, the Broad Institute Diabetes Genetics Initiative (DGI), and the Wellcome Trust Case Control Consortium (WTCCC) agreed to combine the results of their individual GWAS in a meta-analysis, resulting in the identification of 11 loci associated with T2D at genome-wide significance. These efforts subsequently led to the formation of the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, encompassing most of the largest case-control studies with GWAS worldwide (Zeggini et al., 2008). The increasing number of studies joining the consortium contributed to significantly increased power, allowing for the investigation of rarer alleles (Table 2).
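How combining studies increases power can be sketched with a fixed-effect, inverse-variance weighted meta-analysis of per-study effect estimates. This is an assumption for illustration: the text does not state which meta-analysis method the consortium used, and the log odds ratios and standard errors below are hypothetical.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return pooled_beta, pooled_se, z, p_value

# Hypothetical per-study log odds ratios and standard errors for one SNP.
betas = [0.10, 0.12, 0.08]
standard_errors = [0.04, 0.05, 0.04]

pooled_beta, pooled_se, z, p_value = fixed_effect_meta(betas, standard_errors)
print(f"pooled log OR = {pooled_beta:.4f} (SE {pooled_se:.4f}), "
      f"z = {z:.2f}, p = {p_value:.1e}")
```

In this toy example no single study has a z-score above 2.5, yet the pooled estimate has a smaller standard error than any individual study and a z-score close to 4, which is the gain in power that motivates consortium-scale analysis.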

Table 2: Summary of sample sets and SNPs assessed in the meta-analysis and replication of the DIAGRAM Consortium

In addition to diabetes affection status, many of these studies have collected considerable quantitative trait data, enabling the formation of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) (Dupuis et al., 2010; Table 3) to investigate putative T2D-associated genetic loci by examining glucose- and insulin-related traits in unaffected subjects. This meta-analysis quickly identified 12 loci associated with fasting glucose-related traits, but surprisingly few for insulin-related traits, suggesting that T2D is much more heavily influenced by beta cell function than by insulin resistance.

Table 3. Summary information for studies compiled for the Meta-Analysis of Glucose and Insulin-related traits Consortium.

Although these unprecedented collaborative efforts resulted in increasing the discovery of T2D associated loci, they have thus far failed to capture more than a small portion of the heritability of the disease (Spencer, Hechter, Vukcevic, & Donnelly, 2011).

GWAS of type 2 diabetes and related quantitative traits have identified over 90 loci associated with T2D at genome-wide significance and an even larger number of loci associated with obesity measures and with glucose- and insulin-related quantitative traits (Figure 3) (Grarup, Sandholt, Hansen, & Pedersen, 2014). The vast majority of these risk loci are, however, located in non-coding regions, suggesting that their effects are mediated through altered timing or level of gene expression. Given that the effects of such non-coding functional elements (enhancers, insulators) can occur over long distances, the actual predisposing gene has been identified in only a few instances. Although risk loci are often named by the nearest gene, or the most plausible candidate, most of the associated loci contain many genes, and the gene chosen for locus labeling may have little or no supporting evidence for being functionally relevant.

Figure 3: Venn diagram of GWAS loci associated with T2D and T2D related quantitative traits. Intersection of genome-wide significant associations between T2D and five commonly measured T2D-related quantitative traits. Gene symbols represent the closest genes to the associated loci and may not be the actual causal gene.

Beyond GWAS studies: Given this circumstance, there is a pressing need to develop and assess strategies to identify the culprit genes and demonstrate their downstream effects. One approach is described here and methods of dissecting these associated loci for true cause and effect are more elaborately discussed.

In an effort to identify the specific genes responsible for T2D risk, we needed to understand the epigenomic landscape of non-coding DNA. Thus we chose to construct reference maps of chromatin structure based on a set of histone modifications that are well understood to correlate with function (Ernst et al., 2011), predicting promoters, enhancers, and repressed chromatin in T2D-relevant tissues by chromatin immunoprecipitation (ChIP) experiments performed in pancreatic islets. By integrating T2D-associated loci with these regulatory reference maps, as well as with gene expression measured by whole transcriptome sequencing, we aim to identify the causal genes using this functional strategy.

Chapter 1

The PPAR-γ2 Pro12Ala variant: Association with type 2 diabetes and trait differences

Diabetes 2001; 50(4): 886-890

The Peroxisome Proliferator–Activated Receptor-γ2 Pro12Ala Variant: Association With Type 2 Diabetes and Trait Differences

Michael R. Erdos,2 Julie A. Douglas,1 Richard M. Watanabe,3 Andi Braun,4 Cristy L. Johnston,4 Paul Oeth,4 Karen L. Mohlke,2 Timo T. Valle,5 Christian Ehnholm,5 Thomas A. Buchanan,6 Richard N. Bergman,7 Francis S. Collins,2 Michael Boehnke,1 and Jaakko Tuomilehto5,8

Recent studies have identified a common proline-to-alanine substitution (Pro12Ala) in the peroxisome proliferator–activated receptor-γ2 (PPAR-γ2), a nuclear receptor that regulates adipocyte differentiation and possibly insulin sensitivity. The Pro12Ala variant has been associated in some studies with diabetes-related traits and/or protection against type 2 diabetes. We examined this variant in 935 Finnish subjects, including 522 subjects with type 2 diabetes, 193 nondiabetic spouses, and 220 elderly nondiabetic control subjects. The frequency of the Pro12Ala variant was significantly lower in diabetic subjects than in nondiabetic subjects (0.15 vs. 0.21; P = 0.001). We also compared diabetes-related traits between subjects with and without the Pro12Ala variant within subgroups. Among diabetic subjects, the variant was associated with greater weight gain after age 20 years (P = 0.023) and lower triglyceride levels (P = 0.033). Diastolic blood pressure was higher in grossly obese (BMI >40 kg/m2) diabetic subjects with the variant. In nondiabetic spouses, the variant was associated with higher fasting insulin (P = 0.033), systolic blood pressure (P = 0.021), and diastolic blood pressure (P = 0.045). These findings support a role for the PPAR-γ2 Pro12Ala variant in the etiology of type 2 diabetes and the insulin resistance syndrome. Diabetes 50:886–890, 2001

From the 1Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; the 2Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland; the 3Division of Biostatistics, Department of Preventative Medicine, Keck School of Medicine, University of Southern California, Los Angeles; 4Sequenom Inc., San Diego, California; the 5Department of Epidemiology and Health Promotion, Diabetes and Genetic Epidemiology Unit, and the Department of Biochemistry, National Public Health Institute, Helsinki, Finland; the 6Department of Medicine and the 7Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, California; and the 8Department of Public Health, University of Helsinki, Helsinki, Finland.

Address correspondence and reprint requests to Michael Boehnke, University of Michigan, Department of Biostatistics, 1420 Washington Heights, Ann Arbor, Michigan 48109-2029. E-mail: [email protected].

Received for publication 12 June 2000 and accepted in revised form 29 December 2000.

J.A.D. and M.R.E. contributed equally to this work.

Additional information can be found in an online appendix at www.diabetes.org/diabetes/appendix.asp.

AIRG, acute insulin response to glucose; dBP, diastolic blood pressure; DI, disposition index; FUSION, Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics; MALDI-TOF, matrix-assisted laser desorption/ionization time-of-flight; OGTT, oral glucose tolerance test; PPAR, peroxisome proliferator–activated receptor; PROBE, primer oligo base extension reaction; sBP, systolic blood pressure; SI, insulin sensitivity.

Peroxisome proliferator–activated receptors (PPARs) are members of the nuclear hormone receptor family of transcription factors and are involved in adipocyte differentiation and gene expression. They are also believed to play an important role in type 2 diabetes and diabetes-related traits, including insulin sensitivity and lipid and energy metabolism (1). In fact, studies have shown that ligands for PPAR-γ, including both endogenous ones and those that are synthetic (e.g., thiazolidinedione drugs), stimulate adipogenesis and increase insulin action (2). A common proline-to-alanine substitution at codon 12 (Pro12Ala) of exon B has been inconsistently associated with protection against type 2 diabetes and diabetes-related traits (3–14). These findings encouraged us to investigate the role of PPAR-γ2 in our sample from the Finnish population. The objective of our study was to examine whether the Pro12Ala variant was associated with type 2 diabetes and to examine the relationship between the Pro12Ala variant and diabetes-related traits among subgroups of diabetic and nondiabetic subjects.

The mean and standard deviation of selected trait values are given in Table 1 by subgroup. The entire sample of 935 subjects consisted of 636 Pro/Pro subjects, 271 Pro/Ala subjects, and 28 Ala/Ala subjects. The allele frequency of the Pro12Ala variant in the PPAR-γ2 gene was 0.15 among diabetic subjects, 0.19 among spousal control subjects, and 0.22 among elderly control subjects (Table 2), which mirrors the continuum of diabetes susceptibility across these subgroups. The frequency of the variant was significantly lower in diabetic subjects than in elderly control subjects (χ² = 11.72, df = 1, P < 0.0007) and marginally lower than in spousal control subjects (P = 0.069). Comparison with combined spousal and elderly control subjects gave a significant association result (χ² = 10.60, df = 1, P = 0.001). A second independent sample of 263 Finnish diabetic subjects in our study subsequently confirmed the variant frequency of 0.15 in the original 522 diabetic subjects (data not shown). The observed genotype data were consistent with Hardy-Weinberg equilibrium.

TABLE 1
Characteristics of the subjects by clinical subgroup

                                  Diabetic        Spousal control   Elderly control
n                                 522             193               220
Sex (M:F)                         288:234         62:131            106:114
Age at enrollment (years)         63.5 ± 7.5      61.4 ± 7.7        70.0 ± 0.3
Age at diagnosis (years)          50.0 ± 7.9      —                 —
Diabetes duration (years)         13.6 ± 7.0      —                 —
BMI (kg/m2)                       30.0 ± 4.8      28.4 ± 4.5        27.0 ± 4.0
Waist-to-hip ratio                0.94 ± 0.08     0.88 ± 0.08       0.88 ± 0.08
Fasting plasma glucose (mmol/l)   10.7 ± 3.4      5.2 ± 0.7         5.0 ± 0.5
Fasting serum insulin (pmol/l)    114.1 ± 71.4    75.6 ± 48.8       66.2 ± 34.8

Data are means ± SD.

TABLE 2
Frequency of the PPAR-γ2 Pro12Ala variant by clinical subgroup

                            n      Frequency   χ²*      P*
Diabetic subjects           522    …           —        —
Spousal control subjects    193    …           3.30     0.069
Elderly control subjects    220    …           11.72    …

*Compared with diabetic subjects.

Results for the quantitative traits were less compelling. Genotype-specific means for all traits for diabetic subjects, elderly control subjects, and spousal control subjects, respectively, are available in an online appendix (Tables A1–3) at www.diabetes.org/diabetes/appendix.asp. Table 3 shows the significant trait differences between subjects with and without the Pro12Ala variant by subgroup. In diabetic subjects, the presence of the variant was associated with greater weight change after 20 years of age (22.2 ± 14.0 vs. 19.5 ± 13.0 kg) and lower serum triglyceride levels (2.29 ± 1.65 vs. 2.68 ± 2.21 mmol/l). Both results were significant after adjustment for sex, age, and (for triglyceride levels) BMI (P = 0.023 and 0.033, respectively). There was a significant interaction (P = 0.038) between the variant and BMI for diastolic blood pressure (dBP); the variant was associated with higher dBP only among grossly obese diabetic subjects (Table 4). A similar trend was also observed for systolic blood pressure (sBP), although the interaction was not statistically significant (P = 0.299).

TABLE 3
Significant results by clinical subgroup and presence/absence of the PPAR-γ2 Pro12Ala variant

                               Pro/Pro    Pro/Ala or Ala/Ala   Unadjusted analysis   Adjusted analysis
Diabetic subjects
  Weight change* (kg)          … (362)    … (143)              0.052                 0.023
  Triglycerides (mmol/l)       … (373)    … (143)              0.047                 0.033
Elderly control subjects
  Maximum lifetime weight      … (134)    … (84)               0.049                 0.191
Spousal control subjects
  Fasting insulin (pmol/l)     … (123)    … (68)               0.083                 0.033
  sBP† (mmHg)                  … (122)    … (65)               0.014                 0.021
  dBP† (mmHg)                  … (122)    … (65)               0.090                 0.045

Data are means ± SD (n) unless otherwise indicated. Adjusted analysis P value includes adjustment for sex, age, and (except for weight-related traits) BMI; P values are not adjusted for multiple comparisons. *Current weight minus weight at age 20 years; †mean of two measurements.

TABLE 4
dBP and sBP in diabetic subjects by presence/absence of the Pro12Ala variant and BMI

                   Pro/Pro               Pro/Ala or Ala/Ala
dBP* (mmHg)
  BMI < 20         78.0 ± 7.5 (3)        82.5 ± 4.9 (2)
  20 < BMI < 25    82.9 ± 9.8 (39)       79.5 ± 10.7 (21)
  25 < BMI < 30    83.5 ± 10.5 (167)     83.8 ± 11.0 (47)
  30 < BMI < 35    85.5 ± 9.6 (109)      84.5 ± 10.8 (47)
  35 < BMI < 40    87.2 ± 10.7 (34)      90.7 ± 11.6 (16)
  BMI > 40         86.6 ± 9.9 (14)       94.4 ± 7.1 (7)
sBP* (mmHg)
  BMI < 20         157.3 ± 44.2 (3)      148.0 ± 18.4 (2)
  20 < BMI < 25    148.5 ± 23.5 (39)     145.4 ± 22.8 (21)
  25 < BMI < 30    150.9 ± 21.9 (167)    155.1 ± 21.7 (47)
  30 < BMI < 35    151.3 ± 20.6 (109)    155.8 ± 22.5 (47)
  35 < BMI < 40    151.0 ± 21.3 (34)     156.3 ± 24.5 (16)
  BMI > 40         149.0 ± 19.8 (14)     166.9 ± 25.8 (7)

Data are means ± SD (n). *Mean of two measurements.

Among elderly control subjects, only maximum lifetime weight was significantly associated with the Pro12Ala variant (P = 0.049), and it was no longer significant after adjustment for sex (P = 0.191) (Table 3). Among nondiabetic spousal control subjects, the variant was significantly associated with higher sBP (151.4 ± 25.5 vs. 142.5 ± 18.9 mmHg) (P = 0.014). When both sBP and dBP and fasting serum insulin were adjusted for sex, age, and BMI, differences between spousal control subjects with and without the Pro12Ala variant remained and/or became significant (P = 0.021, 0.045, and 0.033, respectively) (Table 3). All three traits were significantly higher among subjects with the variant.

In our analysis, we found a significantly lower frequency of the Pro12Ala variant of the PPAR-γ2 gene in diabetic subjects than in nondiabetic subjects. The directionality of these highly significant (P = 0.001) findings is consistent with results from the studies of Deeb et al. (3), Mancini et al. (4), and Altshuler et al. (14), although the difference in allele frequencies in the second study failed to reach statistical significance. Coupled with these studies and the biological importance of PPAR-γ, these findings suggest a link between the Pro12Ala variant of the PPAR-γ2 gene and the pathogenesis of type 2 diabetes. The increased frequency of the variant in nondiabetic subjects would seem to suggest that the Pro12Ala variant confers some protective effect against diabetes.
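The genotype counts reported in the results (636 Pro/Pro, 271 Pro/Ala, and 28 Ala/Ala among 935 subjects) allow a rough check of the Hardy-Weinberg statement. The sketch below pools all subjects and uses a one-degree-of-freedom chi-square goodness-of-fit test; both choices are assumptions made for illustration, since the source does not describe how the authors performed their test or whether subgroups were tested separately.

```python
import math

def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are counts of major homozygotes, heterozygotes
    and minor homozygotes.
    """
    n = n_aa + n_ab + n_bb
    q = (n_ab + 2 * n_bb) / (2 * n)  # minor allele frequency
    p = 1.0 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return q, chi2, p_value

# Pooled genotype counts from the text: Pro/Pro, Pro/Ala, Ala/Ala.
ala_frequency, chi2, p_value = hardy_weinberg_chi_square(636, 271, 28)
print(f"Ala allele frequency = {ala_frequency:.3f}, "
      f"chi-square = {chi2:.3f}, p = {p_value:.2f}")
```

With these counts the pooled Ala allele frequency is about 0.175 and the chi-square statistic is far below 1, so the pooled genotypes show no sign of departure from Hardy-Weinberg proportions, in line with the authors' statement.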

Despite the increased frequency of the Pro12Ala variant among elderly control subjects, we failed to find any significant trait associations within this subgroup. Instead, we observed weak but significant associations between the Pro12Ala variant and traits characteristic of the insulin resistance syndrome in both diabetic and nondiabetic subjects. For example, greater weight gain was associated with the Pro12Ala variant in diabetic subjects, whereas higher fasting insulin, sBP, and dBP were associated with the variant in nondiabetic spouses. It should be emphasized that, in contrast to the spousal control subjects, the elderly control subjects represent a quite distinct subgroup of nondiabetic subjects who are unlikely to ever develop type 2 diabetes. As such, they are unlikely to carry the cluster of susceptibility genes that may interact with variants in PPAR-)'2 to result in the insulin resistance syndrome phenotype. The spousal control subjects are somewhat younger and remain at risk for developing type 2 diabetes during their lifetime.

Alterations in functional characteristics of the PPAR-γ2 gene induced by the Ala isoform may be partly responsible for the manifestation of some characteristics of the insulin resistance syndrome. Deeb et al. (3) identified lowered transactivation capacity and reduced stimulation of PPAR-γ target genes as a potential molecular mechanism underlying the association of the Pro12Ala variant with lower BMI and increased insulin sensitivity, a hypothesis consistent with their observations in Finnish subjects. Although this hypothesis may appear to be at odds with (or at least not supported by) our trait findings, several points should be clarified. First, the middle-aged subjects in the study by Deeb et al. (3) were much younger and leaner than our nondiabetic spouses and elderly control subjects. Second, although the elderly subjects from both studies were better matched, we could not parallel their genotype-based analysis because of insufficient numbers of Pro12Ala homozygotes. If the Pro/Ala and Ala/Ala subjects from their study had been pooled, it is unlikely that they would have observed significant trait differences because trends within their elderly subjects were inconsistent (e.g., fasting insulin was highest for Pro/Ala heterozygotes).

Consistent with at least one report of a differential effect of the PPAR-γ2 Pro12Ala variant in the lean and obese states (12), we also found an interaction between BMI and the variant for dBP in the diabetic subjects. Among severely obese subjects, those with the Pro12Ala variant had substantially higher blood pressure. Higher values for sBP and dBP were also associated with the variant among nondiabetic spousal control subjects, though there was no evidence for an interaction between BMI and the Pro12Ala variant. These associations are of interest, given the recent report by Barroso et al. (13) of three type 2 diabetic subjects with early-onset hypertension and polymorphisms in the PPAR-γ2 gene, suggesting that this receptor is important in both blood pressure and glucose homeostasis.

In summary, we found that the Pro12Ala variant of the PPAR-γ2 gene was associated with protection against type 2 diabetes in Finnish subjects, a finding consistent with several reports in the literature (3,4,14). Because we only screened for this particular variant, we cannot exclude the role of other PPAR-γ2 variants or variants in nearby genes, possibly in linkage disequilibrium with the Pro12Ala variant. Further studies, including functional analyses, will be required to fully understand the role of this gene in type 2 diabetes. Our data suggest that the PPAR-γ2 Pro12Ala variant has variable effects among subgroups of individuals with different levels of diabetes risk.

RESEARCH DESIGN AND METHODS

The Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) Study is an international collaborative effort to map and clone genes predisposing to type 2 diabetes and related traits in Finnish subjects. The FUSION study design and family material have been described previously (15). For the present investigation, our sample included 522 unrelated subjects with type 2 diabetes, 193 nondiabetic spouses of a diabetic subject or his/her affected sibling, and 220 unrelated elderly nondiabetic control subjects. Diabetes was diagnosed by World Health Organization (16) criteria. Spouses had a single normal oral glucose tolerance test (OGTT). Elderly control subjects had normal glucose tolerance at ages 65 and 70 years.

A total of 14 traits were analyzed on all subjects: BMI, waist circumference, waist-to-hip ratio, current weight, maximum lifetime weight, fasting plasma glucose, fasting serum insulin, total cholesterol, HDL cholesterol, HDL ratio (HDL cholesterol/total cholesterol), LDL cholesterol, triglycerides, sBP, and dBP. Values for sBP and dBP were each determined as the mean of two measurements. Seven additional traits were ascertained on diabetic subjects: weight at 20 years of age, change in weight after 20 years of age, maximum lifetime weight change after 20 years of age, age at diagnosis of diabetes, diabetes duration, age at which insulin treatment started (if applicable), and fasting plasma C-peptide concentrations. In addition, glucose and insulin concentrations 2 h after OGTT were analyzed in nondiabetic subjects, whereas the insulin sensitivity index (SI), the glucose effectiveness index, the acute insulin response to glucose (AIRG), and the disposition index (DI = SI × AIRG) were analyzed only in the nondiabetic spouses; the latter analyses used tolbutamide-modified frequently sampled intravenous glucose tolerance tests and minimal model analysis (17). Glucose, insulin, C-peptide, and lipid concentrations were assayed using standard methods (15).

Genotyping by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The PPAR-γ2 Pro12Ala variant was analyzed by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. A 69-bp fragment containing the Pro12Ala variant site was amplified by polymerase chain reaction from 20 ng genomic DNA using 25 pmol forward primer 5'-GCTGTTATGGGTGAAACTCTG, 2 pmol of a universal sequence-tailed reverse primer 5'-AGCGGATAACAATTTCACACAGGCAGTGTATCAGTGAAGGAATCG, and 10 pmol of a biotinylated universal primer 5'-biotin-AGCGGATAACAATTTCACACAGG under standard reaction conditions (Fig. 1A). After 15 min of denaturation at 95°C, 55 cycles (5 s at 95°C, 20 s at 53°C, and 30 s at 72°C) were performed. To recover the single-stranded DNA template, the product was immobilized on streptavidin-coated magnetic beads (Dynal, Great Neck, NY), washed with 10 mmol/l Tris-HCl at pH 8.0, denatured in 50 μl 0.1 mol/l NaOH, and washed again with 10 mmol/l Tris-HCl.

The primer oligo base extension reaction (PROBE) was performed by the addition of 20 pmol extension primer 5'-TCTGGGAGATTCTCCTATTGAC under conditions similar to those previously described (18). The extension reaction products were applied to a SpectroChip (Sequenom, San Diego, CA) prespotted with a matrix of 3-hydroxypicolinic acid using a Spectrojet piezoelectric nanoliter dispensing system (19). A modified Bruker Biflex III MALDI-TOF mass spectrometer (DNA MassArray; Sequenom) was used to determine genotypes by the appearance of peaks corresponding to the expected extension product masses (Fig. 1B).

Statistical analyses. Associations of the Pro12Ala variant of the PPAR-γ2 gene between diabetic subjects and both nondiabetic spouses and elderly control subjects were examined by χ² tests of independence. Trait differences within diabetic, elderly control, or spousal control subgroups were examined by analysis of variance. Initially, we tested whether trait means differed significantly among subjects with the Pro/Pro, Pro/Ala, and Ala/Ala genotypes. Due to the small number of individuals with the Ala/Ala genotype, we subsequently tested whether the trait means differed between subjects with and without the Pro12Ala variant (Pro/Ala and Ala/Ala versus Pro/Pro). All analyses were performed with and without adjustment for covariates, including sex, age, and BMI. Preselected interactions between the variant and sex or BMI were also tested. Standard regression diagnostics were computed to examine the adequacy of model assumptions, and traits were transformed to approximate normality when necessary. P values <0.05 were considered statistically significant. No adjustments for multiple comparisons were made. We excluded from the analyses any subject who, on the day of their examinations, took medications that could influence the trait of interest. We also excluded subjects whose diabetic status was uncertain and those with a first-degree relative with type 1 diabetes.

PPAR-γ2 Pro12Ala AND TYPE 2 DIABETES

FIG. 1. Genotype analysis by MALDI-TOF spectrometry. A: PROBE reaction. The region containing the PPARγ2 Pro12Ala (CCA→GCA) variant is amplified with a biotinylated primer to enable purification of the single-stranded template. Next, the PROBE primer anneals to the template and is extended. When the single nucleotide polymorphism (SNP) is C, the probe is extended by one nucleotide, dideoxy-CTP. When the SNP is G, the probe is extended by two nucleotides, deoxy-GTP and dideoxy-CTP. B: Mass spectrometry profiles of primer extension products. Peaks at 6,989.6 and 7,318.8 Da correspond to the mass of the probe primer extended by one or two nucleotides, respectively. Genotypes of the spectra are 1) CC, 2) CG, and 3) GG. The mass of the unextended PROBE primer is indicated at 6,716.4 Da, but in these examples, none is detected.
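The genotype assignment illustrated in Fig. 1B amounts to checking which expected extension-product masses appear in a spectrum. The following is a minimal sketch of that logic, not the vendor software's algorithm. The three masses come from the Fig. 1 legend; the function name `call_genotype` and the matching window `tol` are assumptions introduced for illustration.

```python
# Expected masses (Da) taken from the Fig. 1 legend.
UNEXTENDED = 6716.4  # PROBE primer with no extension
C_PRODUCT = 6989.6   # primer extended by ddC (C allele)
G_PRODUCT = 7318.8   # primer extended by dG and ddC (G allele)


def call_genotype(peak_masses, tol=2.0):
    """Return 'CC', 'CG', 'GG', or None from observed peak masses (Da).

    tol is a hypothetical matching window in Da, not a value from the paper.
    """
    has_c = any(abs(m - C_PRODUCT) <= tol for m in peak_masses)
    has_g = any(abs(m - G_PRODUCT) <= tol for m in peak_masses)
    if has_c and has_g:
        return "CG"
    if has_c:
        return "CC"
    if has_g:
        return "GG"
    return None  # no extension product detected, so no call is made


# Mass shifts implied by the legend: one ddC, then one dG on top of that.
shift_c = round(C_PRODUCT - UNEXTENDED, 1)
shift_g = round(G_PRODUCT - C_PRODUCT, 1)
```

A spectrum that shows only the unextended primer mass returns `None`, which mirrors the idea that a failed extension should not be scored as a genotype.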

ACKNOWLEDGMENTS

The FUSION study is made possible by intramural funds from the National Human Genome Research Institute (Project number OH95-C-N030), by grants from the Finnish Academy (38387 and 46558), and by National Institutes of Health grants HG00040 (J.A.D.), HG00376 (M.B.), DK09525 (R.M.W.), DK27619, and DK29867 (R.N.B.). Currently, J.A.D. is supported by a University of Michigan Rackham Predoctoral Fellowship, and R.M.W. is supported by a Career Development Award from the American Diabetes Association.

We wish to thank all of the subjects for their invaluable contribution to the FUSION study. We also gratefully acknowledge Peter Chines for his exceptional work in preparing the data. Family studies were approved by institutional review boards at the National Institutes of Health (assurance number SPA S-5737-05) and at the National Public Health Institute in Helsinki, Finland.

REFERENCES

1. Auwerx J: PPARγ, the ultimate thrifty gene. Diabetologia 42:1033–1049, 1999

2. Spiegelman BM: PPAR-γ: adipogenic regulator and thiazolidinedione receptor (Review). Diabetes 47:507–514, 1998

3. Deeb SS, Fajas L, Nemoto M, Pihlajamaki J, Mykkanen L, Kuusisto J, Laakso M, Fujimoto W, Auwerx J: A Pro12Ala substitution in PPARγ2 associated with decreased receptor activity, lower body mass index and improved insulin sensitivity. Nat Genet 20:284–287, 1998

4. Mancini FP, Vaccaro O, Sabatino L, Tufano A, Rivellese AA, Riccardi G, Colantuoni V: Pro12Ala substitution in the peroxisome proliferator-activated receptor-γ2 is not associated with type 2 diabetes. Diabetes 48:1466–1468, 1999

5. Ringel J, Engeli S, Distler A, Sharma AM: Pro12Ala missense mutation of the peroxisome proliferator activated receptor γ and diabetes mellitus. Biochem Biophys Res Commun 254:450–453, 1999

6. Clement K, Hercberg S, Passinge B, Galan P, Varroud-Vial M, Shuldiner AR, Beamer BA, Charpentier G, Guy-Grand B, Froguel P, Vaisse C: The Pro115Gln and Pro12Ala PPAR γ gene mutations in obesity and type 2 diabetes. Int J Obes 24:391–393, 2000

7. Meirhaeghe A, Fajas L, Helbecque N, Cottel D, Auwerx J, Deeb SS, Amouyel P: Impact of the peroxisome proliferator activated receptor γ2 Pro12Ala polymorphism on adiposity, lipids, and non-insulin-dependent diabetes mellitus. Int J Obes 24:195–199, 2000

8. Ek J, Urhammer SA, Sorensen TI, Andersen T, Auwerx J, Pedersen O: Homozygosity of the Pro12Ala variant of the peroxisome proliferation-activated receptor-γ2 (PPAR-γ2): divergent modulating effects on body mass index in obese and lean Caucasian men. Diabetologia 42:892–895, 1999

9. Beamer BA, Yen CJ, Andersen RE, Muller D, Elahi D, Cheskin LJ, Andres R, Roth J, Shuldiner AR: Association of the Pro12Ala variant in the peroxisome proliferator–activated receptor-γ2 gene with obesity in two Caucasian populations. Diabetes 47:1806–1808, 1998

10. Koch M, Rett K, Maerker E, Volk A, Haist K, Deninger M, Renn W, Haring HU: The PPARγ2 amino acid polymorphism Pro 12 Ala is prevalent in offspring of type II diabetic patients and is associated to increased insulin sensitivity in a subgroup of obese subjects. Diabetologia 42:758–762, 1999

11. Cole SA, Mitchell BD, Hseuh W, Pineda P, Beamer BA, Shuldiner AR, Comuzzie AG, Blangero J, Hixson JE: The Pro12Ala variant in peroxisome proliferator-activated receptor-γ2 (PPAR-γ2) is associated with measures of obesity in Mexican Americans. Int J Obes 24:522–524, 2000


12. Ristow M, Muller-Wieland D, Pfeiffer A, Krone W, Kahn CR: Obesity associated with a mutation in a genetic regulator of adipocyte differentiation. N Engl J Med 339:953–959, 1998

13. Barroso I, Gurnell M, Crowley VE, Agostini M, Schwabe JW, Soos MA, Maslen GL, Williams TD, Lewis H, Schafer AJ, Chatterjee VK, O’Rahilly S: Dominant negative mutations in human PPARγ associated with severe insulin resistance, diabetes mellitus and hypertension. Nature 402:880–883, 1999

14. Altshuler D, Hirschhorn JN, Klannemark M, Lindgren CM, Vohl M, Nemesh J, Lane CR, Schaffner SF, Bolk S, Brewer C, Tuomi T, Gaudet D, Hudson TJ, Daly M, Groop L, Lander ES: The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 26:76–80, 2000

15. Valle T, Tuomilehto J, Bergman RN, Ghosh S, Hauser ER, Eriksson J, Nylund SJ, Kohtamaki K, Toivanen L, Vidgren G, Tuomilehto-Wolf E, Ehnholm C, Blaschak J, Langefeld CD, Watanabe RM, Magnuson V, Ally DS, Hagopian WA, Ross E, Buchanan TA, Collins F, Boehnke M: Mapping genes for NIDDM: design of the Finland-United States Investigation of NIDDM (FUSION) Genetics Study. Diabetes Care 21:949–958, 1998

16. World Health Organization: Diabetes Mellitus: Report of a WHO Study Group. Geneva, World Health Org., 1985 (Tech. Rep. Ser. no. 727)

17. Bergman RN: Lilly Lecture 1989: Toward physiological understanding of glucose tolerance: minimal-model approach. Diabetes 38:1512–1527, 1989

18. Braun A, Little DP, Köster H: Detecting CFTR gene mutations by using primer oligo base extension and mass spectrometry. Clin Chem 43:1151–1158, 1997

19. Little DP, Cornish TJ, O’Donnell MJ, Braun A, Cotter RJ, Köster H: MALDI on a chip: analysis of arrays of low-femtomole to subfemtomole quantities of synthetic oligonucleotides and DNA diagnostic products dispensed by a piezoelectric pipette. Anal Chem 69:4540–4546, 1997


Chapter 2

High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools

Proceedings of the National Academy of Sciences, USA. 2002;99(26):16928-33


Michael R. Erdos*†, Karen L. Mohlke*†, Laura J. Scott†‡, Tasha E. Fingerlin§, Anne U. Jackson‡, Kaisa Silander*, Pablo Hollstein*, Michael Boehnke‡¶, and Francis S. Collins*¶

*Genome Technology Branch, National Human Genome Research Institute, Bethesda, MD 20892; and Departments of ‡Biostatistics and §Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109

Contributed by Francis S. Collins, October 31, 2002

To facilitate positional cloning of complex trait susceptibility loci, we are investigating methods to reduce the effort required to identify trait-associated alleles. We examined primer extension analysis by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry to screen single-nucleotide polymorphisms (SNPs) for association by using DNA pools. We tested whether this method can accurately estimate allele frequency differences between pools while maintaining the high-throughput nature of assay design, sample handling, and scoring. We follow up interesting allele frequency differences in pools by genotyping individuals. We tested DNA pools of 182, 228, and 499 individuals using 16 SNPs with minor allele frequencies 0.026–0.486 and allele frequency differences 0.001–0.108 that we had genotyped previously on individuals and 381 SNPs that we had not. Precision, as measured by the average standard deviation among 16 semi-dependent replicates, was 0.021 ± 0.011 for the 16 SNPs and 0.018 ± 0.008 for the 291/381 SNPs used in further analysis. For the 16 SNPs, the average absolute error in predicting allele frequency differences between pools was 0.009; the largest errors were 0.031, 0.028, and 0.027. We determined that compensating for unequal peak heights in heterozygotes improved precision of allele frequency estimates but had only a very minor effect on accuracy of allele frequency differences between pools. Based on these data and assuming pools of 500 individuals, we conclude that at significance level 0.05 we would have 95% (82%) power to detect population allele frequency differences of 0.07 for control allele frequencies of 0.10 (0.50).

Association studies provide a powerful approach to identify the DNA variants underlying complex traits (1). Currently, association studies can be especially useful for narrowing a complex trait candidate interval identified by linkage analysis (2, 3), although improved genotyping technology and a map of single-nucleotide polymorphisms (SNPs) identifying the common haplotypes in the human genome may enable association studies of loci spanning the entire genome. A rate-limiting step for association studies is to obtain the large number of genotypes needed. Currently, a linkage region expected to contain a complex trait locus typically spans 10–20 Mb, and even with a priori knowledge of the linkage disequilibrium between DNA variants, thousands of densely spaced SNPs with a range of allele frequencies may need to be screened (4). In addition, sample sizes of hundreds or even thousands of individuals may be required to have sufficient power to detect loci with modest effect.

A reliable screening method to identify SNPs associated with disease without genotyping all individuals would be efficient and economical. Screening SNPs by typing a limited number of DNA pools representing cases and controls in principle requires vastly fewer genotypes for each SNP, reducing labor and reagent costs. Genotyping cost becomes essentially independent of sample size, allowing larger, more powerful samples to be studied. In addition, the amount of DNA used from each person for each genotype can be dramatically reduced, an important consideration when DNA samples are limited.

An optimal technique to screen SNPs for association would accurately and precisely identify SNPs that show a difference between cases and controls. Because the major experimental question is not the absolute allele frequencies, but whether there are allele frequency differences between cases and controls, a consistent under- or overestimate of pooled allele frequencies, if modest or correctable, would not preclude a method from use. Several methods for typing SNPs in pooled DNA, including mass spectrometry, have been described (5–21). These methods currently have varying suitability to a high-throughput setting. For many of these methods, the precision and accuracy in estimating allele frequency differences between pools remain to be established, as does the variability associated with pool formation and each stage of the genotyping process.

Primer extension analysis by mass spectrometry is a potentially attractive method for allele frequency estimation based on pools because it can be easily automated. Design of assays based only on local sequence allows automated assay design with uniform assay conditions. This similarity of assay conditions permits extensive use of robotics, which limits human error. Mass spectrometry data collection is fast and automated, based on the size of extended products.

The precision of mass spectrometry has been evaluated in a limited number of studies (19–21). Ross and coworkers (19) tested the quantitative range and detection limits of the technique and were able to quantitate allele frequencies as low as 0.05. Buetow et al. (20) used 81 assays to evaluate precision; when each primer extension reaction was dispensed four times or when each PCR was repeated four times, they observed a median standard deviation (SD) of 0.016 or 0.017, respectively. Werner et al. (21) observed a median SD of 0.017 in artificial pools and 0.016–0.024 for estimates from pools of 94–280 individuals.

We have extended the work of previous studies by assessing the ability of mass spectrometry to reliably estimate allele frequencies in pools and allele frequency differences between pools and by estimating the sources of variability in these estimates. We performed primer extension assays and used SPECTROTYPER software (Sequenom, San Diego) to quantitate allele frequency estimates from relative peak areas. We compared estimated allele frequencies and allele frequency differences to those obtained from typing individual DNA samples for 16 SNPs in three DNA pools of laboratory interest. We also assessed precision in allele frequency estimates for 381 additional SNPs assayed only in pools. We used the estimates of the variability from PCR and primer extension, and product dispensing and mass spectrometry to estimate the power of pooled genotyping and to compare its power with that for genotyping individual samples. The data demonstrate that this method has the necessary characteristics to be used successfully for pooled genotype analysis.

Abbreviation: SNP, single-nucleotide polymorphism.
†K.L.M., M.R.E., and L.J.S. contributed equally to this work.
¶To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

Methods

Study Samples. The DNA samples used are from participants in the Finland-United States Investigation of Non-Insulin Dependent Diabetes Mellitus Genetics (FUSION) Study, in which we seek to identify genetic variants that predispose to type 2 diabetes or are responsible for variability in diabetes-related quantitative traits. Families were enrolled based on sibling pairs affected with type 2 diabetes (22); controls included 194 nondiabetic spouses of affected family members and 231 unrelated elderly controls. Informed consent was obtained from all participants.

Construction of Case and Control DNA Pools. We selected samples to create one DNA pool representing cases with type 2 diabetes and two pools representing controls. We selected one affected individual from each of 525 families for a pool designated F1, 194 unaffected spouses for a pool designated SP, and 231 unrelated elderly nondiabetic controls for a pool designated EC. Based on an initial quantitation by spectrophotometer (Beckman DU-640), each sample was diluted to an expected concentration of ≈50 ng/μl and requantitated by using a PicoGreen assay (Molecular Probes) on a fluorometer (Molecular Devices SpectraMAX Gemini XS). Four independent measurements were performed by using the low range standard protocol and the concentrations were averaged. If the independent measurements varied from the mean by >10%, the measurement was repeated. Samples that were determined to have less than the required amount of DNA for each pool were omitted. Based on the concentrations of individual samples, we calculated the volumes needed to obtain equimolar amounts of each sample. We combined samples to create subpools of ≈100 individuals and adjusted the concentration of each subpool to 50 ng/μl by using the same criteria for quantitation as the individual samples. The appropriate subpools were combined and diluted to 10 ng/μl before use. The final pool sizes were 499 individuals for F1, 182 for SP and 228 for EC.
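The quantitation rule and volume calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' worksheet: equal DNA mass per sample is used as a stand-in for "equimolar amounts" of genomic DNA, the sample IDs and readings are hypothetical, and the function names `needs_repeat` and `pooling_volumes` are introduced here.

```python
from statistics import mean


def needs_repeat(readings, tol=0.10):
    """True if any reading deviates from the mean of the readings by more than tol."""
    m = mean(readings)
    return any(abs(r - m) > tol * m for r in readings)


def pooling_volumes(conc_ng_per_ul, ng_per_sample):
    """Volume (in microliters) of each sample that contributes the same DNA mass."""
    return {sid: ng_per_sample / c for sid, c in conc_ng_per_ul.items()}


# Hypothetical fluorometer readings (ng/ul), four independent measurements each.
readings = {
    "s1": [50.0, 51.0, 49.0, 50.0],
    "s2": [40.0, 41.0, 39.0, 40.0],
    "s3": [50.0, 62.0, 44.0, 44.0],  # one reading is far from the mean
}

# Only samples with consistent readings go forward; the rest are re-measured.
usable = {sid: mean(r) for sid, r in readings.items() if not needs_repeat(r)}
volumes = pooling_volumes(usable, ng_per_sample=100.0)
```

Samples flagged by `needs_repeat` are simply left out here; in practice they would be measured again before a volume is assigned.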

PCR, Primer Extension Reactions, and Mass Spectrometry. Most PCR primers and primer extension assays were designed by using SPECTRODESIGN software (Sequenom) specifying an optimal PCR product of 100 nucleotides with a range of 60–400. SNP assays were designed to generate extension products of different masses, usually by incorporating one dideoxynucleotide or one deoxynucleotide and one dideoxynucleotide, depending on the SNP allele. Primer sequences for the 16 SNPs typed on pools and individuals are available in Table 3, which is published as supporting information on the PNAS web site, www.pnas.org. A set of 381 additional SNPs were typed in pools and on 7–11 individuals as part of a large-scale SNP screening project. Assay designs were uploaded with SPECTROIMPORTER software (Sequenom). We used 20 ng of genomic DNA as template in 20-μl PCR, all of which was used for a magnetic-bead based isolation of template before performing primer extension reactions using standard conditions as described (23). We used a Spectrojet piezoelectric nanoliter dispensing system (Sequenom) to apply the extension products onto chips prespotted with a matrix of 3-hydroxypicolinic acid (24) and a modified Bruker Biflex III matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer (Sequenom) to determine genotypes by the appearance of peaks corresponding to the expected extension product masses. To minimize variability caused by depurination of extension product peaks, we scanned chips within 24 h after dispensing extension products, although we do not know whether depurination would be unequal between pools and introduce variability.

To genotype individuals for the 16 SNPs, we dispensed primer extension products one time each and set mass spectrometry SPECTROACQUIRE software (Sequenom) to collect sets of 20 spectra until a genotype could be called unambiguously or five sets of 20 spectra were collected, whichever came first. The 16 SNPs were on average 97% successful (range 94–98%) on the 909 individuals comprising the pools. We routinely performed a limited manual review of spectra to detect and remove questionable individual genotype calls, usually calls with low signal intensity. We genotyped 4 of every 90 samples in duplicate. We have observed an error rate among duplicates of 0.03%.

To genotype pools, we performed four replicate PCRs for each SNP on each pool and dispensed primer extension products onto four spots of a 384-spot chip, yielding a total of 16 observations (four PCRs × four spots per PCR) for each pool for each SNP. We set the mass spectrometry SPECTROACQUIRE software to collect five sets of 20 spectra and raster to all positions. We obtained peak areas from SPECTROTYPER software by integration of the area under the spectral peak at the expected mass of the extension product.

Review of SNP Assays Tested for Association by Using DNA Pools. When we tested the 381 novel SNPs on DNA pools, we applied the following criteria to remove poor quality data. We removed spectra with signal-to-noise ratios below 3.5 or with a peak height below 1.0 intensity unit. We removed SNPs for which fewer than 8 of 16 possible observations remained for any pool or for which the SD of any pool was greater than 0.05. At the same time that we determined SNP genotypes in DNA pools, we genotyped one negative control sample and 7–11 individual samples to help detect assay artifacts. For each individual, we performed a single PCR and dispensed the extension product onto four spots on a chip, yielding a total of four observations per individual. To mimic a high-throughput procedure, we did not select individuals by prior knowledge of genotypes. We discarded SNPs from further analysis if all individuals were heterozygotes, although we recognized that as many as one reliable SNP assay in $2^n$, where n is the number of individuals successfully tested, may show all heterozygotes by chance. We also discarded SNPs in which heterozygotes showed widely skewed peak ratios (peak area of one allele at least four times greater than peak area of the other allele), because our experience, as well as that of others (25, 26), suggests that these SNPs are difficult to score correctly. Finally, we discarded SNPs for which observed allele frequencies in heterozygotes differed dramatically from one another (SD > 0.10), because we have found such assays often fail tests of Hardy–Weinberg equilibrium.

Of the original 381 SNPs, 90 (23.6%) failed to meet one or more of the above criteria. A total of 58 (15.2%) had a pool with <8 successful observations, 11 (2.9%) had a pool with allele frequency SD >0.05, 17 (4.5%) had all heterozygous individuals, 11 (2.9%) had severely skewed average heterozygous peak ratios, and 22 (5.8%) had heterozygotes with dramatically different peak ratios.
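The review criteria above translate naturally into a small filter. The sketch below is one possible encoding, with thresholds taken from the text; the data layout (a dict of per-pool frequency estimates and a list of heterozygote peak-area pairs), the reason labels, and the function names are assumptions made for illustration. A SNP can collect more than one failure reason, which is consistent with the per-criterion counts summing to more than 90.

```python
from statistics import mean, stdev


def spectrum_ok(signal_to_noise, peak_height):
    """Spectrum-level filter: keep spectra with S/N >= 3.5 and peak height >= 1.0."""
    return signal_to_noise >= 3.5 and peak_height >= 1.0


def qc_failures(pool_freqs, het_peaks, n_typed):
    """Return the list of SNP-level criteria that a SNP fails (empty list = pass).

    pool_freqs: dict of pool name -> retained allele frequency estimates (at most 16)
    het_peaks:  list of (a, b) peak areas, one pair per heterozygous individual
    n_typed:    number of individuals successfully genotyped
    """
    reasons = []
    if any(len(obs) < 8 for obs in pool_freqs.values()):
        reasons.append("too_few_observations")
    if any(len(obs) >= 2 and stdev(obs) > 0.05 for obs in pool_freqs.values()):
        reasons.append("pool_sd")
    if n_typed > 0 and len(het_peaks) == n_typed:
        reasons.append("all_heterozygous")
    if het_peaks:
        k = mean([a / b for a, b in het_peaks])
        if k >= 4.0 or k <= 0.25:  # one allele at least four times the other
            reasons.append("skewed_peak_ratio")
        freqs = [a / (a + b) for a, b in het_peaks]
        if len(freqs) >= 2 and stdev(freqs) > 0.10:
            reasons.append("inconsistent_heterozygotes")
    return reasons


def max_all_het_probability(n):
    """Largest chance, under Hardy-Weinberg proportions, that n individuals are all heterozygous."""
    return 0.5 ** n
```

For the 7 to 11 individuals typed per SNP, `max_all_het_probability` ranges from 1/128 down to 1/2048, which is the "one in $2^n$" cost of the all-heterozygote rule.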

Statistical Analysis. Given four PCRs and four spots per PCR, up to 16 observations were available to estimate the allele frequency for each SNP in each pool. For each of these observations, we initially used the pool peak areas A and B of the lower- and higher-mass alleles, respectively, to obtain the pool-based allele frequency estimate $p = A/(A + B)$. As an alternative, we adjusted the estimate to take into account the unequal peak area of the two alleles in heterozygotes. To do so, we calculated the sample mean k of the ratios a/b, where a and b represent peak areas of the lower- and higher-mass alleles for an individual; we calculated k over all measurements on the individuals heterozygous for the SNP. The resulting allele frequency estimate for each of the up to 16 pool-based observations was $p_k = A/(A + kB)$ (7). For either of these estimation methods, we then calculated the overall allele frequency estimate as the average of the up to 16 observation-specific estimates. For the 25 (8.6%) of 291 SNPs without data on individual heterozygotes, we used the unadjusted estimate $p$.
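The heterozygote correction can be checked against the single-observation numbers given in the Fig. 1 legend of this chapter. The sketch below is illustrative only: it treats the C allele as the allele in the numerator, uses peak areas scaled to sum to 1 (an assumption that does not affect the ratio), and the function names are introduced here.

```python
def allele_frequency(area_low, area_high, k=1.0):
    """Pool frequency A / (A + k*B); k = 1 gives the unadjusted estimate."""
    return area_low / (area_low + k * area_high)


def mean_het_ratio(het_peaks):
    """Sample mean of a/b over heterozygote measurements."""
    ratios = [a / b for a, b in het_peaks]
    return sum(ratios) / len(ratios)


# Fig. 1 legend: the heterozygote reads 0.570 for the C allele,
# and the unadjusted pool estimates are 0.424 (control) and 0.355 (case).
k = mean_het_ratio([(0.570, 0.430)])
control_adj = allele_frequency(0.424, 0.576, k)
case_adj = allele_frequency(0.355, 0.645, k)
difference_adj = control_adj - case_adj
```

With these inputs the adjusted values round to 0.357 and 0.293, the figures quoted in the legend, and the adjusted difference stays close to the unadjusted one, in line with the observation that the correction matters more for absolute frequencies than for differences between pools.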

To test for allele frequency differences between cases and controls based on our pooled results, we estimated the difference in allele frequencies between case and control pools, and compared this difference to its standard error by using the statistic $T = (p_1 - p_2)/[\mathrm{Var}(p_1 - p_2)]^{1/2}$. Here, $p_i$ is the mean estimated allele frequency in group $i$ (1 = case, 2 = control) and Var represents variance.

To estimate $\mathrm{Var}(p_1 - p_2)$, we note that this variance reflects the combined effects of population sampling and measurement error caused by carrying out allele frequency estimation on pools, or $\mathrm{Var}(p_1 - p_2) = \sigma^2_{\mathrm{sampling}} + \sigma^2_{\mathrm{measurement}}$. We estimated the sampling variance by $s^2_{\mathrm{sampling}} = p_{12}(1 - p_{12})[1/(2n_1) + 1/(2n_2)]$, that is, the binomial variance of an allele frequency based on $2n_i$ chromosomes, summed over the two pools. Here $p_{12} = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$ is the weighted average of the case and control allele frequency estimates and $n_i$ is the number of individuals in pool $i$.

We modeled the measurement error caused by allele frequency estimation based on pools as $\sigma^2_{\mathrm{measurement}} = \sigma^2_{\mathrm{pcr}} + \sigma^2_{\mathrm{spot}}$. Here, $\sigma^2_{\mathrm{pcr}}$ and $\sigma^2_{\mathrm{spot}}$ are variances caused by PCR and primer extension, and sample dispensing and mass spectrometry analysis, respectively. We estimated $\sigma^2_{\mathrm{pcr}}$ and $\sigma^2_{\mathrm{spot}}$ for each SNP with a mixed effects analysis of variance by using the MIXED procedure in SAS (SAS Institute, Cary, NC). In this analysis, allele frequency estimate was the response variable, indicators for each pool were included as fixed effects, and PCR was included as a random effect nested within pool. By specifying this model, we implicitly assume the absence of variability caused by pool construction. Because we did not construct multiple pools for each sample, we could not estimate this variability directly. Subsequent data analysis suggests this variability is modest and that assuming its absence has not significantly adversely affected our test (see Results).
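For a balanced design (the same number of spots for every PCR), the two variance components in the nested model above can be approximated with a method-of-moments ANOVA. This is a swapped-in technique, not the SAS MIXED fit used in the paper, and the two can differ, especially with missing observations. The data layout and the function name `variance_components` are assumptions for illustration.

```python
def variance_components(pools):
    """Method-of-moments estimates (s2_pcr, s2_spot) for PCR nested within pool.

    pools: dict of pool name -> list of PCRs -> list of spot-level estimates.
    Requires the same number of spots for every PCR (balanced design).
    """
    n_spot = len(next(iter(pools.values()))[0])
    ss_within = ss_between = 0.0
    df_within = df_between = 0
    for pcrs in pools.values():
        if any(len(spots) != n_spot for spots in pcrs):
            raise ValueError("balanced design required")
        pcr_means = [sum(spots) / n_spot for spots in pcrs]
        pool_mean = sum(pcr_means) / len(pcrs)
        # Spot-to-spot scatter around each PCR mean.
        ss_within += sum((x - m) ** 2 for spots, m in zip(pcrs, pcr_means) for x in spots)
        # PCR-to-PCR scatter around the pool mean (pool is a fixed effect).
        ss_between += n_spot * sum((m - pool_mean) ** 2 for m in pcr_means)
        df_within += len(pcrs) * (n_spot - 1)
        df_between += len(pcrs) - 1
    ms_within = ss_within / df_within
    ms_between = ss_between / df_between
    s2_spot = ms_within
    s2_pcr = max(0.0, (ms_between - ms_within) / n_spot)  # negative estimates set to zero
    return s2_pcr, s2_spot
```

Subtracting each pool's own mean before pooling the sums of squares plays the role of the fixed pool indicators, so differences in allele frequency between pools do not inflate the PCR component.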

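Given $n_{\mathrm{pcr},i}$ PCRs and $n_{\mathrm{spot},i}$ spots for group $i = 1, 2$ (in the absence of missing data, $n_{\mathrm{pcr},i} = 4$ and $n_{\mathrm{spot},i} = 16$), replicate measurements result in an overall variance estimate of $\mathrm{Var}(p_1 - p_2) = s^2_{\mathrm{sampling}} + s^2_{\mathrm{pcr}}(1/n_{\mathrm{pcr},1} + 1/n_{\mathrm{pcr},2}) + s^2_{\mathrm{spot}}(1/n_{\mathrm{spot},1} + 1/n_{\mathrm{spot},2})$.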
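The overall variance and the test statistic T can be put together in a few lines. The sketch below follows the formulas in this section; the closed-form power calculation is an added normal approximation, not the simulation the authors ran, and the function names are hypothetical. The variance components used in the usage lines are the values 1.18 × 10⁻⁴ (PCR) and 3.82 × 10⁻⁴ (spot) reported in the text.

```python
from math import sqrt
from statistics import NormalDist


def var_difference(p1, p2, n1, n2, s2_pcr, s2_spot, n_pcr=(4, 4), n_spot=(16, 16)):
    """Var(p1 - p2): sampling variance plus PCR and spot measurement variance."""
    p12 = (n1 * p1 + n2 * p2) / (n1 + n2)
    s2_sampling = p12 * (1 - p12) * (1 / (2 * n1) + 1 / (2 * n2))
    return (s2_sampling
            + s2_pcr * (1 / n_pcr[0] + 1 / n_pcr[1])
            + s2_spot * (1 / n_spot[0] + 1 / n_spot[1]))


def t_statistic(p1, p2, n1, n2, s2_pcr, s2_spot):
    """T = (p1 - p2) / sqrt(Var(p1 - p2))."""
    return (p1 - p2) / sqrt(var_difference(p1, p2, n1, n2, s2_pcr, s2_spot))


def approx_power(p_control, diff, n, s2_pcr, s2_spot, alpha=0.05):
    """Two-sided power from a normal approximation (an assumption, not the paper's method)."""
    nd = NormalDist()
    t = abs(t_statistic(p_control + diff, p_control, n, n, s2_pcr, s2_spot))
    z = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(t - z) + nd.cdf(-t - z)


power_low = approx_power(0.10, 0.07, 500, 1.18e-4, 3.82e-4)
power_mid = approx_power(0.50, 0.07, 500, 1.18e-4, 3.82e-4)
```

Under these assumptions the approximation gives values near 0.97 and 0.81 for control allele frequencies of 0.10 and 0.50, close to the simulated 95% and 82% quoted in the abstract. Setting both measurement variances to zero corresponds to individual genotyping and shows how much power the pooling step costs.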

We estimated the false positive rate and power to detect significant allele frequency differences between pools by computer simulation. Each simulated pool contained 200 or 500 individuals, had control allele frequencies of 0.10, 0.50, or 0.80, and had case-control allele frequency differences of 0.00, 0.05, 0.07, or 0.10. For each replicate, we simulated observations for case and control pools with four PCRs per pool and four spots per PCR and for a single heterozygote with one PCR and four spots per PCR. For each set of simulation replicates, the heterozygous individuals were assigned a mean k value of 1.00, 1.29, 1.50, 2.40, or 4.00, and a SD for k of 0.11, as we observed in our data. We assumed PCR and spot variability were absent (corresponding to individual genotyping) or were equal to their estimated values of $1.18 \times 10^{-4}$ and $3.82 \times 10^{-4}$, respectively, as observed in our data.
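A stripped-down version of such a simulation is sketched below. It is not the authors' program: it omits the heterozygote ratio k (equivalent to k = 1 with no error), draws PCR and spot errors from normal distributions, and plugs the known variance components into T instead of re-estimating them in each replicate. The function names, the seed, and the replicate count are illustrative choices.

```python
import random
from math import sqrt

S2_PCR, S2_SPOT = 1.18e-4, 3.82e-4  # variance estimates reported in the text


def pooled_estimate(rng, p, n, n_pcr=4, n_spot=4):
    """One simulated pool: sample 2n alleles, then add PCR- and spot-level noise."""
    true_freq = sum(rng.random() < p for _ in range(2 * n)) / (2 * n)
    obs = []
    for _ in range(n_pcr):
        pcr_level = true_freq + rng.gauss(0.0, sqrt(S2_PCR))
        obs.extend(pcr_level + rng.gauss(0.0, sqrt(S2_SPOT)) for _ in range(n_spot))
    return sum(obs) / len(obs)


def rejection_rate(p_control, diff, n, reps=1000, seed=1, crit=1.96):
    """Fraction of replicates with |T| > crit for equal-sized case and control pools."""
    rng = random.Random(seed)
    n_pcr, n_obs = 4, 16
    rejected = 0
    for _ in range(reps):
        p1 = pooled_estimate(rng, p_control + diff, n)
        p2 = pooled_estimate(rng, p_control, n)
        p12 = (p1 + p2) / 2  # weighted average reduces to the mean for equal n
        var = (p12 * (1 - p12) * (1 / (2 * n) + 1 / (2 * n))
               + S2_PCR * (2 / n_pcr) + S2_SPOT * (2 / n_obs))
        if var > 0 and abs(p1 - p2) / sqrt(var) > crit:
            rejected += 1
    return rejected / reps
```

Calling `rejection_rate(0.10, 0.0, 500)` estimates the false positive rate, and a nonzero `diff` estimates power; both carry Monte Carlo error that shrinks as `reps` grows.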

Results

To assess whether the SNP genotyping method of primer extension–mass spectrometry was sufficiently accurate and precise to detect modest allele frequency differences between pools, we tested 16 SNPs with individual genotypes previously determined as part of our diabetes research project. These SNPs were selected to have a range of minor allele frequencies and frequency differences between cases of type 2 diabetes, unaffected spouse controls, and unaffected elderly controls. The frequency differences of 0.001–0.108 are modest but reflect our intention to use pooling to scan for association in complex diseases, where allele frequency differences are not expected to be dramatic. The 16 assays were not individually optimized, although they were chosen from a set of assays that had been successfully typed on >94% of individuals comprising the pools. We tested each DNA pool with quadruplicate PCR and extension reactions, each of which we dispensed and scanned four times for a total of up to 16 frequency estimates per SNP-pool combination. Over the course of our initial studies, we observed that increased peak intensity and signal-to-noise ratio decreased SDs between replicates (data not shown); for this analysis, we dispensed sample twice onto each spot before scanning. Example spectra are shown in Fig. 1. We observed unequal allele intensity in heterozygous individuals, a characteristic that has been described (25) and that we have observed for individual heterozygous samples with most of the hundreds of SNPs that we have typed on individual samples.

Fig. 1. Sample spectra and frequency estimates based on the peak area. Frequency estimates of the C allele are 0.424 and 0.355 in the control and case pools, respectively, showing a difference of 0.069. Given the C allele frequency is overestimated in the heterozygote as 0.570, allele frequencies in the pools can be adjusted to 0.357 and 0.293, respectively. True frequencies of the C allele based on genotyping of the individuals comprising the pools are 0.377 and 0.313 for the controls and cases, respectively, so the estimate of allele frequency difference from the pool analysis is very accurate. In practice, we estimate pooled allele frequencies and the heterozygote ratio from multiple replicate observations, rather than from the single observations used here for purposes of illustration.
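The heterozygote adjustment illustrated in the Fig. 1 caption can be checked numerically. The sketch below assumes the correction has the form p_adj = A / (A + kB), where A and B are the raw pool signals of the two alleles and k = a/b is the signal ratio seen in a heterozygote. That form is an assumption made here because it reproduces the caption's numbers; the function names are illustrative only.

```python
def heterozygote_ratio(het_freq_estimate):
    """Return k = a/b from the apparent allele frequency a/(a+b) in a heterozygote."""
    return het_freq_estimate / (1.0 - het_freq_estimate)


def adjust_pool_frequency(p_raw, k):
    """Correct a raw pool frequency for unequal allele signal.

    Assumed form: p_adj = A / (A + k*B), with A = p_raw and B = 1 - p_raw.
    """
    return p_raw / (p_raw + k * (1.0 - p_raw))


# Values quoted in the Fig. 1 caption.
k = heterozygote_ratio(0.570)                  # C allele reads 0.570 in the heterozygote
control_adj = adjust_pool_frequency(0.424, k)  # raw control pool estimate
case_adj = adjust_pool_frequency(0.355, k)     # raw case pool estimate

print(f"k = {k:.3f}")
print(f"adjusted control = {control_adj:.3f}, adjusted case = {case_adj:.3f}")
print(f"adjusted difference = {control_adj - case_adj:.3f}")
```

With k = 1 the function returns the raw frequency unchanged, which is the behavior expected when both alleles give equal signal.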

We calculated allele frequency estimates both with and without adjustment for unequal peak heights in heterozygotes, and compared the accuracy with which these two pool-based methods estimated allele frequencies. The average heterozygote ratio k = a/b for the 16 SNPs was 1.19 ± 0.18, whereas the average SD of k was 0.12 ± 0.05. The absolute average difference between pool-based and individual-based allele frequency estimates was 0.033 ± 0.021 (range 0.001–0.083) for the unadjusted estimates and 0.014 ± 0.010 (range 0.000–0.037) for the adjusted estimates, suggesting that adjustment resulted in more accurate allele frequency estimates. We use the heterozygote-adjusted allele frequency data in what follows unless otherwise noted. Table 1 shows the minor allele frequency estimates for 16 SNPs in three pools as well as the corresponding estimates obtained from individual genotypes. The average allele frequency SD we observed for up to 16 replicate values from 48 SNP-pool combinations was 0.021 ± 0.011, and the maximum SDs were 0.073, 0.049, and 0.035.

We compared the SD from the 16 SNPs to a larger number of SNPs that were not typed on the individuals comprising the pools. For the 291 additional SNPs that met our criteria for analysis (see Methods), we observed an average SD from the 873 SNP-pool combinations of 0.018 ± 0.008. The average heterozygote ratio k in the sample of 266 of 291 SNPs with at least one heterozygous individual was 1.29 ± 0.39, whereas the average SD of k was 0.11 ± 0.07.

Table 1. Frequencies of SNPs as estimated by genotyping DNA pools and individual samples

           Cases (F1)             Spouses (SP)           Elderly controls (EC)  Prediction error
SNP        Indiv  Pool            Indiv  Pool            Indiv  Pool            F1–SP  F1–EC  SP–EC
GLUT10_14  0.035  0.028 ± 0.009   0.036  0.026 ± 0.009   0.026  0.015 ± 0.010   0.003  0.005  0.001
GLUT10_1   0.057  0.046 ± 0.011   0.063  0.043 ± 0.018   0.078  0.060 ± 0.012   0.009  0.007  0.001
SNP63      0.118  0.125 ± 0.013   0.120  0.125 ± 0.011   0.115  0.119 ± 0.010   0.002  0.004  0.002
PPARg2     0.145  0.182 ± 0.016   0.194  0.230 ± 0.017   0.224  0.252 ± 0.023   0.001  0.008  0.007
ss146316   0.146  0.128 ± 0.019   0.135  0.103 ± 0.022   0.095  0.067 ± 0.020   0.014  0.010  0.004
ss121557   0.156  0.140 ± 0.014   0.141  0.115 ± 0.009   0.115  0.089 ± 0.015   0.009  0.009  0.000
ss146317   0.176  0.165 ± 0.024   0.146  0.144 ± 0.012   0.130  0.118 ± 0.022   0.009  0.001  0.010
ss93115    0.236  0.251 ± 0.012   0.251  0.249 ± 0.021   0.312  0.317 ± 0.032   0.017  0.009  0.008
SNP43      0.257  0.286 ± 0.032   0.259  0.274 ± 0.049   0.246  0.246 ± 0.073   0.014  0.028  0.015
ss64248    0.309  0.298 ± 0.023   0.316  0.311 ± 0.024   0.312  0.317 ± 0.027   0.007  0.016  0.010
ss1304220  0.313  0.318 ± 0.021   0.379  0.399 ± 0.020   0.377  0.395 ± 0.021   0.015  0.013  0.002
ss121556   0.382  0.381 ± 0.026   0.409  0.405 ± 0.021   0.462  0.448 ± 0.021   0.003  0.013  0.010
ss148393   0.429  0.428 ± 0.010   0.392  0.389 ± 0.012   0.348  0.352 ± 0.013   0.002  0.005  0.007
ss86782    0.433  0.423 ± 0.034   0.442  0.459 ± 0.028   0.443  0.429 ± 0.027   0.027  0.004  0.031
SNP56      0.438  0.456 ± 0.016   0.428  0.446 ± 0.016   0.415  0.437 ± 0.021   0.000  0.004  0.004
ss86876    0.486  0.488 ± 0.035   0.428  0.404 ± 0.029   0.378  0.361 ± 0.028   0.026  0.019  0.007

F1, cases of type 2 diabetes; SP, unaffected spouses; EC, elderly nondiabetic controls; Indiv, individuals. Frequencies for pools are mean ± SD. Prediction error is the absolute difference of the frequency estimates based on pools compared to individual genotypes.

For the 16 SNPs, we compared the estimated allele frequency differences based on case and control pools to frequency differences estimated from genotyping individuals comprising the pools (Table 1, Fig. 2). The mean absolute error in estimating the allele frequency difference between pools calculated from 48 SNP-pool comparisons was 0.009 ± 0.008, and the maximum absolute errors were 0.031, 0.028, and 0.027. The mean absolute error was unchanged (0.009 ± 0.008) when the allele frequencies were not adjusted for the heterozygote ratio.
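As an illustration of how the prediction error columns in Table 1 relate to the frequency columns, the sketch below recomputes them for the ss86782 row. It assumes the error is the absolute difference between the pool-based and the individual-based frequency difference for a pair of groups, which is consistent with the tabulated values; the function name is hypothetical.

```python
def prediction_error(indiv_1, pool_1, indiv_2, pool_2):
    """Absolute error of the pool-based frequency difference relative to the
    difference obtained from individual genotypes."""
    return abs((pool_1 - pool_2) - (indiv_1 - indiv_2))


# ss86782 row of Table 1: (individual frequency, pool mean frequency)
f1 = (0.433, 0.423)   # cases
sp = (0.442, 0.459)   # spouses
ec = (0.443, 0.429)   # elderly controls

errors = {
    "F1-SP": prediction_error(f1[0], f1[1], sp[0], sp[1]),
    "F1-EC": prediction_error(f1[0], f1[1], ec[0], ec[1]),
    "SP-EC": prediction_error(sp[0], sp[1], ec[0], ec[1]),
}
for pair, err in errors.items():
    print(f"{pair}: {err:.3f}")
```

For other rows, recomputing from the rounded table entries may differ from the published error by 0.001, presumably because the published values were computed before rounding.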

We combined the data from the 16 SNPs to estimate the sources of experimental variability and to compare the experimental variability to the sampling variability associated with selecting individuals from the population. The estimated measurement variance caused by PCR or primer extension ($\sigma_{pcr}^2 = 1.18 \times 10^{-4}$) is smaller than that caused by sample dispensing and mass spectrometry analysis ($\sigma_{spot}^2 = 3.82 \times 10^{-4}$). For a pool with n = 500 and allele frequency of 0.50, the summed measurement variances of $(1.18 + 3.82) \times 10^{-4} = 5.00 \times 10^{-4}$ are larger than the sampling variability of $(0.50)(0.50)/[2(500)] = 2.5 \times 10^{-4}$, but replicate PCRs and spots allow us to reduce the measurement variability substantially. For example, when $n_{pcr} = 4$ and $n_{spot} = 16$ (4 PCRs × 4 spots per PCR), measurement variability is reduced to $(1.18/4 + 3.82/16) \times 10^{-4} = 0.53 \times 10^{-4}$. Sampling variability of allele frequency estimates is an unavoidable consequence of a finite pool size.

Fig. 2. Comparison of allele frequency difference estimated from pools to the frequency difference determined from individual genotypes. Each point represents one comparison between F1 and SP, F1 and EC, or SP and EC for 1 of the 16 SNPs. The lines represent the expected result ± 0.03.
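The arithmetic in this paragraph can be reproduced with a few lines of code. The sketch below uses the two variance components reported in the text; the function and constant names are illustrative and not taken from the study's own software.

```python
SIGMA2_PCR = 1.18e-4    # variance from PCR / primer extension (value from the text)
SIGMA2_SPOT = 3.82e-4   # variance from dispensing / mass spectrometry (value from the text)


def sampling_variance(p, n_individuals):
    """Binomial sampling variance of an allele frequency from 2n chromosomes."""
    return p * (1.0 - p) / (2 * n_individuals)


def measurement_variance(n_pcr, n_spot):
    """Measurement variance of a pool mean from n_pcr PCRs and n_spot total spots."""
    return SIGMA2_PCR / n_pcr + SIGMA2_SPOT / n_spot


single = measurement_variance(1, 1)        # one PCR, one spot
replicated = measurement_variance(4, 16)   # 4 PCRs x 4 spots per PCR
sampling = sampling_variance(0.50, 500)    # pool of 500 individuals

print(f"single measurement : {single:.2e}")
print(f"4 PCRs x 4 spots   : {replicated:.2e}")
print(f"sampling (n = 500) : {sampling:.2e}")
```

With replication, the measurement variance falls well below the sampling variance for a pool of this size, which is the point the paragraph makes.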

Under the conservative assumption that the 291 additional SNPs would be expected to show no association with diabetes, they provide an opportunity to assess empirically the false positive rate associated with our pool-based test statistic T. Based on the 266 SNPs with at least one typed heterozygous individual, we have 2 × 266 = 532 case-control comparisons and so would expect 532 × 0.05 = 26.6 comparisons significant at the 0.05 level. When basing our test on the adjusted allele frequency estimates (adjusting for the heterozygote ratio k), we observed 24 (4.5%) comparisons significant at the 0.05 level. When we omitted adjustment for k and used the unadjusted estimates, we observed 26 (4.9%) significant comparisons, 22 of which were also observed in the significance test based on the adjusted estimates.
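A quick check of these counts against the nominal 5% level is sketched below. It treats the 532 comparisons as independent binomial trials, which is a simplifying assumption made only for this illustration, since comparisons for the same SNP may share a pool.

```python
import math

n_comparisons = 2 * 266      # two case-control comparisons per SNP
alpha = 0.05

expected = n_comparisons * alpha
# Binomial SD under the (assumed) independence of comparisons.
sd = math.sqrt(n_comparisons * alpha * (1 - alpha))

for label, observed in [("adjusted for k", 24), ("unadjusted", 26)]:
    rate = observed / n_comparisons
    z = (observed - expected) / sd
    print(f"{label}: {observed} observed ({rate:.1%}), z = {z:+.2f}")
```

Both observed counts lie within one binomial SD of the expected 26.6, in line with the statement that the observed rates are close to the nominal level.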

We estimated by computer simulation the power to detect case-control allele frequency differences of 0.05, 0.07, and 0.10 by using samples of 200 and 500 cases and controls given either individual genotyping or genotyping of pools (Table 2). Our calculations for pools assume four PCRs per pool and four spots per PCR, and that $\sigma_{pcr}^2$ and $\sigma_{spot}^2$ are equal to their mean values estimated for the 16 SNPs. Our results suggest only modest decreases in power for pool-based analyses compared with individual-based analyses. For example, the power to detect a 0.07 allele frequency difference between cases and controls at a 0.50 control allele frequency was 82% given genotyping of two pools with 16 replicates each and 87% given genotyping of 500 × 2 = 1,000 individuals.
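The power comparison can be approximated analytically. The sketch below is not the simulation used in the study: it assumes a two-sided normal-approximation test on the frequency difference, takes the sampling variance of the difference to be p1(1 - p1)/(2n) + p2(1 - p2)/(2n), takes the case frequency to be the control frequency plus the difference, and ignores uncertainty in k. All function names are hypothetical.

```python
from statistics import NormalDist

SIGMA2_PCR = 1.18e-4    # value from the text
SIGMA2_SPOT = 3.82e-4   # value from the text


def variance_of_difference(p_case, p_control, n, pooled, n_pcr=4, n_spot=16):
    """Var(p1 - p2): sampling variance plus, for pools, measurement variance."""
    sampling = (p_case * (1 - p_case) + p_control * (1 - p_control)) / (2 * n)
    if not pooled:
        return sampling
    # Each of the two pools contributes sigma2/n for PCR and for spots.
    measurement = SIGMA2_PCR * (2 / n_pcr) + SIGMA2_SPOT * (2 / n_spot)
    return sampling + measurement


def approx_power(p_control, diff, n, pooled, alpha=0.05):
    """Two-sided power under a normal approximation (an assumption of this sketch)."""
    norm = NormalDist()
    sd = variance_of_difference(p_control + diff, p_control, n, pooled) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = diff / sd
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


pool_power = approx_power(0.50, 0.07, 500, pooled=True)
indiv_power = approx_power(0.50, 0.07, 500, pooled=False)
print(f"pool: {pool_power:.2f}, individual: {indiv_power:.2f}")
```

For this scenario the approximation lands within about two percentage points of the simulated powers quoted in the text, and it shows the same modest loss of power for pools.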

Discussion

Primer extension analysis by mass spectrometry successfully estimates allele frequency differences between DNA pools with sufficient accuracy and precision to be used as a screening step in large-scale association studies. To test a large number of SNPs on pools, automated assay design, standard assay conditions, and automated data collection are critical. We sought to develop standard methods and quality control criteria that would enable us to screen SNPs accurately and quickly.

Table 2. Power (%) of pools and individually typed samples to detect 0.05–0.10 allele frequency differences in cases and controls at significance level 0.05

                   Control allele frequency = 0.10   Control allele frequency = 0.50
                   Case-control difference           Case-control difference
n    Method        0.05   0.07   0.10                0.05   0.07   0.10
200  Pool          48     75     94                  28     48     78
200  Individual    55     81     97                  32     52     80
500  Pool          78     95     100                 55     81     97
500  Individual    92     99     100                 61     87     98

Power was estimated by computer simulation assuming k = 1.29, $\sigma_{pcr}^2 = 1.18 \times 10^{-4}$ and $\sigma_{spot}^2 = 3.82 \times 10^{-4}$, and four PCRs and four spots per PCR for each pool replicate.

Compared with an association study based on genotypes of individuals, a pooled DNA association study offers advantages and disadvantages. The primary advantages are the reduced reagent and labor costs and time required to generate fewer genotypes. In addition, less DNA per sample is used per genotype when the sample is included in a DNA pool. In our laboratory, DNA pooling offers an ≈32-fold savings in reagent cost and an ≈16-fold savings in labor compared with our higher-throughput method for typing individual samples. Because pooling must result in some loss of information, including loss of haplotype information, either a larger sample or a less significant detection threshold is required to achieve power comparable to that for genotyping individuals (Table 2).

Our current high-throughput pooling analysis follows a three-step design. First, we test SNP assays without replication on a crudely quantitated DNA pool to confirm that the assay design succeeds and that the SNP minor allele frequency is >0.05. This practice limits the use of our valuable carefully quantitated pools to successful SNP assays. Although some SNP assays fail under standard conditions, we prefer to develop quality control criteria to discard SNPs rather than spend time adjusting assay conditions, because our purpose is high-throughput screening. Second, we genotype each successful SNP on case and control pools and 7–11 individuals. For each pool, we carry out 16 replicate genotypes. We discard SNP assays if we detect any evidence of an artifact (see Methods) or if the SD of the 16 replicates is >0.05. Third, we follow up SNPs identified as interesting by this pooling technique by genotyping individual samples to verify allele frequency differences and to allow haplotype analysis and genotype-based phenotypic comparisons.
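The numeric quality control thresholds in this design can be expressed as a simple filter. The sketch below is a simplified illustration: it folds the step 1 frequency check and the step 2 replicate check into one hypothetical function, represents artifact detection as a flag supplied by the caller, and uses synthetic replicate values that are not data from the study.

```python
from statistics import stdev

MIN_MAF = 0.05            # minor allele frequency threshold (step 1 in the text)
MAX_REPLICATE_SD = 0.05   # replicate SD threshold (step 2 in the text)


def passes_pool_qc(replicate_freqs, artifact_detected=False):
    """Return True if a SNP-pool assay meets the thresholds described in the text."""
    if artifact_detected:
        return False
    mean_freq = sum(replicate_freqs) / len(replicate_freqs)
    maf = min(mean_freq, 1.0 - mean_freq)
    return maf > MIN_MAF and stdev(replicate_freqs) <= MAX_REPLICATE_SD


# Synthetic examples with 16 replicates each (invented for illustration).
tight = [0.30 + 0.005 * ((i % 4) - 1.5) for i in range(16)]
noisy = [0.30 + 0.08 * (-1) ** i for i in range(16)]

print(passes_pool_qc(tight))                          # small replicate spread
print(passes_pool_qc(noisy))                          # replicate SD above threshold
print(passes_pool_qc(tight, artifact_detected=True))  # flagged artifact
```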

In comparison to other SNP genotyping methods for screening DNA pools for association, primer extension–mass spectrometry is reasonably precise. The average allele frequency SD of 0.018–0.021 we report is similar to the 0.021 reported for kinetic PCR (11), slightly greater than the 0.014 reported for primer extension-denaturing high-performance liquid chromatography (9), the 0.009–0.017 reported for fluorescent nucleotide primer extension-capillary electrophoresis (14, 18), and the 0.011 reported for pyrosequencing (17), and less than the 0.038 reported for bioluminometric-primer extension (13). Further, mass spectrometry offers advantages in the potential for automation over several of these other methods.

For the 266 SNPs with at least one genotyped heterozygote, we observed significant results from both adjusted (4.5%) and unadjusted (4.9%) pool allele frequencies that were consistent with the expected false positive rates under the null hypothesis of no association, 5%. These results, although limited, suggest that our test is not particularly anticonservative, despite our decision to ignore variability owing to pool construction. Our simulations suggested that adjusting for k, even based on just a single heterozygote, was adequate to preserve the expected false positive rates. In the absence of adjustment for k, our simulations showed that the tests were either conservative or anticonservative, depending on the underlying allele frequency. This finding, especially in light of the value of individuals in quality control assessment, suggests that typing of a limited number of individuals is a useful component for pooling studies.

To assess the potential of mass spectrometry to screen for allele frequency differences between pools efficiently, we assessed the sources of variability in our approach. Experimental variability originates during pool construction, PCR, primer extension, product dispensing onto a chip, and mass spectrometer data collection. During pool construction, variability can arise if DNA concentrations are incorrect or pipetting is inaccurate. During PCR, variability may arise from unequal allele amplification given additional SNP(s) under the primer(s), simultaneous amplification of two SNPs, inaccurate pipetting of template DNA or reagents between wells, unequal PCR conditions between wells, and sample contamination. During primer extension, variability may be caused by differential incorporation of nucleotides and allele pausing, in which the primer for the two-nucleotide extension incorporates only the deoxynucleotide without addition of the final dideoxynucleotide. During product dispensing and mass spectrometer data collection, variability can arise because of incorporated baseline noise, especially at low peak intensity, decay of detection sensitivity with increasing mass, and inconsistent desorption and ionization.

Based on our study of 16 SNPs, we estimated variances of $1.18 \times 10^{-4}$ caused by PCR or primer extension and $3.82 \times 10^{-4}$ caused by product dispensing and data collection. To reduce this measurement variability, we performed four replicate PCR and primer extension reactions and dispensed each product with four replicates for mass spectrometry analysis. Depending on the desired level of accuracy, more or fewer of either replicate type could be undertaken. The appropriate replicate number depends on numbers of individuals in each pool. Carrying out many replicates to reduce experimental variability will have little practical value if sampling variability is much greater than experimental variability.

Because we only constructed each DNA pool once, we could not directly estimate the variance caused by pool construction. The fact that ignoring this variability did not appear to result in a strongly anticonservative test suggests that, at least for our pools, this variability probably is small. This assumption could be tested directly by the construction of multiple pool replicates, but at the expense of considerable time and effort.

Determining the optimum number of pools for a given case or control sample, whether replicates or smaller pools, should also take into account the theoretical limit on the maximum number of individual DNA templates that can be assayed from any one pool. Given samples of 20 ng of pooled DNA and ≈13.4 picograms per diploid genome, chromosomes from a maximum of ≈1,500 individuals (20,000/13.4) can be represented once as template for PCR. If <20 ng of DNA is used, even fewer samples could be measured in pools. Samples of >1,000 case and control individuals would be desirable for complex disease association studies because of the decreased variability caused by sampling from the population. To effectively test a very large sample, the individual DNAs could be combined into several pools with fewer individuals or additional PCRs could be performed.
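The template limit quoted above follows from a unit conversion, sketched below with the per-genome mass exactly as stated in the text. The function name is illustrative.

```python
PG_PER_NG = 1000.0
GENOME_MASS_PG = 13.4   # per-genome template mass as stated in the text


def max_individuals(pool_dna_ng, genome_mass_pg=GENOME_MASS_PG):
    """Largest number of individuals whose genome can appear once in the template."""
    return pool_dna_ng * PG_PER_NG / genome_mass_pg


print(round(max_individuals(20)))   # 20 ng of pooled DNA
print(round(max_individuals(10)))   # half the DNA, half the individuals
```

The first value is just under 1,500, which the text rounds to ≈1,500; halving the template DNA halves the number of individuals that can be represented.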

In conclusion, we have determined that primer extension analysis by mass spectrometry, with appropriate replication, is sufficiently accurate and precise to allow comparison of allele frequency differences between DNA pools. For studies that aim to compare genotypes in hundreds or thousands of case and controls, this approach offers fast, reliable screening of a candidate region with savings of labor, DNA, and reagent costs compared with genotyping individuals. With the expected development of a haplotype map of the human genome (4), yielding a set of 200,000–300,000 “gold standard” SNPs that allow whole genome association studies to become a reality, the pooling approach may allow large-scale analysis of the genetics of common disease at acceptable genotyping costs.

We gratefully acknowledge the other members of the FUSION collaboration for making this study possible and Andi Braun and Christy Johnston of Sequenom, Inc., for intellectual contributions and construction of the case and control DNA pools. The FUSION study is made possible by intramural funds from the National Human Genome Research Institute (Project No. OH95-C-N030) and by National Institutes of Health Grant HG00376 (to M.B.). This project was supported by a Cooperative Research and Development Agreement between the National Human Genome Research Institute and Sequenom, Inc. K.L.M. is the recipient of a Burroughs Wellcome Career Award in the Biomedical Sciences, K.S. was partially supported by a grant from The Academy of Finland, and T.E.F. was supported by National Institutes of Health Training Grant HG00040.

1. Risch, N. & Merikangas, K. (1996) Science 273, 1516–1517.
2. Horikawa, Y., Oda, N., Cox, N. J., Li, X., Orho-Melander, M., Hara, M., Hinokio, Y., Lindner, T. H., Mashima, H., Schwarz, P. E., et al. (2000) Nat. Genet. 26, 163–175.
3. Hugot, J. P., Chamaillard, M., Zouali, H., Lesage, S., Cezard, J. P., Belaiche, J., Almer, S., Tysk, C., O’Morain, C. A., Gassull, M., et al. (2001) Nature 411, 599–603.
4. Judson, R., Salisbury, B., Schneider, J., Windemuth, A. & Stephens, J. C. (2002) Pharmacogenomics 3, 379–391.
5. Arnheim, N., Strange, C. & Erlich, H. (1985) Proc. Natl. Acad. Sci. USA 82, 6970–6974.
6. Breen, G., Harold, D., Ralston, S., Shaw, D. & St. Clair, D. (2000) BioTechniques 28, 464–470.
7. Hoogendoorn, B., Norton, N., Kirov, G., Williams, N., Hamshere, M. L., Spurlock, G., Austin, J., Stephens, M. K., Buckland, P. R., Owen, M. J. & O’Donovan, M. C. (2000) Hum. Genet. 107, 488–493.
8. Wolford, J. K., Blunt, D., Ballecer, C. & Prochazka, M. (2000) Hum. Genet. 107, 483–487.
9. Giordano, M., Mellai, M., Hoogendoorn, B. & Momigliano-Richiardi, P. (2001) J. Biochem. Biophys. Methods 47, 101–110.
10. Kosaki, K., Yoshihashi, H., Ohashi, Y., Kosaki, R., Suzuki, T. & Matsuo, N. (2001) J. Biochem. Biophys. Methods 47, 111–119.
11. Germer, S., Holland, M. J. & Higuchi, R. (2000) Genome Res. 10, 258–266.
12. Sasaki, T., Tahira, T., Suzuki, A., Higasa, K., Kukita, Y., Baba, S. & Hayashi, K. (2001) Am. J. Hum. Genet. 68, 214–218.
13. Zhou, G., Kamahori, M., Okano, K., Chuan, G., Harada, K. & Kambara, H. (2001) Nucleic Acids Res. 29, E93.
14. Matyas, G., Giunta, C., Steinmann, B., Hossle, J. P. & Hellwig, R. (2002) Hum. Mutat. 19, 58–68.
15. Gruber, J. D., Colligan, P. B. & Wolford, J. K. (2002) Hum. Genet. 110, 395–401.
16. Neve, B., Froguel, P., Corset, L., Vaillant, E., Vatin, V. & Boutin, P. (2002) BioTechniques 32, 1138–1142.
17. Wasson, J., Skolnick, G., Love-Gregory, L. & Permutt, M. A. (2002) BioTechniques 32, 1144–1152.
18. Norton, N., Williams, N. M., Williams, H. J., Spurlock, G., Kirov, G., Morris, D. W., Hoogendoorn, B., Owen, M. J. & O’Donovan, M. C. (2002) Hum. Genet. 110, 471–478.
19. Ross, P., Hall, L. & Haff, L. A. (2000) BioTechniques 29, 620–629.
20. Buetow, K. H., Edmonson, M., MacDonald, R., Clifford, R., Yip, P., Kelley, J., Little, D. P., Strausberg, R., Koester, H., Cantor, C. R. & Braun, A. (2001) Proc. Natl. Acad. Sci. USA 98, 581–584.
21. Werner, M., Sych, M., Herbon, N., Illig, T., Konig, I. R. & Wjst, M. (2002) Hum. Mutat. 20, 57–64.
22. Valle, T., Tuomilehto, J., Bergman, R. N., Ghosh, S., Hauser, E. R., Eriksson, J., Nylund, S. J., Kohtamaki, K., Toivanen, L., Vidgren, G., et al. (1998) Diabetes Care 21, 949–958.
23. Douglas, J. A., Erdos, M. R., Watanabe, R. M., Braun, A., Johnston, C. L., Oeth, P., Mohlke, K. L., Valle, T. T., Ehnholm, C., Buchanan, T. A., et al. (2001) Diabetes 50, 886–890.
24. Little, D. P., Braun, A., Darnhofer-Demar, B., Frilling, A., Li, Y., McIver, R. T., Jr., & Koster, H. (1997) J. Mol. Med. 75, 745–750.
25. Sun, X., Ding, H., Hung, K. & Guo, B. (2000) Nucleic Acids Res. 28, E68.
26. Bray, M. S., Boerwinkle, E. & Doris, P. A. (2001) Hum. Mutat. 17, 296–304.


High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools

Supporting information for Mohlke et al. (2002) Proc. Natl. Acad. Sci. USA, 10.1073/pnas.262661399

Table 3. Primer sequences for single nucleotide polymorphisms genotyped on pools and individuals

Primer Sequence

ss121556_FOR AGCGGATAACGACGCCATCAGGCTCTTTAG

ss121556_REV AGCGGATAACAATTTCACACAGGAGATGGGACTCCCTGATCCT

ss121556_EXT GGCTCTTTAGGGAGAAGTCT

ss121557_FOR AGCGGATAACACATGGCATGCTGGAAAAGG

ss121557_REV AGCGGATAACAATTTCACACAGGTAAAAATCCTCCGGGCTCTG

ss121557_EXT TGGAAAAGGAAAAACTAGAGAGGC

ss146317_FOR AGCGGATAACTACACTGGCAGTCACTTCTG

ss146317_REV AGCGGATAACAATTTCACACAGGTCTTGCTCTAAGGAGGGATG

ss146317_EXT CTTCTCCGATCACCTTCAATAA

ss148393_FOR AGCGGATAACAAGATGTGATCTAGGGCCTC

ss148393_REV AGCGGATAACAATTTCACACAGGCCATTCCCTAAACACACTTG

ss148393_EXT GGGAAGTCAAGCAAACCAAGTACA

ss64248_FOR AGCGGATAACGTGCATAAGAATCACCAGGG

ss64248_REV AGCGGATAACAATTTCACACAGGGCCTGTTAGAAGTGAGGATC

ss64248_EXT CACCAGGGGAATTTTTTCACA

ss86876_FOR AGCGGATAACGAAACGAAATGGCACACAGG

ss86876_REV AGCGGATAACAATTTCACACAGGCACTTTGAGAAGGGTGAGTG

ss86876_EXT CACACAGGGCACCGATCC

ss93115_FOR AGCGGATAACATGTGCAGACACCAGAGAGC

ss93115_REV AGCGGATAACAATTTCACACAGGATTGTCTTGTCCCTTCCCGC

ss93115_EXT CATGGATGTGGAGGGACAC

GLUT10_1_FOR AGCGGATAACCCTCATCCCACTCCAGGG

GLUT10_1_REV AGCGGATAACAATTTCACACAGGAGGAGTACCGTGGCCTCC

GLUT10_1_EXT CCACTCCAGGGAGGTGAG

GLUT10_14_FOR AGCGGATAACGCTGATATTTCTCAGGATCC

GLUT10_14_REV AGCGGATAACAATTTCACACAGGTGGGCCGAAGAACAAAACAG

GLUT10_14_EXT GAATGTAAACTCTTCCCCT

PPARg2_FOR gctgttatgggtgaaactctg

PPARg2_REV agcggataacaatttcacacaggcagtgtatcagtgaaggaatcg

PPARg2_EXT tctgggagattctcctattgac

SNP43_FOR CTGTGTGTGGGCAGAGGAC


SNP43_REV AGCGGATAACAATTTCACACAGGCCTCATCCTCACCAAGTCAAG

SNP43_EXT CGCTTGCTGCGAAGTAAGGC

SNP56_FOR CAAGGGTGGTGTCCTCAGTT

SNP56_REV agcggataacaatttcacacaggCCTCGCACTAGTGAAAGGA

SNP56_EXT CAGTTTGTGACCTTCCCCT

SNP63_FOR agcggataacCCTGAAGGTTCCACTCTCCA

SNP63_REV agcggataacaatttcacacaggCTCCCTGGTCACTGGATGTT

SNP63_EXT GACGCGGCCCACCCCCTC

ss1304220_FOR AGCGGATAACATGAGGGTGGGAGGTGCAAC

ss1304220_REV AGCGGATAACAATTTCACACAGGTGAAGCAGGAAGCCTTGCAG

ss1304220_EXT GTGCAACCCCCTTGATGAGGC

ss146316_FOR AGCGGATAACCACGCTAGAATCATGTGTCC

ss146316_REV AGCGGATAACAATTTCACACAGGTCCTCTCTACTGTCTCCTTC

ss146316_EXT TCATGTGTCCAAGGGCTCAC

ss86782_FOR AGCGGATAACAGCCACTTGAACTTCTCGAG

ss86782_REV AGCGGATAACAATTTCACACAGGTAAGCTTCCTGCCTTGCTAG

ss86782_EXT TTTCTTGAGCTTAGCTTCAGG

FOR, forward PCR primer; REV, reverse PCR primer; EXT, extendable primer. Gene-specific portions of PCR primers are underlined.


Chapter 3

A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants

Science 2007;316(5829):1341-5





A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants

Laura J. Scott,1 Karen L. Mohlke,2 Lori L. Bonnycastle,3 Cristen J. Willer,1 Yun Li,1 William L. Duren,1 Michael R. Erdos,3 Heather M. Stringham,1 Peter S. Chines,3 Anne U. Jackson,1 Ludmila Prokunina-Olsson,3 Chia-Jen Ding,1 Amy J. Swift,3 Narisu Narisu,3 Tianle Hu,1 Randall Pruim,4 Rui Xiao,1 Xiao-Yi Li,1 Karen N. Conneely,1 Nancy L. Riebow,3 Andrew G. Sprau,3 Maurine Tong,3 Peggy P. White,1 Kurt N. Hetrick,5 Michael W. Barnhart,5 Craig W. Bark,5 Janet L. Goldstein,5 Lee Watkins,5 Fang Xiang,1 Jouko Saramies,6 Thomas A. Buchanan,7 Richard M. Watanabe,8,9 Timo T. Valle,10 Leena Kinnunen,10,11 Gonçalo R. Abecasis,1 Elizabeth W. Pugh,5 Kimberly F. Doheny,5 Richard N. Bergman,9 Jaakko Tuomilehto,10,11,12 Francis S. Collins,3* Michael Boehnke1*

Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls. We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk. This brings the number of T2D loci now confidently identified to at least 10.

Type 2 diabetes (T2D) is a disease characterized by insulin resistance and impaired pancreatic beta-cell function that affects >170 million people worldwide (1). With first-degree relatives having ~3.5 times as much risk as compared to individuals in the general middle-aged population (2), hereditary factors, together with lifestyle and behavioral factors, play an important role in determining T2D risk (3). To date, intense efforts to identify genetic risk factors in T2D have met with only limited success. This study, reports from our collaborators (4–6), and the recently published work of Sladek et al. (7) describe results of genome-wide association (GWA) studies that further define the genetic architecture of T2D and identify biological pathways involved in T2D pathogenesis.

We genotyped 1161 Finnish T2D cases and 1174 Finnish NGT controls on 317,503 SNPs on the Illumina HumanHap300 BeadChip in stage 1 of a two-stage GWA study of T2D (8). These samples are from the Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) (9, 10) and Finrisk 2002 (11) studies (tables S1 and S2A). Among the 317,503 GWA SNPs, 315,635 had ≥10 copies of the less common allele [minor allele frequency (MAF) > 0.002] and passed quality-control criteria (8). We tested these 315,635 SNPs for association with T2D using a model that is additive on the log-odds scale (Table 1 and tables S3 and S4) (8). We observed a modest excess (41 observed versus 31.6 expected; P = 0.19) of SNPs with P values < 10⁻⁴ (fig. S1). These results argue against the existence of multiple common SNPs with a large impact on T2D disease risk but are consistent with the presence of multiple common SNPs that each confer modest risk. The results also suggest that the matching of cases and controls by birth province, sex, and age (8) has been successful; in support of this conclusion, the genomic control (12) correction value is 1.026.
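The arithmetic behind the figures in this paragraph can be sketched in a few lines of Python. This is an illustration only, not the analysis code used in the study: the function names are hypothetical, and the genomic control estimator shown (median test statistic divided by the median of a 1-df chi-square distribution) is the commonly used form, assumed here because the exact estimator is described only in the supporting material.

```python
from statistics import median

N_SNPS = 315_635            # SNPs passing quality control (from the text)
N_SAMPLES = 1_161 + 1_174   # stage 1 cases + controls (from the text)

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def expected_hits(n_tests, p_threshold):
    """Expected number of tests with P below the threshold under the null."""
    return n_tests * p_threshold


def min_maf(min_minor_copies, n_samples):
    """Smallest minor allele frequency that yields the required allele count."""
    return min_minor_copies / (2 * n_samples)


def genomic_control_lambda(chi2_stats):
    """Assumed estimator: median observed 1-df chi-square over its null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


print(round(expected_hits(N_SNPS, 1e-4), 1))  # expected count of P < 1e-4
print(round(min_maf(10, N_SAMPLES), 4))       # MAF implied by >= 10 copies
```

With the counts quoted above, the expected number of SNPs with P < 10⁻⁴ comes to about 31.6, and ten copies of the minor allele in 2,335 people correspond to a MAF just above 0.002, matching the values in the text. A lambda near 1, such as the reported 1.026, indicates little inflation of the test statistics.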

Analysis of our Illumina HumanHap300 data allowed us to query much of the known SNP variation in the genome. To increase this proportion, we developed an imputation method (8, 13) that uses genotype data and linkage disequilibrium (LD) information from the HapMap Centre d’Etude du Polymorphisme Humain (Utah residents with ancestry from northern and

1Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA. 2Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA. 3Genome Technology Branch, National Human Genome Research Institute, Bethesda, MD 20892, USA. 4Department of Mathematics and Statistics, Calvin College, Grand Rapids, MI 49546, USA. 5Center for Inherited Disease Research (CIDR), Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21224, USA. 6Savitaipale Health Center, 54800 Savitaipale, Finland. 7Division of Endocrinology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA. 8Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA. 9Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA. 10Diabetes Unit, Department of Epidemiology and Health Promotion, National Public Health Institute, 00300 Helsinki, Finland. 11Department of Public Health, University of Helsinki, 00014 Helsinki, Finland. 12South Ostrobothnia Central Hospital, 60220 Seinäjoki, Finland.

*To whom correspondence should be addressed. E-mail: [email protected] (M.B.); [email protected] (F.S.C.)


Table 1. Confirmed T2D susceptibility loci based on all available data from the FUSION, DGI, and WTCCC/UKT2D samples.

| SNP | Chr | Position (bp) | Genes | Risk allele/nonrisk allele | FUSION stage 1 + 2 control risk allele freq. | FUSION stage 1 OR (95% CI) | FUSION stage 1 P | FUSION stage 2 OR (95% CI) | FUSION stage 2 P | FUSION stage 1 + 2 OR (95% CI) | FUSION stage 1 + 2 P | DGI all data OR (95% CI) | DGI all data P | WTCCC/UKT2D all data OR (95% CI) | WTCCC/UKT2D all data P | FUSION-DGI-WTCCC/UKT2D all data OR (95% CI) | FUSION-DGI-WTCCC/UKT2D all data P | Total sample size for 80% power** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| New T2D loci | | | | | | | | | | | | | | | | | | |
| rs4402960 | 3 | 186,994,389 | IGF2BP2 | T/G | 0.30 | 1.28 (1.13–1.45) | 1.2 × 10⁻⁴ | 1.08 (0.96–1.22) | 0.22 | 1.18 (1.08–1.28) | 2.1 × 10⁻⁴ | 1.17 (1.11–1.23) | 1.7 × 10⁻⁹ | 1.11 (1.05–1.16) | 1.6 × 10⁻⁴ | 1.14 (1.11–1.18) | 8.9 × 10⁻¹⁶ | ~4300 |
| rs7754840* | 6 | 20,769,229 | CDKAL1 | C/G | 0.36 | 1.16 (1.02–1.30) | 0.021 | 1.08 (0.96–1.22) | 0.20 | 1.12 (1.03–1.22) | 0.0095 | 1.08 (1.03–1.14) | 2.4 × 10⁻³ | 1.16 (1.10–1.22) | 1.3 × 10⁻⁸ | 1.12 (1.08–1.16) | 4.1 × 10⁻¹¹ | ~5300 |
| rs10811661 | 9 | 22,124,094 | CDKN2A/B | T/C | 0.85 | 1.17 (0.98–1.39) | 0.082 | 1.22 (1.04–1.44) | 0.015 | 1.20 (1.07–1.36) | 0.0022 | 1.20 (1.12–1.28) | 5.4 × 10⁻⁸ | 1.19 (1.11–1.28) | 4.9 × 10⁻⁷ | 1.20 (1.14–1.25) | 7.8 × 10⁻¹⁵ | ~3900 |
| rs9300039† | 11 | 41,871,942 | | C/A | 0.89 | 1.52 (1.24–1.87) | 6.0 × 10⁻⁵ | 1.45 (1.19–1.77) | 2.7 × 10⁻⁴ | 1.48 (1.28–1.71) | 5.7 × 10⁻⁸ | 1.16¶ (0.95–1.42) | 0.12 | 1.13# (0.99–1.29) | 0.068 | 1.25 (1.15–1.37) | 4.3 × 10⁻⁷ | ~3400 |
| rs8050136 | 16 | 52,373,776 | FTO | A/C | 0.38 | 1.03 (0.92–1.16) | 0.58 | 1.18 (1.05–1.33) | 0.0063 | 1.11 (1.02–1.20) | 0.016 | 1.03¶ (0.91–1.17) | 0.25 | 1.23 (1.18–1.32) | 7.3 × 10⁻¹⁴ | 1.17 (1.12–1.22) | 1.3 × 10⁻¹² | ~2700 |
| Previously published T2D association | | | | | | | | | | | | | | | | | | |
| rs1801282 | 3 | 12,368,125 | PPARG | C/G | 0.82 | 1.30 (1.11–1.53) | 0.0011 | 1.08 (0.93–1.26) | 0.33 | 1.20 (1.07–1.33) | 0.0014 | 1.09 (1.01–1.16) | 0.019 | 1.23# (1.09–1.41) | 0.0013 | 1.14 (1.08–1.20) | 1.7 × 10⁻⁶ | ~6400 |
| rs13266634 | 8 | 118,253,964 | SLC30A8 | C/T | 0.61 | 1.22 (1.08–1.38) | 0.0010 | 1.14 (1.02–1.28) | 0.026 | 1.18 (1.09–1.29) | 7.0 × 10⁻⁵ | 1.07 (1.0–1.16) | 0.047 | 1.12 (1.05–1.18) | 7.0 × 10⁻⁵ | 1.12 (1.07–1.16) | 5.3 × 10⁻⁸ | ~5100 |
| rs1111875‡ | 10 | 94,452,862 | HHEX | C/T | 0.52 | 1.13 (1.01–1.27) | 0.039 | 1.06 (0.94–1.19) | 0.34 | 1.10 (1.01–1.19) | 0.026 | 1.14 (1.06–1.22) | 1.7 × 10⁻⁴ | 1.13 (1.07–1.19) | 4.6 × 10⁻⁶ | 1.13 (1.09–1.17) | 5.7 × 10⁻¹⁰ | ~4200 |
| rs7903146§ | 10 | 114,748,339 | TCF7L2 | T/C | 0.18 | 1.39 (1.20–1.61) | 1.2 × 10⁻⁵ | 1.30 (1.12–1.50) | 3.5 × 10⁻⁴ | 1.34 (1.21–1.49) | 1.3 × 10⁻⁸ | 1.38 (1.31–1.46) | 2.3 × 10⁻³¹ | 1.37# (1.25–1.49) | 6.7 × 10⁻¹³ | 1.37 (1.31–1.43) | 1.0 × 10⁻⁴⁸ | ~1000 |
| rs5219‖ | 11 | 17,366,148 | KCNJ11 | T/C | 0.46 | 1.20 (1.07–1.36) | 0.0022 | 1.04 (0.92–1.16) | 0.55 | 1.11 (1.02–1.21) | 0.013 | 1.15 (1.09–1.21) | 1.0 × 10⁻⁷ | 1.15# (1.05–1.25) | 0.0013 | 1.14 (1.10–1.19) | 6.7 × 10⁻¹¹ | ~3700 |
| Total sample size | | | | | | 2,335 | | 2,473 | | 4,808 | | 13,781 | | 13,965 | | 32,544 | | |
| Number of cases/controls | | | | | | 1,161/1,174 | | 1,215/1,258 | | 2,376/2,432 | | 6,529/7,252 | | 5,681/8,284 | | 14,586/17,968 | | |

*rs10946398 WTCCC/UKT2D (r² = 1).
†Multimarker tag for rs9300039 DGI and rs1514823 WTCCC/UKT2D (r² = 0.965).
‡rs5015480 WTCCC GWA only (r² = 1).
§rs7901695 WTCCC/UKT2D (r² = 0.849).
‖rs5215 WTCCC/UKT2D (r² = 0.995).
¶DGI GWA samples.
#WTCCC GWA samples.
**Approximate total sample size for 80% power to detect T2D SNP association at significance level 0.05 is based on the FUSION control risk allele frequency and the risk ratio calculated from FUSION-DGI-WTCCC/UKT2D all-data analyses, assuming 0.10 T2D prevalence. The sample sizes vary slightly from those of (4) because study-specific allele frequencies were used in the calculations.


western Europe) (CEU) samples to predict genotypes of autosomal SNPs not genotyped in our subjects. A total of 2.09 million HapMap CEU SNPs (14) had imputed MAF >1% in FUSION and passed our imputation quality-control criteria. In the HapMap CEU sample, imputed SNPs passing these criteria increased coverage of SNPs with MAF >1% from 71.9 to 89.1% at an r² threshold of 0.8.

To increase the statistical power to detect T2D-predisposing variants, we compared our stage 1 results to GWA results from the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC). We selected 82 SNPs for FUSION stage 2 follow-up genotyping based on evidence from: (i) FUSION-genotyped and FUSION-imputed SNPs; (ii) a combined analysis of GWA results from FUSION, DGI, and WTCCC; and (iii) previous T2D association results. For (i) and (ii), we used a prioritization algorithm that favored SNPs based on genome annotation (8) (table S7) and gave preference to genotyped SNPs over nearby imputed SNPs. We successfully genotyped 80 of the 82 SNPs in our stage 2 sample of 1215 Finnish T2D cases and 1258 Finnish NGT controls (8) (table S2B) and carried out joint analysis of the combined FUSION stage 1 + 2 sample (table S5). DGI (4) and United Kingdom T2D Genetics Consortium (UKT2D) (5) investigators also followed up DGI and WTCCC GWAs by genotyping replication samples.

We confirmed well-established T2D associations with TCF7L2, PPARG, and KCNJ11 (Table 1) (15–18). SNPs in TCF7L2 reached genome-wide significance in the FUSION stage 1 + 2 sample [odds ratio (OR) = 1.34, P = 1.3 × 10⁻⁸] and in the FUSION-DGI-WTCCC/UKT2D “all-data” (i.e., all GWA and follow-up samples) meta-analysis (OR = 1.37, P = 1.0 × 10⁻⁴⁸) (Table 1 and table S5). PPARG Pro12→Ala12 (rs1801282) and KCNJ11 Glu23→Lys23 (rs5219) were not genotyped in the FUSION GWA, but nearby SNPs showed some evidence for T2D association, as did the imputed genotypes for the coding variants. All-data meta-analysis resulted in genome-wide significant T2D association with KCNJ11 Glu23→Lys23 (OR = 1.14, P = 6.7 × 10⁻¹¹) and strong evidence for PPARG Pro12→Ala12 (OR = 1.14, P = 1.7 × 10⁻⁶). The PPARG and KCNJ11 results emphasize the value of combining data across studies and suggest that other T2D-associated loci remain to be found.
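To illustrate what combining data across studies involves, the following Python sketch pools per-study odds ratios with a fixed-effect, inverse-variance meta-analysis on the log-odds scale. This is an assumption made for illustration: the meta-analysis method actually used is described in the supporting material, not here. Standard errors are back-calculated from the 95% confidence intervals printed in Table 1, and the function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def se_from_ci(lower, upper):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def fixed_effect_meta(studies):
    """Pool (OR, CI lower, CI upper) tuples by inverse-variance weighting."""
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in studies:
        weight = 1.0 / se_from_ci(lower, upper) ** 2
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)
    beta = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (math.exp(beta),
            math.exp(beta - Z95 * se),
            math.exp(beta + Z95 * se))


# TCF7L2 (rs7903146) rows of Table 1: FUSION stage 1 + 2, DGI, WTCCC/UKT2D.
tcf7l2 = [(1.34, 1.21, 1.49), (1.38, 1.31, 1.46), (1.37, 1.25, 1.49)]
pooled_or, ci_low, ci_high = fixed_effect_meta(tcf7l2)
print(f"{pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

Under these assumptions the pooled estimate for TCF7L2 agrees to two decimals with the all-data value in Table 1, OR = 1.37 (1.31–1.43). Because the inputs are rounded published values, other loci may differ slightly in the last digit.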

The combined samples from the three studies provide evidence for seven additional T2D loci. For the first three of these loci, we had strong evidence in the FUSION stage 1 GWA data and, for the latter four, our FUSION stage 1 evidence was more modest.

A cluster of variants in the IGF2BP2 (insulin-like growth factor 2 mRNA binding protein 2) region was associated with T2D in our stage 1 sample (e.g., rs1470579 with OR = 1.27, P = 1.6 × 10⁻⁴) (Fig. 1A). The all-data meta-analysis for rs4402960 resulted in genome-wide significance (OR = 1.14, P = 8.9 × 10⁻¹⁶). Including the rs4402960 genotype as a covariate essentially eliminates evidence for T2D association for other variants in the cluster (Fig. 1A), which is consistent with all SNPs representing the same T2D-predisposing variant(s). IGF2BP2 is a paralog of IGF2BP1, which binds to the 5′ untranslated region of the insulin-like growth factor 2 (IGF2) mRNA and regulates IGF2 translation (19). IGF2 is a member of the insulin family of polypeptide growth factors involved in the development, growth, and stimulation of

Fig. 1. Plots of T2D association and LD in FUSION stage 1 samples for regions surrounding IGF2BP2 (A) and rs9300039 (B). (A) and (B) each contain six panels. The top panels display RefSeq genes; there are none in the rs9300039 region. The second panels (i.e., directly below the top panels) show the T2D association –log₁₀ P values in FUSION stage 1 samples for SNPs genotyped in the GWA panel (closed blue circles) or imputed (open blue circles). The third panels show T2D association –log₁₀ P values for each SNP in a logistic regression model correcting for the reference SNP [indicated by the red circle for rs4402960 in (A) and for rs9300039 in (B)]. SNP rs7480010, reported by Sladek et al. (7), is also labeled in the rs9300039 plot (B) (green circle). A decrease in the –log₁₀ P value from the second to the third panels indicates that the association signal of the tested SNPs can be explained, at least in part, by the reference SNP. In both regions, the reference SNP was chosen for convenience; the choice of another strongly associated SNP nearby would have resulted in a similar picture. The fourth panels show recombination rate in centimorgans per megabase for the HapMap CEU sample (14). The fifth and sixth panels show LD r² and D' based on FUSION stage 1–genotyped and FUSION stage 1–imputed data.


insulin action. The most strongly associated IGF2BP2 SNPs are located in a 50-kb region within intron 2 (Fig. 1A); diabetes-predisposing variants may therefore affect regulation of IGF2BP2 expression.

SNP rs13266634, a nonsynonymous Arg325→Trp325 variant in the pancreatic beta-cell–specific zinc transporter SLC30A8 (20), showed (through our annotation-based algorithm) evidence for T2D association in stage 1 (Table 1 and fig. S2). Modest evidence in stage 2 resulted in stronger evidence in our stage 1 + 2 sample (OR = 1.18, P = 7.0 × 10⁻⁵) (Table 1 and table S5). Subsequent DGI and UKT2D genotyping resulted in strong evidence in the combined samples (OR = 1.12, P = 5.3 × 10⁻⁸). Sladek et al. (7) recently reported independent T2D association evidence with the same allele in two French samples (P = 1.8 × 10⁻⁵ and P = 5.0 × 10⁻⁷). SLC30A8 transports zinc from the cytoplasm into insulin secretory vesicles (20, 21), where insulin is stored as a hexamer bound with two Zn²⁺ ions before secretion (22). Variation in SLC30A8 may affect zinc accumulation in insulin granules, affecting insulin stability, storage, or secretion. In high-glucose conditions, overexpression of SLC30A8 in insulinoma (INS-1E) cells enhanced glucose-induced insulin secretion (21).

SNP rs9300039 in an intergenic region on chromosome 11 showed evidence for T2D association in stage 1 (Table 1 and Fig. 1B); genotyping our stage 2 sample resulted in near genome-wide significance in our stage 1 + 2 sample (OR = 1.48, P = 5.7 × 10⁻⁸) (Table 1 and tables S3 and S5). In the WTCCC and DGI scans, the nearby SNP rs1514823 (r² = 0.97 with rs9300039) provided weak evidence for T2D association with the appropriate allele; combining results across all three studies gave OR = 1.25 and P = 4.3 × 10⁻⁷. Fifty-six imputed SNPs and two more genotyped SNPs spanning 219 kb are in LD with rs9300039 and show substantial evidence for T2D association (P < 10⁻⁴) in our stage 1 sample (table S3 and Fig. 1B). Including the genotype for rs9300039 as a covariate essentially eliminates evidence for T2D association with the remaining SNPs (Fig. 1B). This region includes three sets of spliced Expressed Sequence Tags but no annotated genes. The identification of a T2D-associated variant >1 Mb from the nearest annotated gene highlights the value of a genome-wide approach. Sladek et al. (7) reported strongly associated SNPs in two nearby regions on chromosome 11. SNP rs7480010 near hypothetical gene LOC387761 is 331 kb centromeric to rs9300039. LD between rs9300039 and rs7480010 is essentially zero (r² = 0.00063 and D' = 0.036), and rs7480010 showed little evidence for association in our stage 1 + 2 sample (OR = 1.03, P = 0.54). Sladek et al. (7) also reported T2D association with three intronic variants of EXT2, located ~2.4 Mb centromeric of rs9300039; we found no evidence for association with EXT2 SNPs.
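The two LD measures quoted in this paragraph, r² and D', follow from allele and haplotype frequencies. The Python sketch below shows the standard definitions for two biallelic SNPs. It assumes that haplotype frequencies are already known, whereas in practice they must be estimated from unphased genotypes; the function name and the example frequencies are hypothetical.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (r2, d_prime) for two biallelic SNPs.

    p_a  : frequency of allele A at the first SNP
    p_b  : frequency of allele B at the second SNP
    p_ab : frequency of the A-B haplotype
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Two alleles that always travel together: complete LD on both measures.
print(ld_stats(0.5, 0.5, 0.5))
# Haplotype frequency equal to the product of allele frequencies: no LD.
print(ld_stats(0.3, 0.4, 0.12))
```

Values of both measures near zero, as reported for rs9300039 and rs7480010, mean that the genotype at one SNP carries almost no information about the other, so the two association signals can be treated as distinct.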

SNP rs4712523, located within intron 5 of CDKAL1, showed modest evidence for T2D association in our FUSION stage 1 sample, which strengthened slightly in our combined stage 1 + 2 sample (OR = 1.12, P = 0.0073) (table S5). Nearby SNPs in strong LD with rs4712523, including rs7754840, showed modest evidence for T2D association in the DGI scan and considerably stronger evidence in the WTCCC scan. Including strong DGI and UKT2D replication data resulted in genome-wide significance (OR = 1.12, P = 4.1 × 10⁻¹¹ for rs7754840) in the all-data meta-analysis (Table 1). CDKAL1 [cyclin-dependent kinase 5 (CDK5) regulatory subunit associated protein–1–like 1] shares protein domain similarity with CDK5 regulatory subunit–associated protein 1 (CDK5RAP1), which specifically inhibits activation of CDK5 by CDK5 regulatory subunit 1 (CDK5R1) (23). Using quantitative reverse transcription polymerase chain reaction analysis of a panel of RNA samples from human tissues and cells, we detected the highest expression of CDKAL1 in skeletal muscle and brain cells, as well as in 293T and HepG2 cells (fig. S3A). The associated SNPs within intron 5, or SNPs in LD with them, may regulate expression of CDKAL1 and so affect the expression of CDK5. CDK5 and CDK5R1 activity is influenced by glucose and may influence beta-cell processes (24, 25); overactivity of CDK5 in the pancreas may lead to beta-cell degeneration, especially under glucotoxic conditions (26).

SNP rs10811661 near cyclin-dependent kinase inhibitors CDKN2A and CDKN2B showed modest evidence for T2D association in our stage 1 + 2 sample (OR = 1.20, P = 0.0022) (Table 1 and table S5) and showed genome-wide significance in the all-data meta-analysis (OR = 1.20, P = 7.8 × 10⁻¹⁵). SNP rs10811661 is located upstream of CDKN2A and CDKN2B, may have a long-range effect on one of these genes, or may influence a gene not yet annotated. CDKN2A and CDKN2B inhibit the activity of CDK4 and CDK6. In mice, Cdk4 activity has been shown to influence beta-cell proliferation and mass, with loss of Cdk4 leading to diabetes (27, 28). We find CDKN2A to be expressed at high levels in islets, adipocytes, brain, and pancreas and at even higher levels in 293T, HeLa, and HepG2 cells (fig. S3B); CDKN2B is expressed in islets and adipocytes and, to a lesser degree, in small intestine, colon, 293T, and HepG2 cells (fig. S3C). CDKN2A and CDKN2B are also tumor suppressor genes and may play a role in aging (29).

SNPs rs1111875 and rs7923837 showed modest evidence of T2D association in the FUSION and DGI scans, much stronger evidence in the WTCCC scan, and genome-wide significant evidence (OR = 1.13, P = 5.7 × 10⁻¹⁰ for rs1111875) in the all-data meta-analysis. These SNPs are in LD (r² = 0.70) in a region that includes HHEX (hematopoietically expressed homeobox), which is critical for development of the ventral pancreas (30), the insulin-degrading enzyme gene IDE, and the kinesin-interacting factor 11 gene KIF11. Sladek et al. (7) recently reported independent genome-wide significant evidence for T2D association with these SNPs.

The WTCCC/UKT2D groups identified evidence for T2D and body mass index (BMI) associations with a set of SNPs including rs8050136 in the FTO region; the T2D association appears to be mediated through a primary effect on adiposity (5, 6, 31). We observed modest evidence for association with T2D in the combined FUSION stage 1 + 2 sample (OR = 1.11, P = 0.016) (Table 1 and table S5).

T2D can be a component of a larger syndrome of metabolic abnormalities, and we were interested to assess the effects of T2D-related traits on our association results. We repeated our T2D association analysis for the 10 SNPs in Table 1 with one of several variables included as an additional covariate. Adjustment for BMI strengthened T2D association with TCF7L2 and SLC30A8, weakened association with rs9300039 and FTO, and had little effect on the other loci. The effect of waist circumference was similar to that of BMI; blood pressure variables had essentially no effect.

Fig. 2. Prediction of T2D risk in the FUSION sample with the use of 10 T2D susceptibility variants. T2D cases and NGT controls with complete genotype data were included in the analysis. To obtain a sample with a T2D prevalence of ~10%, we included nine copies of each of 2176 NGT controls and one copy of each of 2102 T2D cases. The predicted risk for each individual was estimated from a logistic regression model containing the 10 risk variants listed in Table 1. The proportion of T2D cases is shown for 20 equal intervals of predicted T2D risk. We constructed 95% confidence intervals (CIs) for the proportion of T2D cases in each interval using the original sample of 2102 cases and 2176 controls. The constructed sample T2D prevalence (0.096) is shown as a horizontal line. The proportion of T2D cases increases from ~5% in the lowest to 20% in the highest predicted risk categories.
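The reweighting described in this caption can be checked with a short calculation. The Python sketch below is illustrative only and the function name is hypothetical; it computes the case fraction in a sample built from one copy of each case and nine copies of each control.

```python
def constructed_prevalence(n_cases, n_controls, control_copies):
    """Case fraction after replicating each control a fixed number of times."""
    return n_cases / (n_cases + control_copies * n_controls)


prevalence = constructed_prevalence(2102, 2176, 9)
print(round(prevalence, 4))
```

With the counts given in the caption the result is about 0.097, in line with the ~10% target and close to the 0.096 stated in the caption; the small difference may reflect rounding or truncation.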


We previously carried out T2D linkage analysis in the families of many of our stage 1 cases (10). None of the 10 loci in Table 1 had large T2D logarithm of the odds (LOD) scores, although those for FTO and TCF7L2 were 0.63 and 0.60 and so were nominally significant. LOD scores for six of the 10 loci were greater than 0.2, as compared to 2.2 that would be expected for random genome locations. This suggests enrichment for T2D-associated loci in regions with modest evidence of T2D linkage (P = 0.01) but that the power of the linkage approach was insufficient to distinguish these signals from background noise.
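One way to see where an enrichment P value of this size can come from is a one-sided binomial test. The Python sketch below assumes, as an illustration and not as a statement of the method actually used, that each of the 10 loci independently has probability 2.2/10 of showing a LOD score above 0.2 by chance; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Six of 10 loci with LOD > 0.2, against 2.2 of 10 expected by chance.
p_enrichment = binomial_upper_tail(n=10, k=6, p=2.2 / 10)
print(round(p_enrichment, 3))
```

Under this assumption the tail probability is approximately 0.01, consistent with the value reported in the text.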

The ability to construct a list of ten robust and replicated T2D-associated loci (Table 1) represents a landmark in efforts to identify genetic variants that predispose to complex human diseases, although the specific predisposing variants and even the relevant genes remain to be defined. We examined the combined risk of T2D based on these 10 loci in our stage 1 + 2 sample by constructing a logistic regression model and predicting T2D risk for each person (8). We found a fourfold variation in T2D risk from the lowest to highest predicted risk groups, which is of potential interest for a personalized preventive-medicine program (Fig. 2). However, these predictions from our data may be biased as compared to predictions based on the general population, likely owing to the overestimation of ORs due to the “winner’s curse,” enrichment for familial T2D cases, and exclusion of individuals with impaired glucose tolerance or impaired fasting glucose.

Thirty years ago, James V. Neel labeled T2D as “the geneticist’s nightmare” (32), predicting that the discovery of genetic factors in T2D would be thoroughly challenging. Until recently, his prediction has proven true. Although large samples and collaboration among three groups were required, we can confidently state that new diabetes risk factors have been identified. Each gene discovery points to a pathway that contributes to pathogenesis, and all of these proteins and their relevant pathways represent potential drug targets for the prevention or treatment of diabetes. Based on the number of other interesting results observed in these studies, it is likely that there are additional T2D-predisposing loci to be found. Even though much remains to be done, we are at last awakening from Jim Neel’s nightmare.

References and Notes

1. S. Wild, G. Roglic, A. Green, R. Sicree, H. King, Diabetes Care 27, 1047 (2004).
2. S. S. Rich, Diabetes 39, 1315 (1990).
3. J. Kaprio et al., Diabetologia 35, 1060 (1992).
4. Diabetes Genetics Initiative, Science 316, 1331 (2007); published online 26 April 2007 (10.1126/science.1142358).
5. E. Zeggini et al., Science 316, 1336 (2007); published online 26 April 2007 (10.1126/science.1142364).
6. The Wellcome Trust Case Control Consortium, Nature, in press.
7. R. Sladek et al., Nature 445, 881 (2007).
8. Materials and methods are available as supporting material on Science Online.
9. T. Valle et al., Diabetes Care 21, 949 (1998).
10. K. Silander et al., Diabetes 53, 821 (2004).
11. T. Saaristo et al., Diabetes Vasc. Dis. Res. 2, 67 (2005).
12. B. Devlin, K. Roeder, Biometrics 55, 997 (1999).
13. Y. Li, P. Scheet, J. Ding, G. R. Abecasis, submitted for publication; manuscript available from G.R.A. (e-mail: [email protected]).
14. International HapMap Consortium, Nature 437, 1299 (2005).
15. S. F. Grant et al., Nat. Genet. 38, 320 (2006).
16. S. S. Deeb et al., Nat. Genet. 20, 284 (1998).
17. D. Altshuler et al., Nat. Genet. 26, 76 (2000).
18. A. L. Gloyn et al., Diabetes 52, 568 (2003).
19. J. Nielsen et al., Mol. Cell. Biol. 19, 1262 (1999).
20. F. Chimienti, S. Devergnas, A. Favier, M. Seve, Diabetes 53, 2330 (2004).
21. F. Chimienti et al., J. Cell Sci. 119, 4199 (2006).
22. M. F. Dunn, Biometals 18, 295 (2005).
23. Y. P. Ching, A. S. Pang, W. H. Lam, R. Z. Qi, J. H. Wang, J. Biol. Chem. 277, 15237 (2002).
24. M. Ubeda, D. M. Kemp, J. F. Habener, Endocrinology 145, 3023 (2004).
25. F. Y. Wei et al., Nat. Med. 11, 1104 (2005).
26. M. Ubeda, J. M. Rukstalis, J. F. Habener, J. Biol. Chem. 281, 28858 (2006).
27. S. G. Rane et al., Nat. Genet. 22, 44 (1999).
28. T. Tsutsui et al., Mol. Cell. Biol. 19, 7011 (1999).
29. W. Y. Kim, N. E. Sharpless, Cell 127, 265 (2006).
30. R. Bort, J. P. Martinez-Barbera, R. S. Beddington, K. S. Zaret, Development 131, 797 (2004).
31. T. M. Frayling et al., Science 316, 889 (2007); published online 12 April 2007 (10.1126/science.1141634).
32. J. V. Neel, in The Genetics of Diabetes Mellitus, W. Creutzfeldt, J. Köbberling, J. V. Neel, Eds. (Springer, Berlin, 1976), pp. 1–11.
33. We thank the Finnish citizens who generously participated in this study; our colleagues from the DGI, WTCCC, and UKT2D for sharing prepublication data from their studies; S. Enloe of FUSION and E. Kwasnik, J. Gearhart, J. Romm, M. Zilka, C. Ongaco, A. Robinson, R. King, B. Craig, and E. Hsu of CIDR for expert technical work; and D. Leja of NHGRI for expert assistance with a figure. Support for this research was provided by NIH grants DK062370 (M.B.), DK072193 (K.L.M.), HL084729 (G.R.A.), HG002651 (G.R.A.), and U54 DA021519; National Human Genome Research Institute intramural project number 1 Z01 HG000024 (F.S.C.); a postdoctoral fellowship award from the American Diabetes Association (C.J.W.); a Wenner-Gren Fellowship (L.P.O.); and a Calvin Research Fellowship (R.P.). Genome-wide genotyping was performed by the Johns Hopkins University Genetic Resources Core Facility (GRCF) SNP Center at CIDR with support from CIDR NIH (contract N01-HG-65403) and the GRCF SNP Center.

Supporting Online Material
www.sciencemag.org/cgi/content/full/1142382/DC1
Author Contributions
Materials and Methods
Figs. S1 to S3
Tables S1 to S7
References

12 March 2007; accepted 20 April 2007
Published online 26 April 2007;
10.1126/science.1142382
Include this information when citing this paper.


Supporting Online Material for

A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants

Laura J. Scott, Karen L. Mohlke, Lori L. Bonnycastle, Cristen J. Willer, Yun Li, William L. Duren, Michael R. Erdos, Heather M. Stringham, Peter S. Chines, Anne U. Jackson, Ludmila Prokunina-Olsson, Chia-Jen Ding, Amy J. Swift, Narisu Narisu, Tianle Hu, Randall Pruim, Rui Xiao, Xiao-Yi Li, Karen N. Conneely, Nancy L. Riebow, Andrew G. Sprau, Maurine Tong, Peggy P. White, Kurt N. Hetrick, Michael W. Barnhart, Craig W. Bark, Janet L. Goldstein, Lee Watkins, Fang Xiang, Jouko Saramies, Thomas A. Buchanan, Richard M. Watanabe, Timo T. Valle, Leena Kinnunen, Gonçalo R. Abecasis, Elizabeth W. Pugh, Kimberly F. Doheny, Richard N. Bergman, Jaakko Tuomilehto, Francis S. Collins,* Michael Boehnke*

*To whom correspondence should be addressed. E-mail: [email protected] (M.B.); [email protected] (F.S.C.)

Published 26 April 2007 on Science Express


Methods

Sample description

Stage 1: In the results reported here, we analyzed 1,161 T2D cases and 1,174 NGT controls from the Finland-United States Investigation of NIDDM Genetics (FUSION) (1, 2) and Finrisk 2002 (3) studies as our stage 1 sample (Tables S1, S2A). T2D was defined according to 1999 World Health Organization (WHO) criteria (4) of fasting plasma glucose concentration ≥ 7.0 mmol/l or 2-h plasma glucose concentration ≥ 11.1 mmol/l, by report of diabetes medication use, or based on medical record review. FUSION cases with known or probable type 1 diabetes among their first degree relatives were excluded. Normal glucose tolerance (NGT) was defined as having fasting glucose < 6.1 mmol/l and 2-h glucose < 7.8 mmol/l (4). The 789 FUSION cases each reported at least one T2D sibling; the 372 Finrisk 2002 T2D cases came from a Finnish population-based risk factor survey. Controls included 219 subjects from Vantaa, Finland who were NGT at ages 65 and 70 years, 304 NGT spouses of FUSION subjects, and 651 Finrisk 2002 NGT subjects. The stage 1 controls were approximately frequency-matched to the stage 1 cases by five-year age category, sex, and birth province. We refer to these FUSION and Finrisk 2002 cases and controls in the text as the FUSION stage 1 sample. For quantitative trait and quality control analyses, we genotyped 122 FUSION offspring, yielding 119 mother-father-offspring trios, one mother-father-two-offspring quartet, and one parent-offspring pair. For quality control, we successfully genotyped 79 duplicate samples and five CEU HapMap parent-child trios.
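The glucose thresholds above translate directly into a classification rule. The Python sketch below encodes only the two glucose criteria; it deliberately ignores diabetes medication use and medical record review, which the study also used to define T2D. The function name and the "intermediate" label for people meeting neither definition are hypothetical.

```python
def classify_glucose(fasting_mmol_l, two_hour_mmol_l):
    """Classify a subject from fasting and 2-h plasma glucose (mmol/l).

    Simplified sketch of the WHO 1999 thresholds quoted in the text;
    medication use and medical records are not considered here.
    """
    if fasting_mmol_l >= 7.0 or two_hour_mmol_l >= 11.1:
        return "T2D"
    if fasting_mmol_l < 6.1 and two_hour_mmol_l < 7.8:
        return "NGT"
    # Neither definition is met (e.g., impaired fasting glucose or
    # impaired glucose tolerance); label chosen for this sketch only.
    return "intermediate"


print(classify_glucose(7.4, 9.0))   # fasting value alone meets the T2D threshold
print(classify_glucose(5.2, 6.1))   # both values below the NGT cut-offs
print(classify_glucose(6.5, 7.0))   # between the two definitions
```

Subjects falling between the two definitions are neither cases nor controls under these criteria, which is why the resulting sample contrasts clear T2D cases with clearly normal glucose-tolerant controls.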

Stage 2: 1,215 Finnish T2D cases and 1,258 Finnish NGT controls were selected for stage 2 from the Dehko 2D (D2D) (5), Health 2000 (6), Finrisk 1987 (7), Finrisk 2002 (3), Savitaipale Diabetes (8), and Action LADA (9) studies (Tables S1, S2B) and classified according to WHO 1999 criteria (4). The D2D, Health 2000, Finrisk 1987, and Savitaipale Diabetes studies are population-based surveys; Action LADA is a study of latent autoimmune diabetes in adults (LADA) in recently-diagnosed diabetes patients. We chose T2D cases from Action LADA who were GAD antibody negative and therefore unlikely to have LADA. For all studies except Action LADA, NGT controls were approximately frequency-matched within each study to the T2D cases by five-year age category, sex, and birth province. Action LADA cases were approximately frequency-matched in the same way with additional controls from the other studies. Our stage 2 sample consists of 327 cases and 399 controls from D2D, 127 cases and 224 controls from Health 2000, 266 cases and 397 controls from Finrisk 1987, 52 controls from Finrisk 2002, 122 cases and 186 controls from Savitaipale, and 373 cases from Action LADA (Table S2B). For quality control in stage 2, we successfully genotyped 56 duplicate samples.

Informed consent: Informed consent was obtained from each study participant, and the study protocol was approved by the ethics committee or institutional review board in each of the participating centers.

Genotyping

GWA genotyping: Stage 1 and quality control samples were genotyped on Illumina Infinium™ II HumanHap300 BeadChips v.1.0 in the Johns Hopkins University Genetic Resources Core Facility (GRCF) SNP Center at the Center for Inherited Disease Research (CIDR) using the Illumina Infinium II assay protocol (10). An in-house LIMS was used for sample and reagent tracking and lab workflow control (11). Approximately 1 μg of genomic DNA (15 μL at 70 ng/μL) was used as input for the Infinium II assay.

Intensity data for each sample were normalized using BeadStudio v.2.3.25 and, for quality

control within CIDR, genotypes were determined using the Illumina-provided standard definition

cluster-file for the HumanHap300 v.1.0 product. These cluster boundaries were determined by

Illumina using 111 unique HapMap samples: 47 CEU, 36 YRI, and 28 CHB/JPT. BeadStudio

sample sheets were generated from our in-house LIMS. Sample and batch level quality control

was done by monitoring sample call rates, sex, heterozygote frequencies, and lab workflow

related variables using data generated from BeadStudio and our LIMS. 35 genotyped samples

fell below our sample call rate threshold of 97.5% and were repeated; 28 of the repeated

samples gave call rates > 97.5%. The remaining 7 samples were excluded from analyses.

To obtain genotypes for analysis, we re-clustered the genotype data using cluster boundaries

determined with our own data. We removed samples for 15 people identified as likely first or

second degree relatives of other sampled individuals based on their genotype data (12). We

checked for consistency in genotyping within each of 79 duplicate sample pairs, with Mendelian

inheritance among the 122 parent-offspring sets, and with Hardy-Weinberg Equilibrium (HWE)

using the unrelated individuals (13). After initial analyses, we manually reviewed in BeadStudio

the clustering of the genotype data for our most strongly associated SNPs.

SNPs were dropped from all analyses if the HWE p-value was < 10⁻⁶, the total number of

Mendelian inconsistencies and duplicate pair discrepancies was > 3, or the SNP call rate was <


90%; and flagged for further attention if the HWE p-value was < 10⁻⁴, the total number of

Mendelian inconsistencies and duplicate pair discrepancies was > 1, or the SNP call rate < 95%.

All genotypes were oriented to the forward strand. There is little risk of strand ambiguities as

there are no C/G or A/T polymorphisms included in the Illumina 300K HumanHap panel.
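The drop and flag rules above can be sketched as a small classifier. This is a minimal illustration, not the pipeline actually used; the `SnpQc` record and the `classify_snp` function are hypothetical names, and only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP quality-control summary."""
    hwe_p: float        # HWE p-value among unrelated individuals
    discrepancies: int  # Mendelian inconsistencies + duplicate-pair discrepancies
    call_rate: float    # fraction of samples with a genotype call (0 to 1)


def classify_snp(qc: SnpQc) -> str:
    """Return 'drop', 'flag', or 'pass' using the thresholds quoted in the text."""
    if qc.hwe_p < 1e-6 or qc.discrepancies > 3 or qc.call_rate < 0.90:
        return "drop"
    if qc.hwe_p < 1e-4 or qc.discrepancies > 1 or qc.call_rate < 0.95:
        return "flag"
    return "pass"
```

The drop criteria are checked first, so a SNP that meets both a drop and a flag criterion is dropped.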

For the 315,635 SNPs that passed our quality control criteria, the genotype consistency rate

among 79 duplicate sample pairs was 99.996%, the Mendelian consistency rate in 122 parent-

child sets was 99.967%, and the concordance rate for 15 samples genotyped both in our study

and by the HapMap consortium was 99.82%. 80.8% of SNPs had call frequency of 100%, and

99.68% of SNPs had call frequencies > 95%.

Confirmation and replication genotyping: We carried out focused, lower-throughput genotyping

with the Sequenom Homogeneous MassEXTEND or iPLEX Gold SBE assays at the National

Human Genome Research Institute (NHGRI). For 26 GWA SNPs re-genotyped in the stage 1

samples on a different genotyping platform (Sequenom), we observed a genotype consistency

rate of 99.92%; these included the SNPs with the strongest evidence of T2D association. We

also genotyped SNPs in the FUSION stage 2 samples or in the combined FUSION stage 1+2

samples to follow up interesting results based on (a) FUSION genotyped and imputed SNPs; (b)

the FUSION-DGI-WTCCC GWA results comparison; and (c) prior T2D association results in

our own or other studies. 80 of the 82 attempted SNPs had genotype call frequency > 94% and

HWE p-value > .001. The genotype consistency rate among duplicate samples was 99.9% and

the average call frequency was 97.1%.


Statistical analysis

T2D association: We tested for T2D-SNP association using logistic regression under the

additive genetic model that is multiplicative on the OR scale with adjustment for five-year age

category, sex, and birthplace. This test is the logistic regression equivalent to the Cochran-

Armitage test for trend (14) and is hence robust to departures from Hardy-Weinberg equilibrium.

We repeated some analyses including BMI, waist, systolic blood pressure, or diastolic blood

pressure as an additional covariate to assess the impact of these variables on evidence for SNP-

T2D association. For X-chromosome markers, we treated hemizygous males as homozygotes,

consistent with X inactivation for most of the chromosome. We presented and followed up on

results based on this additive model for ease of comparison between groups. We also analyzed

SNPs using recessive and dominant models; no SNP reached genome-wide significance in

FUSION stage 1 data, although additional T2D-predisposing variants may be among the SNPs

identified by these models.

To evaluate empirically the distribution of p-values observed in our GWA stage 1 study, we

permuted case/control status and re-ran the entire GWA analysis 100 times. We counted the

number of p-values < 10⁻⁵ or < 10⁻⁴ within each permuted dataset and found our study to fall

within the permuted distribution.

Statistical significance: Following the recommendation of the International HapMap

Consortium based on analysis of the ENCODE data, we declared a T2D-SNP association

“genome-wide significant” if the nominal p-value for the SNP was < 5 × 10⁻⁸ (15). In so doing,


we dealt with the multiple comparisons problem suggested by carrying out the equivalent of ~1

million tests.

Sample size calculation: For each SNP in Table 1, we calculated the sample size necessary to

detect T2D-SNP association at significance level .05 and power 80% under an additive model.

We converted the FUSION-DGI-WTCCC/UKT2D all-data OR to a risk ratio assuming T2D

prevalence 10%, and used this risk ratio and FUSION stage 1+2 control risk allele frequency as

the population allele frequency in the sample size calculation (16).

Imputation: We applied a computationally efficient hidden Markov model based algorithm (17,

18) to impute genotypes in FUSION samples for 2.25 million autosomal SNPs genotyped by the

International HapMap Consortium (15), but not present on the Illumina HumanHap300

BeadChip. The method combines our FUSION Illumina GWA genotype data with phased

chromosomes for the HapMap CEU samples and then infers the unknown FUSION genotypes

probabilistically by searching for similar stretches of flanking haplotype in the HapMap CEU

reference sample. In this process, we used the genotype data from the 290,690 FUSION

Illumina GWA autosomal SNPs which passed our quality control criteria and had minor allele

frequency > 5%. For each individual at each imputed SNP, we calculated an average allele

dosage score based on 90 iterations of the imputation algorithm. We assessed the quality of the

results for each SNP by calculating (a) the proportion of iterations that agreed with the most

likely genotype (imputation consistency) and (b) the ratio of the observed variance of dosage

scores across samples to the expected variance given the imputed allele frequency of the SNP


(estimated r²). 2.15 million of the HapMap autosomal SNPs had minor allele frequency > 1% in

the CEU sample; of these, 2.09 million met our quality control criterion of an estimated r² > .30.
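The dosage score and the two quality measures described above can be sketched as follows. This is a minimal sketch under stated assumptions: genotypes are coded as 0, 1, or 2 copies of one allele, the expected variance is taken as the binomial value 2p(1-p), and the consistency measure is computed for a single individual (how per-individual values were aggregated per SNP is not stated in the text). All function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def dosage(sampled_genotypes):
    """Average allele count (0, 1, or 2) over the imputation iterations."""
    return mean(sampled_genotypes)


def consistency(sampled_genotypes):
    """Proportion of iterations that agree with the most frequent genotype."""
    top_count = Counter(sampled_genotypes).most_common(1)[0][1]
    return top_count / len(sampled_genotypes)


def estimated_r2(dosages):
    """Observed variance of dosages divided by the variance expected
    from the imputed allele frequency (assumed here to be 2p(1-p))."""
    p = mean(dosages) / 2
    expected = 2 * p * (1 - p)
    if expected == 0:
        return 0.0
    return pvariance(dosages) / expected
```

With confidently imputed genotypes the dosages sit close to 0, 1, or 2 and the ratio approaches 1; uncertain imputation pulls the dosages toward the mean and the ratio toward 0.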

We evaluated the accuracy of our imputation procedure by comparing imputed genotypes to

actual genotypes for 510 SNPs not present on the Illumina GWA panel but that we had

previously genotyped in 1,190 individuals in our stage 1 samples (19). The average concordance

rate between imputed and actual alleles (genotypes) was 98.5% (97.1%), suggesting that the

HapMap CEU sample provides an appropriate basis for SNP genotype imputation in Finns,

consistent with our previous findings that allele frequencies, haplotype frequencies, and linkage

disequilibrium (LD) measures are remarkably similar between the CEU samples and a set of the

Finnish individuals that overlaps with those included in this study (19). We also genotyped 23

SNPs imputed in our stage 1 data; 16 of these SNPs had stage 1 imputation-based p-values <

10⁻⁵. For most of these SNPs, the p-values for the actual genotypes were very similar to those for

the imputed genotypes, although often slightly less significant (Table S6); large differences

occurred most often for estimated r² values nearer the quality control threshold. Differences

reflect variability in the imputation-based p-value estimates and our choice to follow up strong

imputation-based association results, an example of the “winner’s curse.” This variability in

p-value estimates for imputed SNPs did not lead to an increased overall false positive rate for the

study since we have chosen to genotype each such SNP in stage 1 as well as stage 2.

To test for disease-SNP association for imputed SNPs allowing for the effects of covariates, we

used logistic regression models in which the SNP effect was represented by its mean imputed


allele dosage score, an approach that takes into account the degree of uncertainty of genotype

imputation (18).

Combined analysis: We used a fixed effects model to estimate the combined ORs, 95%

confidence intervals (CIs), and p-values for the GWA genotype or imputed data for FUSION and

the GWA genotype data from DGI and WTCCC studies (20). We used the same approach to

combine all available data from the FUSION, DGI, and WTCCC/UKT2D studies. All results are

based on genotypes predicted from the forward strand of the genome sequence. When we

describe results across studies for non-identical SNPs, we report LD estimates based on FUSION

genotype data when available and on imputed data when not.
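A fixed effects combination of per-study odds ratios can be sketched with inverse-variance weighting. The text does not state which fixed effects estimator was used, so the weighting scheme here is an assumption; the function name is hypothetical, and standard errors are recovered from the reported 95% CIs.

```python
import math

Z975 = 1.959964  # 97.5th percentile of the standard normal distribution


def fixed_effects(studies):
    """Combine (OR, lower95, upper95) tuples by inverse-variance weighting.

    Returns (combined OR, lower95, upper95, two-sided p-value).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in studies:
        beta = math.log(odds_ratio)
        se = (math.log(upper) - math.log(lower)) / (2 * Z975)
        weight = 1.0 / se ** 2
        weighted_sum += weight * beta
        total_weight += weight
    beta = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return (math.exp(beta),
            math.exp(beta - Z975 * se),
            math.exp(beta + Z975 * se),
            p_value)
```

Combining several studies with similar effects leaves the OR roughly unchanged while narrowing the confidence interval, which is why the combined p-values can be far smaller than any single-study p-value.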

SNP selection for stage 2 genotyping: We selected SNPs for genotyping in the FUSION stage 2

samples based on the results of the FUSION GWA and the comparison of the FUSION, DGI,

and WTCCC GWA results. To enrich for SNPs with interesting biological functions from the

FUSION GWA, we weighted the association p-value according to our interest in the SNP based

on genome annotation, using an algorithm similar to the one described by Roeder et al. (21), with

weights as described in Table S7. Our algorithm advantaged genotyped SNPs that tagged any

HapMap SNP annotated as non-synonymous, frameshift, or critical splice site variants, or

located in or around interesting T2D candidate genes using an LD threshold of r² ≥ .8 in the CEU

HapMap sample. It did so by dividing the p-value by the product of the maximal relevant

weighting factor and the relevant bonus factors. For imputed SNPs, we assigned the weight

based only on the imputed SNP itself. From SNPs with weighted p-values ≤ 10⁻⁴, we formed

sets of SNPs within 100 kb of each other and ranked these sets based on the smallest weighted

p-value. From each of these sets, we selected a strongly associated SNP for stage 2 genotyping,

giving some preference to genotyped over imputed SNPs to reduce stage 1 genotyping

requirements and to focus on SNPs for which we had more accurate genotype information. If an

imputed SNP was chosen, we genotyped stage 1 and 2 samples.
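The weighting and grouping steps above can be sketched as follows. The weights in Table S7 are not reproduced here, so any weights passed to these functions are placeholders. The sketch also assumes that a SNP joins a set when it lies within 100 kb of the preceding SNP in that set and that the weighted p-value cutoff is inclusive. Function and field names are hypothetical.

```python
def weighted_p(p_value, annotation_weight=1.0, bonus_factors=()):
    """Divide the p-value by the annotation weight times any bonus factors."""
    divisor = annotation_weight
    for bonus in bonus_factors:
        divisor *= bonus
    return p_value / divisor


def form_ranked_sets(snps, max_gap_bp=100_000, cutoff=1e-4):
    """Group SNPs passing the cutoff into positional sets, best set first.

    Each SNP is a dict with keys 'id', 'chrom', 'pos', and 'weighted_p'.
    """
    kept = sorted(
        (s for s in snps if s["weighted_p"] <= cutoff),
        key=lambda s: (s["chrom"], s["pos"]),
    )
    sets = []
    for snp in kept:
        last = sets[-1][-1] if sets else None
        if (last and last["chrom"] == snp["chrom"]
                and snp["pos"] - last["pos"] <= max_gap_bp):
            sets[-1].append(snp)  # within 100 kb of the preceding SNP
        else:
            sets.append([snp])    # start a new set
    # Rank sets by their smallest weighted p-value.
    return sorted(sets, key=lambda group: min(s["weighted_p"] for s in group))
```

Selecting the SNP to genotype from each set (with preference for genotyped over imputed SNPs) is a judgment step in the text and is left out of the sketch.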

Risk prediction: We predicted T2D risk in the FUSION sample based on the ten identified T2D

susceptibility variants listed in Table 1. T2D cases and NGT controls with complete genotype

data were included in the analysis. To obtain a sample with ~10% T2D prevalence, the 2,176

NGT controls were included nine times each and the 2,102 T2D cases once each in a logistic

regression analysis. Figure 2 displays the proportion of T2D individuals for twenty equal

intervals of predicted T2D risk. 95% CIs for the proportion of T2D cases were constructed using

the original, not the expanded, sample.

Linkage and association: To assess the possible predictive value of T2D linkage for T2D

association, we counted the number of our ten T2D-associated loci (Table 1) for which the T2D

linkage LOD score was > 0.2 in our FUSION affected sibling pair families (2). We then divided

the genome into 5 cM bins and noted that 22% of such bins had T2D LOD score > 0.2 in our

T2D linkage scan. The observed count of six of the ten loci with T2D LOD > 0.2 is ~3-times

greater than expected by chance, and has exact binomial p-value of .01, consistent with the

hypothesis that very modest linkage evidence is somewhat predictive of the presence of a locus

detectable by association methods.
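The binomial calculation above can be reproduced directly from the quoted numbers: ten loci, a 22% chance that a random 5 cM bin has LOD > 0.2, and six loci observed above that level. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected_loci = 10 * 0.22                       # about 2.2 loci expected by chance
fold_enrichment = 6 / expected_loci             # about 2.7, i.e. roughly 3-fold
p_value = binomial_upper_tail(6, 10, 0.22)      # about 0.010
```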

Gene expression analysis


RNAs from human tissues were purchased from Clontech and represented pooled samples from

several individuals. Purified human pancreatic islets were obtained from Islet Cell Resource

Centers (IRB Exemption number 3072) and the National Disease Research Interchange (IRB

Exemption number 3269) with approval by the National Institutes of Health Office of Human

Subjects Research. Anonymous human blood donor samples from the NIH Clinical Center

Division of Transfusion Medicine were provided as buffy coat isolations from whole blood

centrifugation. Human adipocytes were purchased from Cambrex as differentiated cultures, and

cell cultures -- 293T (human embryonic kidney), HeLa (human cervical carcinoma), and HepG2

(human hepatocellular carcinoma) -- were purchased from ATCC (the American Type Culture

Collection). Lymphoblastoid cell lines from CEPH individuals were purchased from the Coriell

Cell Repositories. RNA from cell cultures, islets, blood, and adipocytes was prepared with

Trizol Reagent (Invitrogen) followed by RNeasy Kit (Qiagen). RNA from four individual

samples was used to prepare pooled cDNA for islets, adipocytes, blood, and lymphoblasts.

cDNA was prepared from 1 μg of total RNA, using SuperScript III reverse transcriptase and

random hexamers (Invitrogen). cDNA equivalent to 25-50 ng of total RNA was used for each

quantitative PCR. All PCRs were performed in 10 μL volume in replicates of 3 or 4 using the

7900 Real-Time PCR System (ABI) in 384 well plates; average values were used for

calculations. PCRs used 2x SYBR Green PCR mix (Qiagen) and specific primers

designed over exon boundaries to amplify only from cDNA:

CDKAL1_f: GAAGAATCTTTTGATTCCAAGTTTT

CDKAL1_r: GCAGCACCATTCTGGAACTC

CDKN2A_f: ATCTATGCGGGCATGGTTACT


CDKN2A_r: CAACGCACCGAATAGTTACG

CDKN2B_f: CGGGGACTAGTGGAGAAGGT

CDKN2B_r: ACCAGCGTGTCCAGGAAG

PCRs were carried out for 15 min at 95 °C, followed by 40 cycles of 15 sec at 95 °C, 15 sec at

59 °C, and 45 sec at 72 °C. Post-PCR melting curve analysis was used after each run. Gel-purified

PCR fragments were also sequenced to ensure the specificity of amplification and splicing. An

expression assay for human beta-2 microglobulin (B2M) Hs00187842_m1 was purchased from

ABI and used according to the instructions. Ct values (cycle at threshold) were determined from

real-time PCR. The expression of target genes was normalized to expression of B2M according

to the equation dCt = Ct(B2M) - Ct(target), compared to expression in pancreas by the equation ddCt =

dCt(tissue) - dCt(pancreas), then converted to fold difference as fold difference = 2^ddCt (ABI, User

Bulletin #2 on relative quantification). We were unable to assess confidently the tissue

distribution of IGF2BP2 mRNA because of very high similarity (> 95%) to three processed

pseudogenes on chromosomes 1, 8, and 12.
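The relative quantification equations above can be sketched as a short function. The Ct values in the usage note are invented for illustration and are not measurements from this study; the function name is hypothetical.

```python
def fold_difference(ct_b2m_tissue, ct_target_tissue,
                    ct_b2m_pancreas, ct_target_pancreas):
    """Fold difference in target expression relative to pancreas.

    dCt  = Ct(B2M) - Ct(target)          (normalization to B2M)
    ddCt = dCt(tissue) - dCt(pancreas)   (comparison to pancreas)
    fold = 2 ** ddCt
    """
    dct_tissue = ct_b2m_tissue - ct_target_tissue
    dct_pancreas = ct_b2m_pancreas - ct_target_pancreas
    return 2 ** (dct_tissue - dct_pancreas)
```

For example, with made-up Ct values of 20 (B2M) and 25 (target) in a tissue and 20 and 26 in pancreas, ddCt is 1 and the target is expressed at twice the pancreas level. Pancreas compared with itself gives 1.0 by construction.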


Supplementary Figure Legends

Figure S1. Quantile-quantile plot for T2D association -log10 p-values for FUSION stage 1

samples and p-values expected under the null distribution for FUSION GWA SNPs.

Figure S2. Plot of T2D association and LD in FUSION stage 1 sample for region surrounding

SLC30A8. The top panel contains RefSeq genes. The second panel shows the T2D association

-log10 p-values in FUSION stage 1 samples for SNPs genotyped in the GWA panel (•) or

imputed (o). The third panel shows T2D association -log10 p-values for each SNP in a logistic

regression model correcting for the reference SNP rs13266634 (•, red dot). A decrease in the

-log10 p-value from the second to the third panel indicates that the association signal of the tested

SNPs can be explained, at least in part, by the reference SNP. The reference SNP is a non-

synonymous coding SNP, and was chosen because of its potential of being the actual functional

variant responsible for the association signal; choice of another strongly associated SNP nearby

would have resulted in a similar picture. The fourth panel shows recombination rate in cM per

Mb for the HapMap CEU sample (15). The fifth and sixth panels show linkage disequilibrium r²

and D' based on FUSION stage 1 genotyped and imputed data.

Figure S3. Expression of CDKAL1 (first panel), CDKN2A (second panel), and CDKN2B (third

panel) in human tissues and cells. The level of expression of each gene was determined by

quantitative RT-PCR, and normalized to the beta-2-microglobulin (B2M) housekeeping gene.

The data are presented as fold difference relative to expression in pancreas, which is set at 1.0.


293T cells are human embryonic kidney, HeLa are human cervical carcinoma, and HepG2 are

human hepatocellular carcinoma.

Figure S1

Figure S2

[Plot not reproduced. x-axis: chromosome 8 position (kb), 118100 to 118350. Panels labeled SLC30A8, -log10(p), -log10(p adj), cM/Mb, r² in FUSION, and D' in FUSION; reference SNP rs13266634 marked.]

Figure S3

Table S1. Characteristics of stage 1 and stage 2 case and control samples

| Characteristic | Stage 1 cases, median (IQR) | Stage 1 controls, median (IQR) | Stage 2 cases, median (IQR) | Stage 2 controls, median (IQR) |
|---|---|---|---|---|
| N | 1161 | 1174 | 1215 | 1258 |
| Male | 653 | 574 | 724 | 768 |
| Female | 508 | 600 | 491 | 490 |
| Age of Diagnosis (years) | 53.0 (12.0) | --- | 56.0 (12.0) | --- |
| Study Age (years) | 63.4 (11.2) | 64.0 (11.7) | 60.0 (11.5) | 59.0 (10.6) |
| BMI (kg/m²) | 29.8 (6.1) | 26.8 (5.0) | 30.1 (6.7) | 26.4 (4.9) |
| Fasting Plasma Glucose (mmol/l) | 8.4 (3.9) | 5.4 (0.7) | 7.2ᵃ (2.1ᵃ) | 5.4ᵇ (0.6ᵇ) |

ᵃ n=204 and ᵇ n=583 values converted from whole blood to plasma glucose equivalent using prediction equation from the European Diabetes Epidemiology Group (22), of which ᵇ n=262 fasted < 8 hours

Table S2A. Detailed characteristics of stage 1 case and control samples

| Characteristic | FUSION: Cases, median (IQR) | FUSION: Controls, median (IQR) | FUSION: Controls from Finrisk 2002, median (IQR) | Finrisk 2002: Cases, median (IQR) | Finrisk 2002: Controls, median (IQR) |
|---|---|---|---|---|---|
| N | 789 | 523ᵃ | 276 | 372 | 375 |
| Male | 429 | 194 | 163 | 224 | 217 |
| Female | 360 | 329 | 113 | 148 | 158 |
| Age of Diagnosis (years) | 51.0 (11.0) | --- | --- | 59.0 (12.0) | --- |
| Study Age (years) | 64.2 (10.1) | 69.6 (7.7) | 62.0 (9.0) | 61.0 (12.0) | 61.0 (12.0) |
| BMI (kg/m²) | 29.3 (6.2) | 27.3 (5.5) | 26.5 (4.5) | 30.7 (6.0) | 26.6 (4.4) |
| Fasting Plasma Glucose (mmol/l) | 9.6 (4.7) | 5.1 (0.6) | 5.6 (0.5) | 7.3 (1.3) | 5.6 (0.5) |

ᵃ Comprised of 219 FUSION controls from Vantaa who were NGT at ages 65 and 70 years, and 304 NGT spouses of FUSION T2D subjects

Table S2B. Detailed characteristics of stage 2 case and control samples

| Characteristic | D2D: Cases | D2D: Controls | Health 2000: Cases | Health 2000: Controls | Action LADA: Cases | Action LADA: Controls | Finrisk 1987: Cases | Finrisk 1987: Controls | Savitaipale Diabetes Study: Cases | Savitaipale Diabetes Study: Controls |
|---|---|---|---|---|---|---|---|---|---|---|
| N | 327 | 314 | 127 | 124 | 373 | 402ᵃ | 266 | 300 | 122 | 118 |
| Male | 184 | 176 | 67 | 66 | 235 | 259 | 171 | 202 | 67 | 65 |
| Female | 143 | 138 | 60 | 58 | 138 | 143 | 95 | 98 | 55 | 53 |
| Age of Diagnosis (years), median (IQR) | 60.0 (13.0) | --- | 55.0 (13.0) | --- | 55.0 (10.0) | --- | 55.0 (13.0) | --- | 55.1 (11.7) | --- |
| Study Age (years), median (IQR) | 64.0 (11.4) | 64.3 (12.0) | 61.0 (15.0) | 59.0 (12.0) | 60.2 (10.8) | 58.0 (9.0) | 58.0 (11.0) | 57.0 (12.0) | 57.9 (13.4) | 57.0 (13.0) |
| BMI (kg/m²), median (IQR) | 29.9 (7.1) | 26.4 (4.9) | 30.3 (5.4) | 26.5 (5.6) | 30.3 (6.9) | 26.3 (4.7) | 30.5 (6.1) | 26.7 (4.8) | 28.3 (7.1) | 25.4 (4.5) |
| Fasting Plasma Glucose (mmol/l), median (IQR) | 7.2 (2.0) | 5.4 (0.5) | 7.3 (2.0) | 5.4 (0.5) | 7.3 (2.4) | 5.5ᵇ (0.6ᵇ) | 6.9ᶜ (3.0ᶜ) | 5.1ᶜᵈ (0.6ᶜᵈ) | 7.2ᶜ (0.9ᶜ) | 5.6ᶜ (0.4ᶜ) |

ᵃ 85 D2D, 100 Health 2000, 52 Finrisk 2002, 97 Finrisk 1987, and 68 Savitaipale Diabetes Study controls

ᵇ n=165 values converted from whole blood to plasma glucose equivalent using prediction equation from the European Diabetes Epidemiology Group (22), of which n=52 fasted < 8 hours

ᶜ all values converted from whole blood to plasma glucose equivalent using prediction equation from the European Diabetes Epidemiology Group (22)

ᵈ n=210 fasted < 8 hours

Table S3. FUSION stage 1 T2D association: genotyped (bold) and imputed (non-bold) SNPs with p-value < .0001. Sets of SNPs, where each SNP is within 100kb of the preceding SNP, are delimited by lines.

| SNP | Genes | Chr | Position (bp) | FUSION risk allele / non-risk allele | Control risk frequency | Case risk frequency | OR | 95% CI | p-value | Genotyped p-value for imputed SNP | Genotyped in Stage 2? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs527912 | CDA | 1 | 20,679,589 | G/A | .670 | .723 | 1.304 | 1.141-1.49 | 9.4 × 10⁻⁵ | | |
| rs3820321 | PINK1 | 1 | 20,708,133 | G/A | .602 | .663 | 1.291 | 1.142-1.459 | 4.0 × 10⁻⁵ | | |
| rs607254 | DDOST, KIF17, PINK1 | 1 | 20,726,186 | G/A | .601 | .663 | 1.294 | 1.145-1.463 | 3.4 × 10⁻⁵ | | |
| rs589709 | DDOST, KIF17, PINK1 | 1 | 20,729,293 | G/A | .601 | .663 | 1.297 | 1.147-1.465 | 2.9 × 10⁻⁵ | | |
| rs640742 | DDOST, KIF17, PINK1 | 1 | 20,729,860 | A/C | .601 | .663 | 1.297 | 1.147-1.465 | 2.9 × 10⁻⁵ | | Yes |
| rs623817 | DDOST, KIF17, PINK1 | 1 | 20,731,384 | G/A | .601 | .663 | 1.297 | 1.147-1.467 | 3.1 × 10⁻⁵ | | |
| rs674114 | DDOST, KIF17 | 1 | 20,734,978 | G/A | .615 | .668 | 1.321 | 1.151-1.516 | 6.8 × 10⁻⁵ | | |
| rs630484 | DDOST, KIF17 | 1 | 20,737,912 | G/T | .616 | .670 | 1.332 | 1.159-1.530 | 4.8 × 10⁻⁵ | | |
| rs12118760 | DDOST, KIF17 | 1 | 20,745,110 | T/C | .736 | .767 | 1.708 | 1.331-2.191 | 2.2 × 10⁻⁵ | | |
| rs1932397 | | 1 | 29,732,290 | T/C | .168 | .215 | 1.351 | 1.164-1.569 | 7.1 × 10⁻⁵ | | |
| rs6603926 | | 1 | 29,735,248 | A/G | .168 | .215 | 1.352 | 1.164-1.57 | 7.0 × 10⁻⁵ | | |
| rs9662524 | | 1 | 29,739,496 | G/C | .168 | .215 | 1.351 | 1.164-1.569 | 7.3 × 10⁻⁵ | | |
| rs915409 | | 1 | 29,740,363 | T/C | .168 | .215 | 1.351 | 1.164-1.569 | 7.3 × 10⁻⁵ | | |
| rs9286938 | | 1 | 29,746,194 | T/C | .168 | .214 | 1.345 | 1.159-1.562 | 9.1 × 10⁻⁵ | | |
| rs9659523 | | 1 | 29,746,693 | A/C | .169 | .215 | 1.344 | 1.157-1.56 | 1.0 × 10⁻⁴ | | |
| rs271306 | | 1 | 29,751,757 | G/C | .168 | .214 | 1.344 | 1.157-1.561 | 1.0 × 10⁻⁴ | | |
| rs17356414 | | 1 | 59,031,529 | C/T | .548 | .607 | 1.311 | 1.158-1.485 | 1.7 × 10⁻⁵ | 8.0 × 10⁻⁴ | Yes |
| rs6676059 | | 1 | 59,041,777 | G/A | .548 | .606 | 1.312 | 1.159-1.485 | 1.7 × 10⁻⁵ | | |
| rs12133457 | | 1 | 59,042,784 | G/A | .548 | .606 | 1.312 | 1.159-1.485 | 1.7 × 10⁻⁵ | | |
| rs17025978 | KCNA10 | 1 | 110,781,653 | G/A | .914 | .947 | 1.705 | 1.347-2.158 | 6.6 × 10⁻⁶ | | Yes |
| rs17025982 | KCNA10 | 1 | 110,782,336 | T/C | .910 | .943 | 1.699 | 1.342-2.151 | 7.8 × 10⁻⁶ | | |
| rs2790372 | | 1 | 110,799,166 | C/A | .937 | .962 | 1.750 | 1.320-2.319 | 7.5 × 10⁻⁵ | | |
| rs2799765 | | 1 | 110,800,193 | T/C | .937 | .962 | 1.748 | 1.317-2.319 | 8.5 × 10⁻⁵ | | |
| rs1626078 | | 1 | 110,801,281 | C/T | .937 | .962 | 1.748 | 1.316-2.322 | 8.9 × 10⁻⁵ | | |
| rs1622675 | | 1 | 110,801,684 | A/T | .937 | .962 | 1.758 | 1.321-2.338 | 8.3 × 10⁻⁵ | | |
| rs1627572 | | 1 | 110,801,712 | G/A | .938 | .962 | 1.756 | 1.319-2.338 | 8.9 × 10⁻⁵ | | |
| rs2501354 | SLAMF8, VSIG8 | 1 | 156,628,715 | G/A | .355 | .415 | 1.274 | 1.129-1.437 | 8.1 × 10⁻⁵ | | |
| rs2501350 | SLAMF8, VSIG8 | 1 | 156,630,077 | G/C | .379 | .437 | 1.288 | 1.136-1.459 | 7.0 × 10⁻⁵ | | |
| rs357973 | | 2 | 3,292,094 | G/A | .942 | .961 | 1.975 | 1.394-2.798 | 9.3 × 10⁻⁵ | | |
| rs357971 | | 2 | 3,292,963 | G/C | .942 | .961 | 1.977 | 1.395-2.802 | 9.1 × 10⁻⁵ | | |
| rs2338545 | PLB1 | 2 | 28,711,426 | G/A | .202 | .252 | 1.332 | 1.157-1.534 | 6.3 × 10⁻⁵ | | |
| rs2249434 | SCLY | 2 | 238,757,753 | C/G | .076 | .110 | 1.497 | 1.221-1.835 | 9.1 × 10⁻⁵ | | |
| rs1391136 | | 3 | 21,136,392 | C/T | .838 | .874 | 1.425 | 1.195-1.700 | 7.5 × 10⁻⁵ | | |
| rs11926889 | | 3 | 30,253,294 | G/A | .880 | .911 | 1.537 | 1.243-1.900 | 6.1 × 10⁻⁵ | | |
| rs1434006 | | 3 | 30,268,508 | C/T | .904 | .934 | 1.586 | 1.268-1.984 | 4.4 × 10⁻⁵ | | |
| rs13075234 | | 3 | 30,269,434 | C/T | .922 | .946 | 1.707 | 1.311-2.223 | 5.8 × 10⁻⁵ | | |
| rs10440137 | | 3 | 30,270,978 | G/T | .904 | .934 | 1.581 | 1.266-1.974 | 4.4 × 10⁻⁵ | | |
| rs9870410 | | 3 | 30,283,763 | C/T | .904 | .935 | 1.579 | 1.267-1.967 | 3.8 × 10⁻⁵ | | |
| rs13092602 | | 3 | 30,284,949 | G/A | .906 | .939 | 1.660 | 1.324-2.081 | 8.2 × 10⁻⁶ | | |
| rs1495586 | | 3 | 30,302,792 | G/A | .907 | .940 | 1.666 | 1.327-2.091 | 8.2 × 10⁻⁶ | | |
| rs17081352 | | 3 | 30,307,851 | C/A | .910 | .942 | 1.698 | 1.342-2.148 | 7.6 × 10⁻⁶ | 5.5 × 10⁻⁶ | Yes |
| rs9843153 | | 3 | 30,308,252 | G/T | .913 | .944 | 1.722 | 1.351-2.195 | 8.4 × 10⁻⁶ | | |
| rs11714343 | | 3 | 34,437,873 | T/C | .084 | .118 | 1.472 | 1.210-1.791 | 9.6 × 10⁻⁵ | | |

Table S3. FUSION stage 1 T2D association: genotyped (bold) and imputed (non-bold) SNPs with p-value < .0001 (continued)

| SNP | Genes | Chr | Position (bp) | FUSION risk allele / non-risk allele | Control risk frequency | Case risk frequency | OR | 95% CI | p-value | Genotyped p-value for imputed SNP | Genotyped in Stage 2? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs739984 | PTPRG | 3 | 61,975,357 | G/A | .729 | .777 | 1.320 | 1.150-1.515 | 7.2 × 10⁻⁵ | | |
| rs12490128 | TMEM108 | 3 | 134,391,491 | A/C | .118 | .162 | 1.465 | 1.234-1.739 | 1.1 × 10⁻⁵ | | |
| rs13072106 | TMEM108 | 3 | 134,425,451 | T/C | .118 | .155 | 1.414 | 1.188-1.682 | 8.7 × 10⁻⁵ | | Yes |
| rs10512891 | TMEM108 | 3 | 134,431,557 | A/T | .118 | .156 | 1.415 | 1.189-1.684 | 8.3 × 10⁻⁵ | | |
| rs7650741 | TMEM108 | 3 | 134,432,277 | T/C | .118 | .156 | 1.416 | 1.189-1.684 | 8.2 × 10⁻⁵ | | |
| rs7612595 | TMEM108 | 3 | 134,439,991 | T/C | .118 | .156 | 1.418 | 1.192-1.688 | 7.5 × 10⁻⁵ | | |
| rs16840161 | TMEM108 | 3 | 134,478,424 | A/G | .117 | .158 | 1.444 | 1.213-1.718 | 3.1 × 10⁻⁵ | | |
| rs17297332 | TMEM108 | 3 | 134,480,782 | G/C | .121 | .162 | 1.447 | 1.216-1.723 | 2.9 × 10⁻⁵ | | |
| rs7625110 | TMEM108 | 3 | 134,494,477 | T/G | .117 | .158 | 1.446 | 1.215-1.722 | 2.9 × 10⁻⁵ | | |
| rs10512896 | TMEM108 | 3 | 134,499,457 | G/C | .117 | .158 | 1.450 | 1.218-1.726 | 2.7 × 10⁻⁵ | | |
| rs1708373 | TMEM108 | 3 | 134,502,025 | G/A | .117 | .158 | 1.451 | 1.219-1.728 | 2.5 × 10⁻⁵ | | |
| rs1197316 | TMEM108 | 3 | 134,522,283 | G/A | .117 | .158 | 1.455 | 1.222-1.734 | 2.3 × 10⁻⁵ | | |
| rs1920021 | TMEM108 | 3 | 134,554,123 | T/C | .118 | .158 | 1.450 | 1.216-1.729 | 3.1 × 10⁻⁵ | | |
| rs823968 | | 3 | 136,542,755 | C/T | .382 | .436 | 1.274 | 1.131-1.436 | 6.7 × 10⁻⁵ | | |
| rs4687296 | MAP3K13 | 3 | 186,595,002 | T/C | .225 | .276 | 1.325 | 1.158-1.516 | 3.9 × 10⁻⁵ | | |
| rs4687299 | MAP3K13 | 3 | 186,595,361 | A/G | .225 | .276 | 1.325 | 1.158-1.515 | 4.0 × 10⁻⁵ | | Yes |
| rs886374 | SORCS2 | 4 | 7,856,440 | T/C | .211 | .270 | 1.385 | 1.209-1.587 | 2.4 × 10⁻⁶ | | Yes |
| rs6815292 | ATP8A1 | 4 | 42,251,192 | A/G | .244 | .291 | 1.308 | 1.144-1.496 | 7.9 × 10⁻⁵ | | |
| rs7665824 | ATP8A1 | 4 | 42,252,481 | T/G | .244 | .291 | 1.309 | 1.145-1.496 | 7.8 × 10⁻⁵ | | |
| rs11726581 | ATP8A1 | 4 | 42,257,935 | C/T | .244 | .291 | 1.309 | 1.145-1.497 | 7.7 × 10⁻⁵ | | |
| rs11722556 | ATP8A1 | 4 | 42,258,828 | T/C | .244 | .291 | 1.309 | 1.145-1.497 | 7.5 × 10⁻⁵ | | |
| rs17630357 | ATP8A1 | 4 | 42,266,042 | A/T | .774 | .821 | 1.346 | 1.160-1.562 | 8.2 × 10⁻⁵ | | |
| rs4317238 | ATP8A1 | 4 | 42,267,105 | A/G | .774 | .821 | 1.346 | 1.160-1.562 | 8.1 × 10⁻⁵ | | |
| rs16854359 | ATP8A1 | 4 | 42,269,100 | C/G | .241 | .290 | 1.313 | 1.149-1.501 | 5.7 × 10⁻⁵ | | |
| rs9994372 | ATP8A1 | 4 | 42,269,138 | T/C | .251 | .301 | 1.335 | 1.166-1.527 | 2.5 × 10⁻⁵ | | |
| rs10034439 | ATP8A1 | 4 | 42,287,090 | C/T | .776 | .826 | 1.374 | 1.182-1.598 | 3.1 × 10⁻⁵ | | |
| rs13139219 | ATP8A1 | 4 | 42,294,231 | C/A | .779 | .827 | 1.346 | 1.160-1.561 | 7.8 × 10⁻⁵ | | Yes |
| rs6812080 | ATP8A1 | 4 | 42,319,554 | G/A | .779 | .828 | 1.349 | 1.163-1.565 | 7.0 × 10⁻⁵ | | |
| rs13116032 | ATP8A1 | 4 | 42,320,518 | G/T | .779 | .828 | 1.349 | 1.163-1.565 | 7.0 × 10⁻⁵ | | |
| rs5022521 | ELOVL6 | 4 | 111,486,191 | T/C | .858 | .884 | 1.785 | 1.349-2.361 | 4.1 × 10⁻⁵ | | |
| rs1030231 | | 5 | 66,353,021 | G/A | .198 | .245 | 1.330 | 1.152-1.536 | 9.3 × 10⁻⁵ | | |
| rs10476844 | | 5 | 142,096,902 | T/C | .014 | .023 | 4.666 | 2.212-9.841 | 3.5 × 10⁻⁵ | | |
| rs961730 | ARHGAP26 | 5 | 142,114,126 | C/T | .014 | .024 | 4.696 | 2.254-9.784 | 2.4 × 10⁻⁵ | | |
| rs1347133 | ARHGAP26 | 5 | 142,114,290 | C/T | .014 | .024 | 4.745 | 2.275-9.899 | 2.1 × 10⁻⁵ | | |
| rs968076 | ARHGAP26 | 5 | 142,116,491 | G/A | .014 | .024 | 4.787 | 2.293-9.993 | 2.0 × 10⁻⁵ | | |
| rs7714907 | ARHGAP26 | 5 | 142,125,570 | G/A | .014 | .023 | 5.319 | 2.473-11.441 | 1.2 × 10⁻⁵ | | |
| rs7732207 | ARHGAP26 | 5 | 142,125,613 | A/G | .014 | .023 | 5.317 | 2.472-11.439 | 1.2 × 10⁻⁵ | | |
| rs764387 | ARHGAP26 | 5 | 142,125,869 | T/C | .014 | .023 | 5.326 | 2.472-11.474 | 1.2 × 10⁻⁵ | | |
| rs7737018 | ARHGAP26 | 5 | 142,126,283 | C/G | .014 | .023 | 5.317 | 2.462-11.483 | 1.3 × 10⁻⁵ | | |
| rs6898675 | ARHGAP26 | 5 | 142,131,843 | T/C | .014 | .023 | 5.320 | 2.456-11.526 | 1.4 × 10⁻⁵ | | |
| rs6894433 | ARHGAP26 | 5 | 142,133,535 | C/T | .014 | .023 | 5.315 | 2.452-11.523 | 1.4 × 10⁻⁵ | | |
| rs707177 | ARHGAP26 | 5 | 142,232,076 | A/G | .372 | .424 | 1.308 | 1.146-1.493 | 6.4 × 10⁻⁵ | | |
| rs447923 | ARHGAP26 | 5 | 142,232,441 | T/C | .325 | .373 | 1.321 | 1.148-1.519 | 9.2 × 10⁻⁵ | | |
| rs26707 | ARHGAP26 | 5 | 142,233,857 | G/C | .250 | .303 | 1.325 | 1.160-1.513 | 3.0 × 10⁻⁵ | | |

Table S3. FUSION stage 1 T2D association: genotyped (bold) and imputed (non-bold) SNPs with p-value < .0001 (continued)

| SNP | Genes | Chr | Position (bp) | FUSION risk allele / non-risk allele | Control risk frequency | Case risk frequency | OR | 95% CI | p-value | Genotyped p-value for imputed SNP | Genotyped in Stage 2? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs26706 | ARHGAP26 | 5 | 142,237,044 | C/G | .253 | .306 | 1.324 | 1.159-1.513 | 3.2 × 10⁻⁵ | | |
| rs27779 | ARHGAP26 | 5 | 142,239,267 | A/C | .250 | .304 | 1.326 | 1.162-1.513 | 2.5 × 10⁻⁵ | | Yes |
| rs27546 | ARHGAP26 | 5 | 142,245,929 | T/A | .250 | .302 | 1.321 | 1.157-1.508 | 3.5 × 10⁻⁵ | | |
| rs11970389 | TUBB2B, LOC389362 | 6 | 3,195,655 | T/C | .041 | .063 | 1.845 | 1.351-2.518 | 9.2 × 10⁻⁵ | | |
| rs4713992 | | 6 | 36,720,183 | A/G | .730 | .764 | 1.525 | 1.240-1.875 | 5.7 × 10⁻⁵ | | |
| rs7750445 | ZFAND3 | 6 | 37,872,955 | G/C | .114 | .158 | 1.483 | 1.244-1.769 | 9.4 × 10⁻⁶ | 4.1 × 10⁻⁵ | Yes |
| rs17235125 | | 6 | 79,437,555 | A/G | .871 | .906 | 1.459 | 1.207-1.762 | 8.0 × 10⁻⁵ | | |
| rs17235167 | | 6 | 79,437,614 | C/G | .871 | .906 | 1.459 | 1.208-1.763 | 7.8 × 10⁻⁵ | | |
| rs17235209 | | 6 | 79,437,636 | C/T | .871 | .906 | 1.461 | 1.209-1.765 | 7.6 × 10⁻⁵ | | |
| rs17826801 | | 6 | 79,437,741 | A/G | .871 | .906 | 1.460 | 1.208-1.764 | 7.8 × 10⁻⁵ | | |
| rs2021966 | ENPP1 | 6 | 132,192,132 | A/G | .585 | .634 | 1.320 | 1.150-1.516 | 7.2 × 10⁻⁵ | 2.6 × 10⁻⁴ | Yes |
| rs2813539 | SYNE1 | 6 | 152,613,828 | G/A | .382 | .435 | 1.312 | 1.150-1.496 | 4.8 × 10⁻⁵ | | |
| rs1408460 | SYNE1 | 6 | 152,614,232 | C/G | .460 | .518 | 1.267 | 1.126-1.426 | 8.3 × 10⁻⁵ | | |
| rs719764 | SYNE1 | 6 | 152,614,487 | C/G | .483 | .538 | 1.293 | 1.141-1.466 | 5.4 × 10⁻⁵ | | |
| rs2673776 | SYNE1 | 6 | 152,614,926 | G/T | .458 | .516 | 1.265 | 1.125-1.422 | 8.0 × 10⁻⁵ | | |
| rs2635441 | SYNE1 | 6 | 152,615,257 | A/G | .460 | .517 | 1.264 | 1.123-1.422 | 9.4 × 10⁻⁵ | | |
| rs13212052 | | 6 | 166,264,601 | T/C | .979 | .992 | 2.979 | 1.668-5.323 | 8.2 × 10⁻⁵ | | |
| rs2791300 | | 7 | 18,102,317 | C/G | .704 | .752 | 1.319 | 1.149-1.514 | 7.7 × 10⁻⁵ | | |
| rs4721708 | | 7 | 18,143,542 | C/T | .702 | .760 | 1.373 | 1.199-1.572 | 3.8 × 10⁻⁶ | | |
| rs615545 | | 7 | 18,165,111 | C/T | .694 | .751 | 1.361 | 1.190-1.556 | 5.9 × 10⁻⁶ | | Yes |
| rs2470984 | SLC13A1 | 7 | 122,368,680 | A/C | .297 | .348 | 1.279 | 1.130-1.448 | 9.0 × 10⁻⁵ | | Yes |
| rs6466855 | SLC13A1 | 7 | 122,371,141 | A/G | .294 | .346 | 1.289 | 1.137-1.462 | 7.0 × 10⁻⁵ | | |
| rs6964272 | SLC13A1 | 7 | 122,373,978 | T/C | .265 | .317 | 1.333 | 1.168-1.52 | 1.7 × 10⁻⁵ | | |
| rs13444183 | SLC13A1 | 7 | 122,377,232 | G/T | .265 | .317 | 1.333 | 1.168-1.521 | 1.8 × 10⁻⁵ | | |
| rs6963735 | SLC13A1 | 7 | 122,394,634 | C/T | .256 | .306 | 1.350 | 1.176-1.549 | 1.8 × 10⁻⁵ | | |
| rs10280430 | SLC13A1 | 7 | 122,399,306 | C/T | .255 | .305 | 1.350 | 1.176-1.549 | 1.9 × 10⁻⁵ | | |
| rs1880178 | SLC13A1 | 7 | 122,403,062 | T/C | .255 | .305 | 1.350 | 1.176-1.55 | 1.9 × 10⁻⁵ | | |
| rs10954654 | | 7 | 138,816,342 | C/T | .725 | .776 | 1.337 | 1.166-1.533 | 2.8 × 10⁻⁵ | | Yes |
| rs10277603 | | 7 | 138,816,687 | C/T | .592 | .645 | 1.354 | 1.179-1.554 | 1.5 × 10⁻⁵ | | |
| rs10261979 | | 7 | 138,816,832 | G/C | .601 | .653 | 1.367 | 1.187-1.574 | 1.3 × 10⁻⁵ | | |
| rs10262338 | | 7 | 138,816,913 | A/G | .592 | .645 | 1.355 | 1.180-1.555 | 1.5 × 10⁻⁵ | | |

rs96

9240

1 7

138,

817,

247

C/T

.584

.637

1.36

4 1.

187-

1.56

7 1.

1 x

10-5

rs96

9166

2 7

138,

817,

453

A/G

.592

.645

1.35

3 1.

179-

1.55

4 1.

6 x

10-5

rs96

9041

8 7

138,

817,

495

G/A

.592

.645

1.35

3 1.

179-

1.55

3 1.

6 x

10-5

rs12

7074

49

713

8,81

7,98

3A

/T.5

92.6

451.

353

1.17

9-1.

553

1.6

x 10

-5

rs10

2712

87

713

8,81

9,51

7T/

C.5

92.6

451.

353

1.17

9-1.

554

1.6

x 10

-5

rs38

732

MRP

S33

714

0,15

8,34

6T/

A.0

69.0

961.

680

1.29

6-2.

178

6.9

x 10

-5

rs92

74

MRP

S33

7

140,

159,

215

A/G

.048

.076

1.63

9 1.

279-

2.10

1 7.

5 x

10-5

rs54

4081

7

140,

209,

733

G/A

.048

.076

1.64

3 1.

282-

2.10

6 6.

7 x

10-5

rs48

8795

7

140,

211,

070

T/G

.048

.076

1.64

3 1.

282-

2.10

5 6.

8 x

10-5

rs51

2509

7

140,

211,

331

T/C

.048

.076

1.64

3 1.

282-

2.10

5 6.

7 x

10-5

rs54

8245

7

140,

212,

951

T/C

.047

.075

1.63

5 1.

274-

2.09

9 8.

9 x

10-5

rs47

1817

7

140,

214,

431

A/C

.048

.076

1.64

3 1.

282-

2.10

5 6.

8 x

10-5

rs80

1155

714

0,22

1,13

4A

/G.0

48.0

76

1.64

2 1.

282-

2.10

5 6.

8 x

10-5

rs528957 | LOC642421 | 7 | 140,222,643 | T/C | .048 | .076 | 1.634 | 1.276-2.094 | 7.8 x 10^-5 | |
rs557962 | | 7 | 140,232,924 | T/C | .047 | .076 | 1.650 | 1.287-2.115 | 5.9 x 10^-5 | | Yes
rs7842241 | C8orf68 | 8 | 1,056,317 | G/A | .634 | .688 | 1.285 | 1.134-1.456 | 8.1 x 10^-5 | |
rs979728 | DLC1 | 8 | 13,435,309 | T/C | .371 | .405 | 1.464 | 1.209-1.772 | 8.6 x 10^-5 | |
rs1852027 | CNBD1 | 8 | 88,076,230 | G/A | .552 | .611 | 1.269 | 1.127-1.428 | 7.6 x 10^-5 | |
rs17707746 | PTDSS1 | 8 | 97,384,821 | C/A | .041 | .065 | 1.750 | 1.317-2.326 | 8.7 x 10^-5 | |
rs883655 | PTDSS1 | 8 | 97,386,357 | C/T | .041 | .065 | 1.751 | 1.317-2.328 | 8.9 x 10^-5 | |
rs13439240 | PTDSS1 | 8 | 97,387,836 | T/C | .041 | .065 | 1.752 | 1.317-2.330 | 8.9 x 10^-5 | |
rs7830293 | GPR20 | 8 | 142,442,691 | C/T | .066 | .099 | 1.597 | 1.276-1.999 | 3.6 x 10^-5 | |
rs6578167 | GPR20 | 8 | 142,450,474 | C/A | .065 | .098 | 1.578 | 1.264-1.970 | 4.7 x 10^-5 | |
rs7839244 | GPR20 | 8 | 142,457,437 | A/G | .066 | .098 | 1.553 | 1.248-1.932 | 6.8 x 10^-5 | | Yes
rs4961268 | GPR20 | 8 | 142,464,393 | G/A | .064 | .097 | 1.586 | 1.271-1.980 | 3.7 x 10^-5 | |
rs4961755 | BNC2 | 9 | 16,759,812 | C/G | .121 | .158 | 1.467 | 1.213-1.774 | 7.0 x 10^-5 | |
rs12683158 | NFIL3 | 9 | 91,266,820 | C/T | .927 | .954 | 1.736 | 1.333-2.261 | 3.2 x 10^-5 | |
rs13297268 | NFIL3 | 9 | 91,267,696 | G/A | .927 | .954 | 1.745 | 1.338-2.277 | 3.0 x 10^-5 | 9.0 x 10^-5 | Yes
rs13289738 | NFIL3 | 9 | 91,271,701 | G/T | .926 | .951 | 1.793 | 1.354-2.372 | 3.3 x 10^-5 | |
rs7856348 | CYLC2 | 9 | 102,835,550 | C/A | .541 | .591 | 1.308 | 1.144-1.495 | 7.9 x 10^-5 | |
rs1330146 | | 9 | 107,631,794 | G/A | .545 | .603 | 1.289 | 1.142-1.455 | 3.7 x 10^-5 | |
rs10816576 | | 9 | 107,633,222 | G/A | .545 | .603 | 1.289 | 1.142-1.455 | 3.7 x 10^-5 | |
rs10121193 | | 9 | 107,660,601 | A/G | .382 | .426 | 1.348 | 1.161-1.565 | 8.4 x 10^-5 | |
rs4543877 | | 10 | 65,172,027 | C/G | .439 | .497 | 1.330 | 1.173-1.507 | 7.7 x 10^-6 | |
rs3864799 | | 10 | 65,172,388 | G/C | .439 | .497 | 1.330 | 1.173-1.508 | 7.5 x 10^-6 | |
rs3912165 | | 10 | 65,187,697 | A/G | .427 | .485 | 1.349 | 1.186-1.534 | 4.5 x 10^-6 | |
rs10740140 | | 10 | 65,189,760 | A/G | .428 | .485 | 1.290 | 1.145-1.452 | 2.5 x 10^-5 | |
rs4746396 | | 10 | 65,194,129 | C/G | .436 | .494 | 1.274 | 1.136-1.429 | 3.1 x 10^-5 | |
rs16918864 | | 10 | 65,228,767 | G/C | .430 | .487 | 1.275 | 1.136-1.431 | 3.4 x 10^-5 | |
rs3104056 | | 10 | 71,180,045 | G/A | .974 | .986 | 3.162 | 1.736-5.758 | 6.3 x 10^-5 | |
rs17747324 | TCF7L2 | 10 | 114,742,493 | C/T | .141 | .181 | 1.445 | 1.214-1.719 | 3.0 x 10^-5 | |
rs7903146 | TCF7L2 | 10 | 114,748,339 | T/C | .179 | .229 | 1.388 | 1.197-1.610 | 1.2 x 10^-5 | | Yes
rs12243326 | TCF7L2 | 10 | 114,778,805 | C/T | .163 | .213 | 1.429 | 1.224-1.667 | 5.0 x 10^-6 | |
rs12255372 | TCF7L2 | 10 | 114,798,892 | T/G | .156 | .203 | 1.400 | 1.201-1.632 | 1.5 x 10^-5 | | Yes
rs12288214 | | 11 | 41,772,225 | G/A | .915 | .946 | 1.681 | 1.316-2.147 | 2.5 x 10^-5 | |
rs12284861 | | 11 | 41,787,876 | A/G | .915 | .946 | 1.685 | 1.320-2.150 | 2.1 x 10^-5 | |
rs11036577 | | 11 | 41,792,460 | C/T | .914 | .946 | 1.684 | 1.320-2.148 | 2.1 x 10^-5 | |
rs12797436 | | 11 | 41,798,917 | A/C | .913 | .944 | 1.624 | 1.279-2.062 | 5.4 x 10^-5 | |
rs12274732 | | 11 | 41,805,501 | C/T | .914 | .946 | 1.682 | 1.319-2.145 | 2.1 x 10^-5 | |
rs12275923 | | 11 | 41,818,526 | A/C | .914 | .946 | 1.685 | 1.321-2.150 | 2.0 x 10^-5 | |
rs12294552 | | 11 | 41,821,081 | G/C | .913 | .944 | 1.629 | 1.282-2.069 | 5.2 x 10^-5 | |
rs11036600 | | 11 | 41,823,651 | A/G | .914 | .946 | 1.685 | 1.321-2.150 | 2.0 x 10^-5 | |
rs11600495 | | 11 | 41,828,609 | C/A | .914 | .944 | 1.622 | 1.273-2.065 | 7.3 x 10^-5 | |
rs10160442 | | 11 | 41,833,678 | T/C | .914 | .946 | 1.683 | 1.318-2.148 | 2.2 x 10^-5 | |
rs3763827 | | 11 | 41,834,454 | G/C | .913 | .943 | 1.625 | 1.278-2.066 | 5.9 x 10^-5 | |
rs6485288 | | 11 | 41,837,914 | A/G | .906 | .939 | 1.616 | 1.285-2.032 | 3.2 x 10^-5 | |
rs12280294 | | 11 | 41,838,323 | G/T | .914 | .945 | 1.683 | 1.318-2.150 | 2.3 x 10^-5 | |

rs12281155 | | 11 | 41,843,640 | C/G | .914 | .945 | 1.684 | 1.318-2.151 | 2.3 x 10^-5 | |
rs12786634 | | 11 | 41,845,196 | C/T | .914 | .945 | 1.683 | 1.318-2.150 | 2.3 x 10^-5 | |
rs12277557 | | 11 | 41,849,152 | A/T | .912 | .943 | 1.686 | 1.320-2.155 | 2.2 x 10^-5 | |
rs12793795 | | 11 | 41,854,702 | G/A | .906 | .936 | 1.588 | 1.258-2.005 | 8.4 x 10^-5 | |
rs12271525 | | 11 | 41,858,437 | G/A | .891 | .925 | 1.512 | 1.228-1.860 | 8.1 x 10^-5 | |
rs7928200 | | 11 | 41,859,109 | A/G | .891 | .925 | 1.512 | 1.229-1.861 | 8.0 x 10^-5 | |
rs12273344 | | 11 | 41,859,353 | G/T | .890 | .925 | 1.516 | 1.233-1.863 | 6.5 x 10^-5 | |
rs12788548 | | 11 | 41,862,957 | C/T | .891 | .925 | 1.513 | 1.229-1.862 | 7.9 x 10^-5 | |
rs12288738 | | 11 | 41,868,875 | T/C | .890 | .924 | 1.511 | 1.229-1.858 | 7.5 x 10^-5 | |
rs1588439 | | 11 | 41,871,182 | G/A | .890 | .924 | 1.511 | 1.229-1.858 | 7.5 x 10^-5 | |
rs16936067 | | 11 | 41,871,820 | G/T | .906 | .936 | 1.580 | 1.252-1.993 | 9.5 x 10^-5 | |
rs9300039 | | 11 | 41,871,942 | C/A | .890 | .925 | 1.520 | 1.236-1.869 | 6.0 x 10^-5 | | Yes
rs11036622 | | 11 | 41,872,742 | C/T | .890 | .924 | 1.516 | 1.232-1.864 | 6.9 x 10^-5 | |
rs11036624 | | 11 | 41,878,246 | T/C | .891 | .925 | 1.525 | 1.236-1.881 | 6.8 x 10^-5 | |
rs12797038 | | 11 | 41,880,453 | C/T | .907 | .937 | 1.598 | 1.260-2.026 | 9.0 x 10^-5 | |
rs12804210 | | 11 | 41,880,999 | T/C | .891 | .925 | 1.549 | 1.251-1.919 | 5.1 x 10^-5 | |
rs11036627 | | 11 | 41,881,290 | C/A | .904 | .937 | 1.662 | 1.314-2.103 | 1.8 x 10^-5 | 1.9 x 10^-5 | Yes
rs11036628 | | 11 | 41,881,352 | G/A | .904 | .937 | 1.662 | 1.313-2.103 | 1.8 x 10^-5 | |
rs7114241 | | 11 | 41,882,103 | T/C | .891 | .925 | 1.552 | 1.251-1.924 | 5.2 x 10^-5 | |
rs7128743 | | 11 | 41,882,275 | C/A | .891 | .925 | 1.552 | 1.252-1.925 | 5.2 x 10^-5 | |
rs12288361 | | 11 | 41,883,303 | C/T | .891 | .925 | 1.553 | 1.252-1.927 | 5.1 x 10^-5 | |
rs12802634 | | 11 | 41,886,138 | T/C | .891 | .925 | 1.554 | 1.252-1.928 | 5.2 x 10^-5 | |
rs12802862 | | 11 | 41,886,267 | T/C | .891 | .925 | 1.554 | 1.252-1.928 | 5.2 x 10^-5 | |
rs11608189 | | 11 | 41,887,387 | G/T | .907 | .937 | 1.609 | 1.267-2.045 | 7.9 x 10^-5 | |
rs11602004 | | 11 | 41,900,843 | G/T | .907 | .938 | 1.616 | 1.271-2.053 | 7.0 x 10^-5 | |
rs11602127 | | 11 | 41,901,557 | G/A | .907 | .938 | 1.628 | 1.280-2.070 | 5.6 x 10^-5 | |
rs10501281 | | 11 | 41,922,935 | C/T | .915 | .947 | 1.617 | 1.276-2.048 | 5.3 x 10^-5 | |
rs11823992 | | 11 | 41,926,856 | A/T | .918 | .949 | 1.651 | 1.294-2.105 | 4.0 x 10^-5 | |
rs7101809 | | 11 | 41,933,715 | T/C | .918 | .949 | 1.653 | 1.295-2.109 | 4.1 x 10^-5 | |
rs12287052 | | 11 | 41,935,144 | A/G | .918 | .949 | 1.651 | 1.289-2.114 | 5.6 x 10^-5 | |
rs11036642 | | 11 | 41,940,997 | T/A | .921 | .951 | 1.699 | 1.318-2.191 | 3.3 x 10^-5 | |
rs17553408 | | 11 | 41,951,928 | T/G | .918 | .949 | 1.650 | 1.288-2.115 | 5.8 x 10^-5 | |
rs12293408 | | 11 | 41,956,332 | C/T | .921 | .951 | 1.695 | 1.315-2.186 | 3.5 x 10^-5 | |
rs16936200 | | 11 | 41,963,315 | A/C | .906 | .939 | 1.635 | 1.294-2.067 | 3.0 x 10^-5 | |
rs11036649 | | 11 | 41,965,524 | A/G | .906 | .939 | 1.634 | 1.293-2.066 | 3.1 x 10^-5 | |
rs12576408 | | 11 | 41,971,203 | G/T | .906 | .939 | 1.633 | 1.292-2.064 | 3.2 x 10^-5 | |
rs11036652 | | 11 | 41,971,269 | T/C | .907 | .939 | 1.629 | 1.288-2.058 | 3.5 x 10^-5 | |
rs7107246 | | 11 | 41,972,428 | C/A | .883 | .915 | 1.630 | 1.287-2.064 | 4.0 x 10^-5 | |
rs11604966 | | 11 | 41,972,736 | T/C | .907 | .940 | 1.623 | 1.285-2.051 | 3.8 x 10^-5 | |
rs10837766 | | 11 | 41,984,377 | T/C | .840 | .882 | 1.472 | 1.232-1.759 | 1.8 x 10^-5 | 8.6 x 10^-5 | Yes
rs17554005 | | 11 | 41,989,148 | A/C | .916 | .947 | 1.686 | 1.312-2.166 | 3.4 x 10^-5 | |
rs17554054 | | 11 | 41,990,218 | T/C | .916 | .947 | 1.682 | 1.310-2.161 | 3.6 x 10^-5 | |
rs17554081 | | 11 | 41,990,280 | A/G | .916 | .946 | 1.677 | 1.306-2.154 | 3.9 x 10^-5 | |
rs2862456 | | 11 | 41,990,769 | C/T | .916 | .946 | 1.668 | 1.300-2.140 | 4.5 x 10^-5 | |
rs17462952 | | 11 | 41,991,795 | A/G | .916 | .946 | 1.666 | 1.299-2.137 | 4.6 x 10^-5 | |

rs17462994 | | 11 | 41,991,889 | T/C | .916 | .946 | 1.666 | 1.299-2.137 | 4.6 x 10^-5 | |
rs12792932 | | 11 | 127,226,772 | G/A | .967 | .984 | 2.303 | 1.515-3.500 | 5.2 x 10^-5 | |
rs12806859 | | 11 | 127,234,379 | T/G | .967 | .984 | 2.299 | 1.514-3.492 | 5.2 x 10^-5 | |
rs12799032 | | 11 | 127,328,409 | G/A | .963 | .980 | 2.197 | 1.469-3.287 | 8.3 x 10^-5 | |
rs12792749 | | 11 | 127,336,192 | G/A | .963 | .980 | 2.191 | 1.465-3.275 | 8.6 x 10^-5 | |
rs12797631 | | 11 | 127,341,608 | T/G | .963 | .980 | 2.191 | 1.465-3.278 | 8.7 x 10^-5 | |
rs12796900 | | 11 | 127,341,924 | C/A | .963 | .980 | 2.191 | 1.465-3.276 | 8.8 x 10^-5 | |
rs12793901 | | 11 | 127,345,185 | G/A | .963 | .980 | 2.198 | 1.468-3.290 | 8.6 x 10^-5 | |
rs11616188 | LTBR, SCNN1A | 12 | 6,373,003 | A/G | .474 | .522 | 1.400 | 1.201-1.633 | 1.6 x 10^-5 | 4.8 x 10^-5 | Yes
rs7313533 | | 12 | 6,386,116 | A/G | .702 | .742 | 1.394 | 1.179-1.649 | 9.8 x 10^-5 | |
rs12581386 | CORO1C | 12 | 107,585,465 | C/A | .962 | .977 | 2.546 | 1.571-4.126 | 7.6 x 10^-5 | |
rs3825253 | CORO1C | 12 | 107,611,747 | A/G | .973 | .989 | 2.575 | 1.604-4.134 | 3.6 x 10^-5 | | Yes
rs7957463 | FLJ20674, WSB2 | 12 | 116,981,026 | T/C | .577 | .633 | 1.274 | 1.134-1.432 | 4.2 x 10^-5 | |
rs7958110 | FLJ20674, WSB2 | 12 | 116,981,479 | T/C | .577 | .633 | 1.273 | 1.133-1.430 | 4.4 x 10^-5 | |
rs4767658 | FLJ20674, WSB2 | 12 | 116,982,161 | T/C | .577 | .633 | 1.274 | 1.134-1.430 | 4.1 x 10^-5 | | Yes
rs7488309 | FLJ20674, WSB2 | 12 | 116,982,890 | G/A | .577 | .633 | 1.273 | 1.133-1.430 | 4.3 x 10^-5 | |
rs2711747 | CCDC60 | 12 | 118,360,953 | T/G | .014 | .025 | 3.401 | 1.842-6.280 | 4.9 x 10^-5 | |
rs1918416 | | 12 | 118,463,133 | C/T | .808 | .853 | 1.383 | 1.181-1.618 | 4.9 x 10^-5 | |
rs804628 | | 12 | 118,468,458 | G/C | .816 | .856 | 1.432 | 1.204-1.702 | 4.4 x 10^-5 | |
rs2669161 | | 12 | 120,663,139 | C/G | .846 | .884 | 1.457 | 1.210-1.755 | 6.3 x 10^-5 | |
rs2707069 | | 12 | 120,666,804 | C/T | .846 | .884 | 1.462 | 1.212-1.764 | 6.4 x 10^-5 | |
rs1287527 | | 13 | 80,731,274 | T/C | .085 | .120 | 1.493 | 1.226-1.819 | 6.1 x 10^-5 | |
rs1287526 | | 13 | 80,734,028 | G/A | .088 | .123 | 1.480 | 1.219-1.796 | 6.4 x 10^-5 | |
rs982864 | | 13 | 80,735,627 | C/T | .075 | .109 | 1.512 | 1.229-1.859 | 7.7 x 10^-5 | |
rs2801597 | | 13 | 80,736,045 | G/A | .075 | .109 | 1.512 | 1.229-1.859 | 7.8 x 10^-5 | |
rs1287533 | | 13 | 80,740,650 | A/T | .083 | .117 | 1.490 | 1.220-1.820 | 8.2 x 10^-5 | |
rs9545851 | | 13 | 81,234,888 | T/C | .525 | .583 | 1.279 | 1.135-1.441 | 5.1 x 10^-5 | |
rs9545852 | | 13 | 81,237,495 | C/T | .525 | .583 | 1.278 | 1.134-1.440 | 5.2 x 10^-5 | |
rs9531246 | | 13 | 81,239,573 | C/A | .525 | .583 | 1.278 | 1.134-1.439 | 5.3 x 10^-5 | |
rs9545853 | | 13 | 81,242,579 | T/C | .526 | .583 | 1.277 | 1.134-1.438 | 5.4 x 10^-5 | |
rs11149214 | | 13 | 81,283,609 | C/A | .526 | .583 | 1.276 | 1.133-1.438 | 5.5 x 10^-5 | |
rs9545870 | | 13 | 81,286,274 | A/G | .526 | .583 | 1.276 | 1.133-1.438 | 5.5 x 10^-5 | |
rs3891591 | | 13 | 81,291,969 | C/T | .517 | .573 | 1.276 | 1.131-1.440 | 6.9 x 10^-5 | |
rs9545903 | | 13 | 81,344,914 | T/C | .459 | .514 | 1.270 | 1.128-1.430 | 7.2 x 10^-5 | |
rs10135197 | | 14 | 38,123,411 | T/C | .598 | .654 | 1.288 | 1.138-1.458 | 6.1 x 10^-5 | |
rs8014198 | | 14 | 38,132,529 | G/A | .616 | .670 | 1.291 | 1.137-1.464 | 7.0 x 10^-5 | |
rs9788490 | | 14 | 38,132,689 | C/G | .603 | .659 | 1.287 | 1.138-1.455 | 5.5 x 10^-5 | |
rs11849174 | | 14 | 38,147,149 | G/A | .603 | .660 | 1.287 | 1.138-1.455 | 5.4 x 10^-5 | |
rs10145493 | | 14 | 38,151,139 | G/A | .603 | .659 | 1.287 | 1.138-1.455 | 5.6 x 10^-5 | |
rs12435438 | | 14 | 38,154,195 | T/C | .553 | .612 | 1.318 | 1.161-1.495 | 1.7 x 10^-5 | |
rs1349241 | | 14 | 38,155,189 | T/C | .553 | .612 | 1.318 | 1.161-1.495 | 1.8 x 10^-5 | |
rs10141957 | | 14 | 38,157,020 | G/A | .549 | .610 | 1.323 | 1.167-1.500 | 1.1 x 10^-5 | |
rs2122331 | | 14 | 38,163,358 | G/C | .514 | .575 | 1.275 | 1.133-1.435 | 5.2 x 10^-5 | |
rs8010489 | | 14 | 38,163,618 | G/A | .523 | .584 | 1.281 | 1.137-1.444 | 4.5 x 10^-5 | |

rs1449720 | | 14 | 38,165,318 | A/G | .512 | .573 | 1.269 | 1.128-1.428 | 6.8 x 10^-5 | |
rs12164874 | | 14 | 38,172,603 | C/T | .515 | .577 | 1.278 | 1.136-1.439 | 4.5 x 10^-5 | |
rs10138342 | | 14 | 38,186,108 | A/C | .526 | .587 | 1.284 | 1.139-1.448 | 4.0 x 10^-5 | |
rs7153699 | | 14 | 38,188,807 | C/T | .518 | .579 | 1.279 | 1.136-1.440 | 4.4 x 10^-5 | |
rs6571865 | | 14 | 38,191,421 | T/C | .518 | .580 | 1.281 | 1.137-1.442 | 4.1 x 10^-5 | |
rs7141696 | | 14 | 38,192,126 | T/C | .518 | .580 | 1.281 | 1.138-1.443 | 4.0 x 10^-5 | |
rs8006474 | | 14 | 38,196,248 | G/C | .527 | .589 | 1.290 | 1.144-1.454 | 3.1 x 10^-5 | |
rs2122333 | | 14 | 38,233,119 | C/T | .542 | .610 | 1.321 | 1.171-1.491 | 5.3 x 10^-6 | |
rs1449725 | | 14 | 38,246,572 | C/T | .543 | .610 | 1.322 | 1.172-1.492 | 4.9 x 10^-6 | 1.1 x 10^-5 | Yes
rs2899883 | | 14 | 38,255,604 | G/T | .539 | .604 | 1.320 | 1.169-1.491 | 7.0 x 10^-6 | |
rs2319392 | GPHN | 14 | 66,136,844 | T/A | .014 | .023 | 4.396 | 2.050-9.426 | 5.0 x 10^-5 | |
rs3825569 | LOC388015 | 14 | 100,420,051 | C/T | .583 | .640 | 1.292 | 1.143-1.463 | 3.7 x 10^-5 | |
rs12910827 | | 15 | 56,417,311 | T/G | .024 | .047 | 2.592 | 1.738-3.866 | 1.3 x 10^-6 | 6.3 x 10^-6 | Yes
rs11634708 | LOC56964, PEX11A, PLIN | 15 | 88,037,214 | C/T | .433 | .485 | 1.315 | 1.153-1.500 | 4.1 x 10^-5 | |
rs10521095 | | 16 | 13,528,936 | A/G | .206 | .256 | 1.351 | 1.174-1.554 | 2.3 x 10^-5 | | Yes
rs6498423 | | 16 | 13,531,381 | A/G | .206 | .256 | 1.351 | 1.174-1.555 | 2.4 x 10^-5 | |
rs12162088 | | 16 | 13,547,393 | G/A | .130 | .169 | 1.407 | 1.185-1.671 | 8.8 x 10^-5 | |
rs16962270 | | 16 | 13,547,426 | T/A | .130 | .169 | 1.409 | 1.186-1.673 | 8.7 x 10^-5 | |
rs2033254 | CETP | 16 | 55,567,486 | T/C | .646 | .693 | 1.367 | 1.177-1.587 | 4.0 x 10^-5 | |
rs12708980 | CETP | 16 | 55,569,880 | T/G | .633 | .677 | 1.385 | 1.184-1.621 | 4.4 x 10^-5 | |
rs1800774 | CETP | 16 | 55,573,046 | C/T | .640 | .686 | 1.399 | 1.195-1.639 | 2.8 x 10^-5 | 7.3 x 10^-6 | Yes
rs11646114 | FOXC2, MTHFSD | 16 | 85,141,275 | T/A | .868 | .894 | 1.658 | 1.285-2.140 | 8.9 x 10^-5 | 0.002 | Yes
rs9911259 | PRKCA | 17 | 62,085,377 | C/A | .435 | .493 | 1.274 | 1.134-1.432 | 4.4 x 10^-5 | |
rs16959880 | PRKCA | 17 | 62,085,528 | A/G | .435 | .493 | 1.274 | 1.134-1.432 | 4.3 x 10^-5 | |
rs8077110 | PRKCA | 17 | 62,087,049 | A/G | .435 | .493 | 1.274 | 1.134-1.432 | 4.3 x 10^-5 | |
rs1024740 | PRKCA | 17 | 62,088,152 | C/G | .435 | .493 | 1.275 | 1.134-1.432 | 4.3 x 10^-5 | |
rs7207345 | PRKCA | 17 | 62,093,747 | T/C | .707 | .755 | 1.307 | 1.144-1.492 | 7.5 x 10^-5 | |
rs17384005 | | 18 | 1,565,020 | A/G | .810 | .839 | 1.864 | 1.409-2.467 | 1.1 x 10^-5 | .10 | Yes
rs1785710 | | 18 | 21,612,825 | G/C | .648 | .702 | 1.295 | 1.142-1.468 | 5.1 x 10^-5 | |
rs7229654 | | 18 | 35,549,984 | A/G | .959 | .978 | 2.024 | 1.412-2.902 | 8.0 x 10^-5 | |
rs1596583 | | 18 | 35,550,893 | G/A | .959 | .979 | 2.033 | 1.418-2.916 | 7.3 x 10^-5 | |
rs9675995 | | 18 | 35,574,907 | G/A | .959 | .978 | 2.020 | 1.410-2.895 | 8.3 x 10^-5 | |
rs10853467 | | 18 | 35,582,328 | A/G | .959 | .978 | 2.021 | 1.410-2.896 | 8.2 x 10^-5 | |
rs616444 | SETBP1 | 18 | 40,739,522 | A/C | .882 | .917 | 1.465 | 1.208-1.778 | 9.0 x 10^-5 | |
rs175200 | | 22 | 18,543,063 | A/G | .494 | .555 | 1.282 | 1.138-1.445 | 4.1 x 10^-5 | 5.5 x 10^-5 | Yes
rs438798 | | 22 | 18,544,053 | G/A | .494 | .555 | 1.282 | 1.138-1.444 | 4.2 x 10^-5 | |
rs520698 | LOC150207 | 22 | 19,349,434 | G/A | .702 | .757 | 1.377 | 1.199-1.582 | 5.4 x 10^-6 | |
rs565979 | | 22 | 19,353,500 | C/T | .679 | .730 | 1.295 | 1.139-1.472 | 7.0 x 10^-5 | | Yes
rs479275 | | 22 | 19,353,777 | T/A | .656 | .708 | 1.283 | 1.131-1.455 | 9.5 x 10^-5 | |
rs491228 | DKFZp434N035 | 22 | 19,357,925 | G/A | .679 | .730 | 1.294 | 1.138-1.471 | 7.5 x 10^-5 | |
rs591446 | DKFZp434N035 | 22 | 19,359,204 | A/G | .656 | .708 | 1.283 | 1.131-1.454 | 9.7 x 10^-5 | |
rs2267339 | CACNG2 | 22 | 35,290,742 | G/T | .610 | .666 | 1.333 | 1.169-1.521 | 1.6 x 10^-5 | 4.5 x 10^-6 | Yes

Table S4. Confirmed T2D susceptibility loci: expanded FUSION results

SNP | Gene | Stage | Risk allele R/non-risk allele N | Controls (n) RR/RN/NN | Cases (n) RR/RN/NN | Risk allele frequency, control | Risk allele frequency, case | Additive OR | Additive 95% CI | Additive p-value | Dominant OR | Dominant 95% CI | Dominant p-value | Recessive OR | Recessive 95% CI | Recessive p-value
rs5219 | KCNJ11 | 1 | T/C | 221/562/346 | 271/538/296 | .445 | .489 | 1.204 | 1.069-1.357 | .0022 | 1.214 | 1.007-1.463 | .042 | 1.366 | 1.114-1.675 | .0027
rs5219 | KCNJ11 | 2 | T/C | 284/622/328 | 271/624/295 | .482 | .490 | 1.035 | 0.922-1.162 | .56 | 1.112 | 0.925-1.338 | .26 | 0.979 | 0.807-1.186 | .83
rs5219 | KCNJ11 | 1+2 | T/C | 505/1184/674 | 542/1162/591 | .464 | .489 | 1.109 | 1.021-1.204 | .014 | 1.152 | 1.011-1.312 | .034 | 1.142 | 0.994-1.312 | .060
rs9300039 | | 1 | C/A | 929/232/13 | 992/161/7 | .890 | .925 | 1.520 | 1.236-1.869 | 6.0 x 10^-5 | 1.797 | 0.702-4.600 | .21 | 1.563 | 1.254-1.948 | 6.2 x 10^-5
rs9300039 | | 2 | C/A | 988/227/17 | 1007/170/5 | .894 | .924 | 1.442 | 1.179-1.764 | 3.2 x 10^-4 | 3.445 | 1.247-9.520 | .0094 | 1.427 | 1.150-1.771 | .0012
rs9300039 | | 1+2 | C/A | 1917/459/30 | 1999/331/12 | .892 | .924 | 1.478 | 1.280-1.705 | 6.8 x 10^-8 | 2.470 | 1.252-4.874 | .0062 | 1.490 | 1.279-1.737 | 2.7 x 10^-7
rs8050136 | FTO | 1 | A/C | 192/562/420 | 213/538/410 | .403 | .415 | 1.034 | 0.920-1.162 | .58 | 0.999 | 0.841-1.186 | .99 | 1.124 | 0.904-1.397 | .29
rs8050136 | FTO | 2 | A/C | 150/585/492 | 185/566/427 | .361 | .397 | 1.179 | 1.046-1.329 | .0070 | 1.179 | 0.998-1.394 | .053 | 1.363 | 1.077-1.725 | .0098
rs8050136 | FTO | 1+2 | A/C | 342/1147/912 | 398/1104/837 | .381 | .406 | 1.107 | 1.019-1.203 | .017 | 1.091 | 0.969-1.229 | .15 | 1.240 | 1.058-1.453 | .0078
rs1801282 | PPARG | 1 | C/G | 778/336/45 | 834/298/19 | .816 | .854 | 1.303 | 1.111-1.529 | .0011 | 2.399 | 1.387-4.151 | .0011 | 1.270 | 1.059-1.523 | .0097
rs1801282 | PPARG | 2 | C/G | 840/337/38 | 838/293/37 | .830 | .843 | 1.077 | 0.924-1.256 | 0.34 | 0.975 | 0.612-1.555 | .92 | 1.110 | 0.929-1.327 | .25
rs1801282 | PPARG | 1+2 | C/G | 1618/673/83 | 1672/591/56 | .823 | .848 | 1.195 | 1.071-1.333 | .0014 | 1.494 | 1.056-2.114 | .022 | 1.200 | 1.058-1.362 | .0046
rs4402960 | IGF2BP2 | 1 | T/G | 102/471/585 | 148/495/498 | .291 | .347 | 1.276 | 1.126-1.446 | 1.2 x 10^-4 | 1.316 | 1.115-1.555 | .0012 | 1.520 | 1.160-1.992 | .0022
rs4402960 | IGF2BP2 | 2 | T/G | 142/498/595 | 122/553/515 | .317 | .335 | 1.073 | 0.951-1.211 | .25 | 1.197 | 1.018-1.408 | .029 | 0.872 | 0.672-1.131 | .30
rs4402960 | IGF2BP2 | 1+2 | T/G | 244/969/1180 | 270/1048/1013 | .304 | .341 | 1.175 | 1.078-1.281 | 2.4 x 10^-4 | 1.263 | 1.125-1.418 | 7.3 x 10^-5 | 1.155 | 0.960-1.390 | .13
rs7754840 | CDKAL1 | 1 | C/G | 154/522/439 | 190/531/400 | .372 | .406 | 1.155 | 1.022-1.304 | .021 | 1.165 | 0.979-1.387 | .084 | 1.288 | 1.019-1.628 | .034
rs7754840 | CDKAL1 | 2 | C/G | 141/574/509 | 153/565/466 | .350 | .368 | 1.083 | 0.959-1.223 | .20 | 1.093 | 0.926-1.290 | .29 | 1.141 | 0.890-1.463 | .30
rs7754840 | CDKAL1 | 1+2 | C/G | 295/1096/948 | 343/1096/866 | .360 | .387 | 1.120 | 1.028-1.220 | .0095 | 1.129 | 1.002-1.271 | .046 | 1.220 | 1.030-1.444 | .021
rs13266634 | SLC30A8 | 1 | C/T | 421/577/176 | 506/500/155 | .604 | .651 | 1.222 | 1.084-1.379 | .0010 | 1.157 | 0.913-1.466 | .23 | 1.380 | 1.166-1.634 | 1.8 x 10^-4
rs13266634 | SLC30A8 | 2 | C/T | 470/561/192 | 505/516/160 | .614 | .646 | 1.143 | 1.016-1.286 | .026 | 1.199 | 0.952-1.511 | .12 | 1.190 | 1.008-1.406 | .040
rs13266634 | SLC30A8 | 1+2 | C/T | 891/1138/368 | 1011/1016/315 | .609 | .649 | 1.184 | 1.089-1.287 | 6.8 x 10^-5 | 1.175 | 0.997-1.385 | .053 | 1.289 | 1.146-1.449 | 2.3 x 10^-5
rs10811661 | CDKN2A/B | 1 | T/C | 809/308/13 | 850/256/18 | .852 | .870 | 1.168 | 0.980-1.392 | .082 | 0.763 | 0.369-1.576 | .46 | 1.223 | 1.011-1.480 | .038
rs10811661 | CDKN2A/B | 2 | T/C | 893/309/33 | 911/256/23 | .848 | .873 | 1.223 | 1.039-1.441 | .015 | 1.345 | 0.779-2.322 | .28 | 1.254 | 1.042-1.510 | .017
rs10811661 | CDKN2A/B | 1+2 | T/C | 1702/617/46 | 1761/512/41 | .850 | .872 | 1.204 | 1.069-1.356 | .0022 | 1.112 | 0.724-1.708 | .63 | 1.245 | 1.091-1.421 | .001
rs1111875 | HHEX | 1 | C/T | 333/568/273 | 372/549/240 | .526 | .557 | 1.128 | 1.006-1.266 | .039 | 1.164 | 0.954-1.420 | .13 | 1.187 | 0.992-1.420 | .061
rs1111875 | HHEX | 2 | C/T | 332/596/285 | 333/581/250 | .519 | .536 | 1.058 | 0.943-1.187 | .34 | 1.126 | 0.926-1.369 | .23 | 1.039 | 0.866-1.246 | .68
rs1111875 | HHEX | 1+2 | C/T | 665/1164/558 | 705/1130/490 | .522 | .546 | 1.097 | 1.012-1.189 | .025 | 1.148 | 0.999-1.318 | .051 | 1.120 | 0.986-1.271 | .081
rs7903146 | TCF7L2 | 1 | T/C | 32/356/786 | 55/422/684 | .179 | .229 | 1.388 | 1.197-1.610 | 1.3 x 10^-5 | 1.422 | 1.198-1.688 | 5.3 x 10^-5 | 1.819 | 1.161-2.850 | .0079
rs7903146 | TCF7L2 | 2 | T/C | 33/383/810 | 68/393/711 | .183 | .226 | 1.295 | 1.122-1.495 | 3.9 x 10^-4 | 1.266 | 1.069-1.498 | .0061 | 2.123 | 1.382-3.262 | 4.1 x 10^-4
rs7903146 | TCF7L2 | 1+2 | T/C | 65/739/1596 | 123/815/1395 | .181 | .227 | 1.343 | 1.213-1.488 | 1.4 x 10^-8 | 1.344 | 1.192-1.514 | 1.2 x 10^-6 | 1.993 | 1.464-2.712 | 7.1 x 10^-6
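The genotype counts and risk allele frequencies reported in Table S4 are linked by simple allele counting. The following is a minimal sketch of that arithmetic, using the stage 1 counts listed for rs5219 (KCNJ11); the function names are illustrative and do not come from the thesis. The crude allelic odds ratio computed here is unadjusted, so it is only expected to be close to, not equal to, the model-based additive OR in the table.

```python
def risk_allele_frequency(rr, rn, nn):
    """Risk allele frequency from genotype counts RR, RN, NN."""
    total_alleles = 2 * (rr + rn + nn)
    return (2 * rr + rn) / total_alleles


def crude_allelic_odds_ratio(case_counts, control_counts):
    """Unadjusted allelic odds ratio from (RR, RN, NN) genotype counts.

    This is a plain 2x2 allele-count odds ratio, not the regression-based
    estimate reported in the table.
    """
    def allele_counts(rr, rn, nn):
        risk = 2 * rr + rn
        non_risk = 2 * nn + rn
        return risk, non_risk

    case_risk, case_non_risk = allele_counts(*case_counts)
    ctrl_risk, ctrl_non_risk = allele_counts(*control_counts)
    return (case_risk * ctrl_non_risk) / (case_non_risk * ctrl_risk)


# Stage 1 genotype counts for rs5219 as listed in Table S4 (RR, RN, NN).
controls = (221, 562, 346)
cases = (271, 538, 296)

control_freq = risk_allele_frequency(*controls)
case_freq = risk_allele_frequency(*cases)
crude_or = crude_allelic_odds_ratio(cases, controls)

print(f"control risk allele frequency: {control_freq:.3f}")
print(f"case risk allele frequency:    {case_freq:.3f}")
print(f"crude allelic OR:              {crude_or:.3f}")
```

The two frequencies round to the tabulated .445 and .489, which is a quick consistency check on the genotype counts.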

Table S5. FUSION stage 1, stage 2, and stage 1 + 2 T2D association results for 80 SNPs. SNPs were selected for stage 1 or stage 2 genotyping based on results in the FUSION GWA, combined evidence from FUSION, DGI, and WTCCC GWAs, or previous reports.

SNP | Chr | Position (bp) | Genes | Risk allele/non-risk allele | Stage 1 control risk allele freq | Stage 1 case risk allele freq | Stage 2 control risk allele freq | Stage 2 case risk allele freq | Stage 1 + 2 control risk allele freq | Stage 1 + 2 case risk allele freq | Stage 1 OR | Stage 1 95% CI | Stage 1 p-value | Stage 2 OR | Stage 2 95% CI | Stage 2 p-value | Stage 1 + 2 OR | Stage 1 + 2 95% CI | Stage 1 + 2 p-value | Reason for follow-up
rs640742 | 1 | 20,729,860 | CDA, DDOST, KIF17, PINK1 | A/C | .601 | .663 | .616 | .613 | .609 | .638 | 1.297 | 1.147-1.465 | 2.9 x 10^-5 | 0.992 | 0.884-1.112 | .89 | 1.127 | 1.037-1.225 | .0047 | FUSION GWA
rs17356414 | 1 | 59,031,529 | - | C/T | .694 | .736 | .719 | .708 | .707 | .722 | 1.248 | 1.096-1.422 | 8.0 x 10^-4 | 0.953 | 0.841-1.081 | .46 | 1.084 | 0.991-1.186 | .077 | FUSION Imputed
rs17025978 | 1 | 110,781,653 | KCNA10 | G/A | .914 | .947 | .934 | .930 | .924 | .939 | 1.705 | 1.347-2.158 | 6.6 x 10^-6 | 0.941 | 0.752-1.178 | .60 | 1.270 | 1.082-1.491 | .0033 | FUSION GWA
rs10494217 | 1 | 119,181,230 | TBX15 | G/T | .708 | .735 | .740 | .725 | .724 | .730 | 1.142 | 1.004-1.298 | .044 | 0.929 | 0.816-1.058 | .27 | 1.026 | 0.937-1.124 | .58 | Combined GWA
rs7599781 | 2 | 43,590,377 | PLEKHH2, THADA | T/C | .942 | .958 | .954 | .950 | .948 | .954 | 1.478 | 1.119-1.953 | .0056 | 0.895 | 0.683-1.172 | .42 | 1.147 | 0.947-1.390 | .16 | Combined GWA
rs6704803 | 2 | 158,175,059 | ACVR1C, PSCDBP | C/T | .928 | .946 | .938 | .942 | .933 | .944 | 1.316 | 1.033-1.675 | .025 | 1.084 | 0.851-1.380 | .52 | 1.198 | 1.011-1.419 | .036 | Combined GWA
rs1801282 | 3 | 12,368,125 | PPARG, LOC643925 | C/G | .816 | .854 | .830 | .843 | .823 | .848 | 1.303 | 1.111-1.529 | .0011 | 1.077 | 0.924-1.256 | .34 | 1.195 | 1.071-1.333 | .0014 | Combined GWA
rs17081352 | 3 | 30,307,851 | - | C/A | .905 | .940 | .928 | .927 | .917 | .933 | 1.680 | 1.339-2.109 | 5.5 x 10^-6 | 0.978 | 0.780-1.224 | .84 | 1.276 | 1.090-1.494 | .0023 | FUSION Imputed
rs13072106 | 3 | 134,425,451 | BFSP2, TMEM108 | T/C | .118 | .155 | .143 | .142 | .130 | .149 | 1.414 | 1.188-1.682 | 8.7 x 10^-5 | 1.000 | 0.852-1.174 | .10 | 1.166 | 1.038-1.311 | .0098 | FUSION GWA
rs4687299 | 3 | 186,595,361 | MAP3K13 | A/G | .225 | .276 | .268 | .260 | .247 | .268 | 1.325 | 1.158-1.515 | 3.9 x 10^-5 | 0.959 | 0.841-1.092 | .53 | 1.116 | 1.017-1.225 | .020 | FUSION GWA
rs17289925 | 3 | 186,917,362 | C3orf65, IGF2BP2, LOC646600 | C/T | .018 | .022 | .020 | .020 | .019 | .021 | 1.181 | 0.775-1.801 | .44 | 1.077 | 0.719-1.613 | .72 | 1.117 | 0.836-1.492 | .46 | Follow-up
rs4402960 | 3 | 186,994,389 | IGF2BP2 | T/G | .291 | .347 | .317 | .335 | .304 | .341 | 1.276 | 1.126-1.446 | 1.2 x 10^-4 | 1.073 | 0.951-1.211 | .25 | 1.175 | 1.078-1.281 | 2.4 x 10^-4 | Combined GWA
rs734312 | 4 | 6,421,426 | WFS1 | A/G | .478 | .506 | .482 | .485 | .480 | .496 | 1.101 | 0.980-1.236 | .11 | 1.010 | 0.899-1.134 | .87 | 1.056 | 0.973-1.145 | .19 | Combined GWA
rs886374 | 4 | 7,856,440 | SORCS2 | T/C | .211 | .270 | .233 | .221 | .222 | .245 | 1.385 | 1.209-1.587 | 2.4 x 10^-6 | 0.943 | 0.824-1.081 | .40 | 1.140 | 1.036-1.253 | .007 | FUSION GWA
rs13139219 | 4 | 42,294,231 | ATP8A1 | C/A | .779 | .827 | .796 | .805 | .788 | .816 | 1.346 | 1.160-1.561 | 7.9 x 10^-5 | 1.052 | 0.911-1.214 | .50 | 1.186 | 1.070-1.314 | .0011 | FUSION GWA
rs6834248 | 4 | 95,447,456 | LOC644429, PGDS, SMARCAD1 | T/C | .772 | .786 | .779 | .765 | .775 | .776 | 1.108 | 0.963-1.275 | .15 | 0.919 | 0.800-1.056 | .23 | 1.001 | 0.907-1.104 | .99 | Combined GWA
rs2720460 | 4 | 104,412,290 | BDH2, CENPE, DHRS6, LOC133308 | A/G | .571 | .607 | .574 | .579 | .573 | .593 | 1.154 | 1.025-1.299 | .018 | 1.012 | 0.899-1.140 | .84 | 1.084 | 0.998-1.179 | .057 | Combined GWA
rs27779 | 5 | 142,239,267 | ARHGAP26 | A/C | .250 | .304 | .259 | .269 | .255 | .286 | 1.326 | 1.162-1.513 | 2.5 x 10^-5 | 1.044 | 0.917-1.190 | .52 | 1.171 | 1.068-1.283 | 7.5 x 10^-4 | FUSION GWA
rs3733876 | 5 | 176,315,601 | RAP80 | G/A | .765 | .805 | .791 | .798 | .778 | .801 | 1.277 | 1.109-1.471 | 6.6 x 10^-4 | 1.051 | 0.909-1.215 | .50 | 1.156 | 1.046-1.278 | .0046 | FUSION GWA
rs4712523 | 6 | 20,765,543 | CDKAL1 | G/A | .372 | .407 | .349 | .366 | .360 | .387 | 1.164 | 1.032-1.312 | .013 | 1.084 | 0.959-1.224 | .20 | 1.123 | 1.032-1.222 | .0073 | Follow-up
rs10946398 | 6 | 20,769,013 | CDKAL1 | C/A | .368 | .404 | .347 | .364 | .357 | .384 | 1.163 | 1.029-1.315 | .016 | 1.081 | 0.956-1.222 | .22 | 1.122 | 1.029-1.223 | .0087 | Combined Imputed
rs7754840 | 6 | 20,769,229 | CDKAL1 | C/G | .372 | .406 | .350 | .368 | .360 | .387 | 1.155 | 1.022-1.304 | .021 | 1.083 | 0.959-1.223 | .20 | 1.120 | 1.028-1.220 | .0095 | Follow-up
rs2206734 | 6 | 20,802,863 | CDKAL1 | T/C | .174 | .200 | .168 | .174 | .171 | .187 | 1.182 | 1.016-1.375 | .030 | 1.060 | 0.911-1.234 | .45 | 1.116 | 1.003-1.241 | .043 | Combined GWA
rs4496780 | 6 | 21,187,627 | CDKAL1 | G/T | .104 | .093 | .092 | .106 | .098 | .100 | 0.890 | 0.730-1.086 | .25 | 1.209 | 0.994-1.471 | .057 | 1.046 | 0.911-1.200 | .53 | Follow-up
rs9271366 | 6 | 32,694,832 | HLADQA1, HLADRA, HLADRB1 | A/G | .858 | .862 | .857 | .867 | .858 | .864 | 1.044 | 0.878-1.241 | .63 | 1.104 | 0.936-1.303 | .24 | 1.067 | 0.948-1.202 | .28 | Combined GWA
rs11751469 | 6 | 33,912,525 | - | C/T | .563 | .609 | .574 | .585 | .568 | .597 | 1.209 | 1.073-1.362 | .0018 | 1.050 | 0.933-1.182 | .41 | 1.122 | 1.032-1.219 | .007 | Combined GWA
rs7750445 | 6 | 37,872,955 | ZFAND3 | G/C | .136 | .180 | .163 | .135 | .150 | .157 | 1.407 | 1.194-1.659 | 4.2 x 10^-5 | 0.814 | 0.694-0.956 | .012 | 1.053 | 0.941-1.179 | .37 | FUSION Imputed
rs9472138 | 6 | 43,919,740 | - | T/C | .310 | .314 | .305 | .321 | .308 | .318 | 1.031 | 0.911-1.166 | .63 | 1.071 | 0.946-1.212 | .28 | 1.050 | 0.963-1.145 | .27 | New Assoc
rs7450789 | 6 | 111,923,668 | LOC643749, REV3L, TRAF3IP2 | T/G | .903 | .919 | .908 | .912 | .906 | .916 | 1.228 | 1.001-1.506 | .048 | 1.069 | 0.877-1.304 | .51 | 1.141 | 0.990-1.314 | .068 | Combined GWA

rs20

2196

6 6

132,

192,

132

EN

PP1

A/G

.5

76

.630

.6

06

.621

.5

92

.626

1.

246

1.10

7-1.

403

2.6

x 10

-4

1.05

70.

939-

1.19

0.3

6 1.

148

1.05

6-1.

247

.001

2

FUSI

ON

Impu

ted

rs61

5545

7

18,1

65,1

11

-

C/T

.6

94

.751

.7

08

.733

.7

01

.742

1.

361

1.19

0-1.

556

5.9

x 10

-6

1.13

40.

998-

1.28

9.0

53

1.23

61.

127-

1.35

56.

1 x

10-6

FUSI

ON

GW

A

rs10

2813

05

7 54

,664

,618

-G

/T

.735

.7

72

.738

.7

57

.737

.7

65

1.22

41.

069-

1.40

1 .0

033

1.10

10.

961-

1.26

1.1

6 1.

153

1.04

8-1.

268

0.00

33

C

ombi

ned

GW

A

rs17

1586

86

7 83

,439

,407

SEM

A3A

T/G

.9

51

.957

.9

59

.958

.9

55

.958

1.

156

0.87

4-1.

528

.31

1.00

70.

751-

1.35

1.9

6 1.

077

0.88

1-1.

316

.47

C

ombi

ned

GW

A

rs24

7098

4 7

122,

368,

680

SL

C13A

1 A

/C

.297

.3

48

.316

.2

98

.307

.3

23

1.27

91.

130-

1.44

8 9.

0 x

10-5

0.

930

0.82

2-1.

054

.26

1.08

30.

993-

1.18

1.0

73

FU

SIO

N G

WA

rs

1095

4654

7

138,

816,

342

-

C/T

.7

25

.776

.7

35

.749

.7

30

.762

1.

337

1.16

6-1.

533

2.8

x 10

-5

1.08

90.

952-

1.24

5.2

1 1.

201

1.09

2-1.

321

1.6

x 10

-4

FU

SIO

N G

WA

rs

5579

62

7 14

0,23

2,92

4

LOC

6424

21,

MRP

S33

T/C

.0

47

.076

.0

59

.058

.0

53

.067

1.

650

1.28

7-2.

115

5.9

x 10

-5

0.98

20.

770-

1.25

3.8

9 1.

275

1.07

5-1.

514

.005

2

FUSI

ON

GW

A

rs13

2666

34

8 11

8,25

3,96

4

SLC3

0A8

C/T

.6

04

.651

.6

14

.646

.6

09

.649

1.

222

1.08

4-1.

379

.001

1.

143

1.01

6-1.

286

.026

1.

184

1.08

9-1.

287

6.8

x 10

-5

FU

SIO

N G

WA

rs

7839

244

8 14

2,45

7,43

7

GPR

20

A/G

.0

66

.098

.0

82

.080

.0

74

.089

1.

553

1.24

8-1.

932

6.8

x 10

-5

0.96

70.

784-

1.19

2.7

5 1.

212

1.04

4-1.

407

.012

FUSI

ON

GW

A

rs10

6319

2 9

21,9

93,3

67

C

DK

N2A

, CD

KN

2B

A/G

.5

56

.582

.5

87

.584

.5

72

.583

1.

094

0.97

5-1.

228

.13

0.98

90.

879-

1.11

4.8

5 1.

045

0.96

3-1.

134

.29

Fo

llow

-up

rs56

4398

9

22,0

19,5

47

C

DK

N2A

, CD

KN

2B

T/C

.5

66

.596

.5

96

.590

.5

82

.593

1.

118

0.99

4-1.

258

.064

0.

970

0.86

3-1.

091

.61

1.04

50.

962-

1.13

5.3

0

Follo

w-u

p rs

2383

208

9 22

,122

,076

-A

/G

.842

.8

62

.836

.8

64

.839

.8

63

1.18

41.

002-

1.40

0 .0

47

1.24

01.

057-

1.45

6.0

082

1.21

91.

086-

1.36

77.

2 x

10-4

Com

bine

d G

WA

rs

1081

1661

9

22,1

24,0

94

-

T/C

.8

52

.870

.8

48

.873

.8

50

.872

1.

168

0.98

0-1.

392

.082

1.

223

1.03

9-1.

441

.015

1.

204

1.06

9-1.

356

.002

2

Follo

w-u

p rs

1329

7268

9

91,2

67,6

96

N

FIL3

G/A

.9

24

.952

.9

45

.949

.9

35

.950

1.

650

1.28

0-2.

128

9.0

x 10

-5

1.09

40.

848-

1.41

3.4

9 1.

353

1.13

2-1.

618

8.3

x 10

-4

FU

SIO

N Im

pute

d rs

2185

935

9 11

4,58

1,79

6

-C

/T

.667

.6

75

.661

.6

62

.664

.6

69

1.02

40.

904-

1.16

0 .7

1 1.

008

0.89

5-1.

136

.89

1.01

80.

935-

1.11

0.6

8

Com

bine

d G

WA

rs

1416

904

9 13

1,36

3,87

1

KIA

A051

5, P

OM

T1,

UC

K1

T/C

.9

31

.952

.9

25

.935

.9

28

.943

1.

479

1.15

0-1.

902

.002

1 1.

116

0.89

2-1.

397

.34

1.26

91.

074-

1.49

8.0

049

C

ombi

ned

GW

A

rs12

7087

4 10

29

,879

,870

SVIL

C

/A

.753

.7

99

.780

.7

77

.767

.7

88

1.29

71.

123-

1.49

8 3.

9 x

10-4

0.

976

0.84

9-1.

120

.72

1.11

81.

012-

1.23

4.0

28

FU

SIO

N Im

pute

d rs

9422

546

10

43,3

91,5

05

ZN

F239

, ZN

F485

G

/T

.628

.6

31

.640

.6

51

.634

.6

41

1.00

90.

894-

1.13

8 .8

9 1.

066

0.94

5-1.

203

.30

1.03

60.

951-

1.12

7.4

2

Com

bine

d G

WA

rs

1308

8 10

49

,985

,899

C10

orf7

2 G

/A

.369

.3

98

.363

.3

84

.366

.3

91

1.13

21.

003-

1.27

7 .0

44

1.07

30.

953-

1.20

7.2

4 1.

102

1.01

3-1.

198

.024

Com

bine

d G

WA

rs

1359

624

10

91,3

85,4

08

FL

J372

01,

MPH

OSP

H1,

PAN

K1

C/T

.2

47

.290

.2

68

.265

.2

58

.277

1.

222

1.07

2-1.

394

.002

7 0.

973

0.85

3-1.

110

.68

1.10

81.

010-

1.21

5.0

30

FU

SIO

N G

WA

76

Table S5. FUSION stage 1, stage 2, and stage 1 + 2 T2D association results for 80 SNPs (continued)

SNP | Chr | Position (bp) | Genes | Risk allele/non-risk allele | Stage 1 control risk allele freq | Stage 1 case risk allele freq | Stage 2 control risk allele freq | Stage 2 case risk allele freq | Stage 1 + 2 control risk allele freq | Stage 1 + 2 case risk allele freq | Stage 1 OR | Stage 1 95% CI | Stage 1 p-value | Stage 2 OR | Stage 2 95% CI | Stage 2 p-value | Stage 1 + 2 OR | Stage 1 + 2 95% CI | Stage 1 + 2 p-value | Reason for follow-up
rs1111875 | 10 | 94,452,862 | HHEX | C/T | .526 | .557 | .519 | .536 | .522 | .546 | 1.128 | 1.006-1.266 | .039 | 1.058 | 0.943-1.187 | .35 | 1.097 | 1.012-1.189 | .025 | New Assoc
rs7923837 | 10 | 94,471,897 | - | G/A | .603 | .631 | .591 | .613 | .597 | .622 | 1.122 | 0.997-1.263 | .057 | 1.090 | 0.970-1.226 | .15 | 1.107 | 1.019-1.203 | .016 | Combined GWA / New Assoc
rs4506565 | 10 | 114,746,031 | TCF7L2 | T/A | .214 | .250 | .217 | .248 | .216 | .249 | 1.257 | 1.089-1.450 | .0017 | 1.187 | 1.037-1.360 | .013 | 1.221 | 1.107-1.346 | 6.4 x 10^-5 | FUSION Imputed/Prev Assoc
rs7903146 | 10 | 114,748,339 | TCF7L2 | T/C | .179 | .229 | .183 | .226 | .181 | .227 | 1.388 | 1.197-1.610 | 1.3 x 10^-5 | 1.295 | 1.122-1.495 | 3.9 x 10^-4 | 1.343 | 1.213-1.488 | 1.4 x 10^-8 | FUSION GWA/Prev Assoc
rs12255372 | 10 | 114,798,892 | TCF7L2 | T/G | .156 | .203 | .165 | .199 | .161 | .201 | 1.400 | 1.201-1.632 | 1.5 x 10^-5 | 1.244 | 1.070-1.447 | .0044 | 1.318 | 1.184-1.467 | 3.6 x 10^-7 | FUSION GWA/Prev Assoc
rs5219 | 11 | 17,366,148 | ABCC8, KCNJ11 | T/C | .445 | .489 | .482 | .490 | .464 | .489 | 1.204 | 1.069-1.357 | .0022 | 1.035 | 0.922-1.162 | .56 | 1.109 | 1.021-1.204 | .014 | Combined Imputed/Prev Assoc
rs9300039 | 11 | 41,871,942 | - | C/A | .890 | .925 | .894 | .924 | .892 | .924 | 1.520 | 1.236-1.869 | 6.0 x 10^-5 | 1.442 | 1.179-1.764 | 3.2 x 10^-4 | 1.478 | 1.280-1.705 | 6.8 x 10^-8 | FUSION GWA
rs11036627 | 11 | 41,881,290 | - | C/A | .912 | .946 | .924 | .946 | .918 | .946 | 1.665 | 1.313-2.110 | 1.9 x 10^-5 | 1.466 | 1.159-1.856 | .0013 | 1.563 | 1.324-1.846 | 9.2 x 10^-8 | FUSION Imputed
rs10837766 | 11 | 41,984,377 | - | T/C | .827 | .869 | .846 | .870 | .836 | .870 | 1.397 | 1.181-1.652 | 8.6 x 10^-5 | 1.252 | 1.058-1.482 | .0088 | 1.313 | 1.166-1.477 | 5.8 x 10^-6 | FUSION Imputed
rs7480010 | 11 | 42,203,294 | LOC387761 | G/A | .174 | .174 | .162 | .171 | .168 | .172 | 1.004 | 0.863-1.169 | .96 | 1.078 | 0.925-1.257 | .333 | 1.034 | 0.929-1.151 | .54 | New Assoc
rs4379834 | 11 | 44,115,014 | ALX4, EXT2, PHACS | G/A | .316 | .316 | .295 | .306 | .305 | .311 | 0.980 | 0.865-1.111 | .76 | 1.063 | 0.936-1.207 | .35 | 1.027 | 0.940-1.123 | .55 | New Assoc
rs11616188 | 12 | 6,373,003 | LTBR, SCNN1A | A/G | .426 | .484 | .445 | .455 | .436 | .470 | 1.270 | 1.131-1.426 | 4.8 x 10^-5 | 1.040 | 0.927-1.167 | .50 | 1.148 | 1.059-1.244 | 8.3 x 10^-4 | FUSION Imputed
rs3751262 | 12 | 12,509,957 | DUSP16, LOH12CR1 | G/A | .914 | .932 | .917 | .904 | .916 | .918 | 1.298 | 1.038-1.623 | .022 | 0.853 | 0.698-1.043 | .12 | 1.039 | 0.896-1.205 | .61 | Combined GWA
rs1153188 | 12 | 53,385,263 | - | A/T | .699 | .721 | .682 | .702 | .690 | .711 | 1.100 | 0.966-1.251 | .15 | 1.118 | 0.989-1.266 | .075 | 1.109 | 1.015-1.212 | .022 | Combined Imputed
rs7132840 | 12 | 69,697,828 | - | T/G | .425 | .442 | .426 | .438 | .425 | .440 | 1.070 | 0.949-1.205 | .27 | 1.065 | 0.951-1.193 | .27 | 1.063 | 0.979-1.153 | .14 | Combined Imputed
rs3825253 | 12 | 107,611,747 | CORO1C, DAO, SSH1 | A/G | .973 | .989 | .987 | .986 | .908 | .988 | 2.575 | 1.604-4.134 | 3.6 x 10^-5 | 0.991 | 0.602-1.631 | .97 | 1.678 | 1.204-2.337 | .0019 | FUSION GWA
rs2300455 | 12 | 108,086,236 | ACACB | G/A | .815 | .839 | .821 | .820 | .818 | .829 | 1.166 | 0.999-1.361 | .051 | 0.997 | 0.857-1.161 | .97 | 1.075 | 0.965-1.197 | .19 | Combined GWA
rs4767658 | 12 | 116,982,161 | FLJ20674, WSB2 | T/C | .577 | .633 | .609 | .613 | .593 | .623 | 1.274 | 1.134-1.430 | 4.1 x 10^-5 | 1.025 | 0.912-1.151 | .68 | 1.134 | 1.045-1.230 | .0025 | FUSION GWA
rs1033594 | 14 | 36,281,317 | SLC25A21 | C/T | .479 | .502 | .496 | .507 | .487 | .505 | 1.069 | 0.951-1.202 | .26 | 1.049 | 0.933-1.178 | .42 | 1.067 | 0.982-1.158 | .13 | Combined GWA
rs1449725 | 14 | 38,246,572 | - | C/T | .540 | .607 | .584 | .595 | .562 | .600 | 1.315 | 1.163-1.486 | 1.1 x 10^-5 | 1.063 | 0.943-1.197 | .32 | 1.180 | 1.084-1.284 | 1.3 x 10^-4 | FUSION Imputed
rs2268974 | 14 | 68,492,917 | ACTN1 | G/A | .231 | .242 | .221 | .221 | .226 | .231 | 1.058 | 0.920-1.216 | .43 | 0.990 | 0.863-1.136 | .89 | 1.020 | 0.926-1.124 | .69 | Combined Imputed
rs12910827 | 15 | 56,417,311 | - | T/G | .021 | .045 | .029 | .032 | .025 | .039 | 2.195 | 1.541-3.127 | 6.3 x 10^-6 | 1.109 | 0.800-1.539 | .53 | 1.559 | 1.232-1.972 | 1.8 x 10^-4 | FUSION Imputed
rs10521095 | 16 | 13,528,936 | - | A/G | .206 | .256 | .228 | .229 | .217 | .243 | 1.351 | 1.174-1.554 | 2.3 x 10^-5 | 1.008 | 0.882-1.153 | .90 | 1.157 | 1.051-1.274 | .0028 | FUSION GWA
rs8050136 | 16 | 52,373,776 | FTO | A/C | .403 | .415 | .361 | .397 | .381 | .406 | 1.034 | 0.920-1.162 | .58 | 1.179 | 1.046-1.329 | .0070 | 1.107 | 1.019-1.203 | .017 | Combined GWA
rs1800774 | 16 | 55,573,046 | CETP | C/T | .667 | .726 | .705 | .699 | .687 | .712 | 1.348 | 1.182-1.537 | 7.3 x 10^-6 | 0.967 | 0.851-1.098 | .60 | 1.138 | 1.040-1.246 | .005 | FUSION Imputed
rs11646114 | 16 | 85,141,275 | FLJ12998, FOXC2, MTHFSD | T/A | .895 | .921 | .915 | .905 | .905 | .913 | 1.382 | 1.124-1.698 | .002 | 0.892 | 0.728-1.092 | .27 | 1.110 | 0.962-1.281 | .15 | FUSION Imputed
rs7222308 | 17 | 25,301,167 | CCDC55, EFCAB5, FLJ46247, SLC6A4, SSH2 | T/C | .532 | .553 | .535 | .552 | .533 | .553 | 1.094 | 0.973-1.229 | .13 | 1.075 | 0.958-1.206 | .22 | 1.086 | 1.001-1.179 | .047 | Combined GWA
rs17384005 | 18 | 1,565,020 | - | A/G | .842 | .859 | .858 | .859 | .851 | .859 | 1.147 | 0.974-1.351 | .10 | 1.004 | 0.850-1.186 | .96 | 1.074 | 0.956-1.206 | .23 | FUSION Imputed
rs175200 | 22 | 18,543,063 | - | A/G | .490 | .552 | .538 | .553 | .515 | .553 | 1.285 | 1.137-1.452 | 5.5 x 10^-5 | 1.069 | 0.954-1.198 | .25 | 1.165 | 1.072-1.265 | 2.9 x 10^-4 | FUSION Imputed
rs565979 | 22 | 19,353,500 | DKFZp434N035, LOC150207, LOC645289, PIK4CA, SERPIND1 | C/T | .679 | .730 | .727 | .709 | .703 | .720 | 1.295 | 1.139-1.472 | 7.0 x 10^-5 | 0.929 | 0.816-1.056 | .26 | 1.090 | 0.996-1.193 | .060 | FUSION GWA
rs2267339 | 22 | 35,290,742 | CACNG2 | G/T | .611 | .674 | .630 | .618 | .621 | .646 | 1.341 | 1.182-1.521 | 4.5 x 10^-6 | 0.939 | 0.832-1.060 | .31 | 1.112 | 1.020-1.213 | .016 | FUSION Imputed

Table S6: Comparison of T2D association results for SNPs that were imputed with a p-value < .001 and then genotyped in the FUSION stage 1 sample

SNP | Genes | Risk allele frequency in controls, imputed | Risk allele frequency in controls, genotyped | FUSION stage 1 imputed p-value (a) | FUSION stage 1 imputed OR (a) | FUSION stage 1 genotyped p-value | FUSION stage 1 genotyped OR | Imputation consistency (c) | Estimated r^2 (d) | Observed allelic concordance | Maximum r^2 with SNPs used for imputation
rs12910827 | - | .024 | .021 | 2.5 x 10^-6 | 2.57 | 6.3 x 10^-6 | 2.20 | .977 | .720 | .994 | .39
rs1449725 | - | .544 | .540 | 5.3 x 10^-6 | 1.33 | 1.1 x 10^-5 | 1.31 | .989 | .977 | .990 | .90
rs17081352 | - | .909 | .905 | 7.3 x 10^-6 | 1.70 | 5.5 x 10^-6 | 1.68 | .994 | .954 | 1.000 | .87
rs11616188 | SCNN1A/LTBR | .474 | .426 | 1.5 x 10^-5 | 1.40 | 4.8 x 10^-5 | 1.27 | .760 | .585 | .919 | .27
rs10837766 | - | .840 | .827 | 1.5 x 10^-5 | 1.49 | 8.6 x 10^-5 | 1.40 | .975 | .930 | .975 | .46
rs11036627 | - | .903 | .912 | 1.7 x 10^-5 | 1.67 | 1.9 x 10^-5 | 1.66 | .976 | .901 | .987 | .75
rs17384005 | - | .811 | .842 | 1.9 x 10^-5 | 1.84 | .10 | 1.15 | .743 | .309 | .874 | .11
rs7750445 | - | .116 | .136 | 2.0 x 10^-5 | 1.47 | 4.1 x 10^-5 | 1.41 | .986 | .965 | .977 | .50
rs2267339 | CACNG2 | .613 | .611 | 2.8 x 10^-5 | 1.33 | 4.5 x 10^-6 | 1.34 | .939 | .873 | .990 | .72
rs17356414 | - | .551 | .694 | 3.0 x 10^-5 | 1.30 | 8.0 x 10^-4 | 1.25 | .944 | .920 | .878 | .34
rs1800774 | CETP | .642 | .667 | 3.9 x 10^-5 | 1.39 | 7.3 x 10^-6 | 1.35 | .810 | .617 | .972 | .29
rs175200 | - | .493 | .490 | 6.6 x 10^-5 | 1.28 | 5.5 x 10^-5 | 1.28 | .993 | .976 | .997 | .85
rs6103716 | - | .342 | .342 | 7.3 x 10^-5 | 1.28 | 4.8 x 10^-5 | 1.29 | .993 | .978 | .999 | .33
rs13297268 | NFIL3 | .928 | .924 | 7.5 x 10^-5 | 1.72 | 9.0 x 10^-5 | 1.65 | .988 | .916 | .998 | .28
rs11646114 | FOXC2/FLJ12998 | .868 | .895 | 9.1 x 10^-5 | 1.66 | .0020 | 1.38 | .860 | .512 | .956 | .13
rs2021966 | ENPP1 | .584 | .576 | 9.1 x 10^-5 | 1.32 | 2.6 x 10^-4 | 1.25 | .846 | .769 | .937 | .46
rs1270874 | SVIL | .745 | .753 | 1.4 x 10^-4 | 1.33 | 3.9 x 10^-4 | 1.30 | .983 | .954 | .988 | .24
rs4812831 | - | .150 | .116 | 1.6 x 10^-4 | 1.53 | .0055 | 1.28 | .831 | .516 | .944 | .45
rs4402960 | IGF2BP2 | .290 | .291 | 1.7 x 10^-4 | 1.27 | 1.2 x 10^-4 | 1.28 | .997 | 1.026 | .998 | 1.00
rs2466291 | SLC30A8 | .399 | .361 | 6.3 x 10^-4 | 1.26 | .0016 | 1.22 | .874 | .830 | .935 | .47
rs1801282 | PPARG | .816 | .816 | 9.5 x 10^-4 | 1.31 | .0011 | 1.30 | .999 | 1.002 | 1.000 | 1.00
rs3802177 | SLC30A8 | .604 | .605 | 9.9 x 10^-4 | 1.23 | .0012 | 1.22 | .999 | 1.015 | .999 | 1.00
rs4506565 | TCF7L2 | .213 | .214 | .0015 (b) | 1.26 | .0017 | 1.26 | .999 | .965 | 1.000 | .92

a Imputation-based analysis restricted to individuals with successful genotypes for the same SNP; these results may differ from the imputed results in Table S2, which are based on all stage 1 individuals
b Imputed p-value = 7.0 x 10^-4 in stage 1 sample
c Imputation consistency is the proportion of imputation iterations that agreed with the most likely genotype
d The estimated r^2 is the ratio of observed variance of dosage scores across samples to the expected variance given the imputed SNP allele frequency
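Footnote d can be illustrated with a minimal sketch. It assumes, beyond what the footnote states, that dosages are expected counts of the risk allele on a 0 to 2 scale, that the allele frequency is estimated as half the mean dosage, and that the expected variance is the Hardy-Weinberg value 2p(1 - p); the function name `estimated_r2` is hypothetical.

```python
from statistics import fmean, pvariance


def estimated_r2(dosages):
    """Ratio of observed dosage variance to the variance expected from allele frequency.

    Assumptions (not stated in the footnote): dosages lie in [0, 2], the allele
    frequency p is half the mean dosage, and the expected variance is 2p(1 - p).
    """
    p = fmean(dosages) / 2.0            # estimated allele frequency
    expected = 2.0 * p * (1.0 - p)      # expected genotype variance under Hardy-Weinberg
    if expected == 0.0:
        raise ValueError("monomorphic SNP: expected variance is zero")
    return pvariance(dosages) / expected


# Confidently called genotypes keep the full variance; dosages shrunk toward
# the mean (uncertain imputation) lose variance and give a smaller ratio.
print(estimated_r2([0, 1, 2, 1]))
print(estimated_r2([0.5, 1.0, 1.5, 1.0]))
```

Because the measure is a ratio of an observed to an expected variance, it is not bounded by 1, which is consistent with the values slightly above 1 reported for some well-imputed SNPs in Table S6.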


Table S7. SNP annotation weights used in SNP picking for stage 2 genotyping

Annotation | Weight
Maximum of: |
Frameshift | 50
Stop codon | 50
Critical splice site | 50
Poly A signal | 30
Any change to initial ATG signal | 30
Non-synonymous coding: |
  Identical amino acid seen in more than 75% of mammals | 20
  Similar amino acid seen in more than 75% of mammals | 20
  Non-conservative amino acid change | 6 to 9 (a)
  Other non-synonymous | 5
SNP in exon, includes 5′ and 3′ UTRs | 2
Bonus: |
FUSION linkage LOD > 1 | 1 to 3 (b)
SNP near candidate gene | 1.5
SNP near gene over-expressed in tissue of interest | 1.5
Conserved | 1.2
Near any gene | 1.2

a For non-conservative amino acid changes, the weight is 5 - x, where -4 ≤ x ≤ -1 is the BLOSUM62 score for the amino acid substitution (23)
b For linkage, the weight is the T2D LOD score in the FUSION 1+2 families (2) if that LOD score is > 1
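The "Maximum of" part of Table S7 and footnote a can be sketched as follows. This is an illustration under stated assumptions, not the selection code used in the study: the annotation keys and function names are hypothetical, the BLOSUM62 bounds are treated as inclusive so that the weight spans 6 to 9, a SNP with no listed annotation is given a base weight of 0, and the bonus terms are left out because the table does not say how they combine with the base weight.

```python
# Base weights from the "Maximum of" block of Table S7 (keys are hypothetical labels).
BASE_WEIGHTS = {
    "frameshift": 50,
    "stop_codon": 50,
    "critical_splice_site": 50,
    "poly_a_signal": 30,
    "initial_atg_change": 30,
    "nonsyn_identical_aa_in_75pct_mammals": 20,
    "nonsyn_similar_aa_in_75pct_mammals": 20,
    "other_nonsynonymous": 5,
    "exon_including_utrs": 2,
}


def nonconservative_weight(blosum62_score):
    """Footnote a: weight = 5 - x for a BLOSUM62 score x between -4 and -1."""
    if not -4 <= blosum62_score <= -1:
        raise ValueError("footnote a covers BLOSUM62 scores from -4 to -1 only")
    return 5 - blosum62_score


def base_weight(annotations, blosum62_score=None):
    """Largest applicable annotation weight; 0 if none applies (an assumption)."""
    weights = [BASE_WEIGHTS[name] for name in annotations]
    if blosum62_score is not None:
        weights.append(nonconservative_weight(blosum62_score))
    return max(weights, default=0)


# A coding SNP causing a non-conservative change with BLOSUM62 score -3:
print(base_weight(["exon_including_utrs"], blosum62_score=-3))
```

Taking the maximum rather than a sum means that a SNP carrying several annotations is scored by its most severe predicted consequence only.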


Supplemental Online Material References

1. T. Valle et al., Diabetes Care 21, 949 (1998).
2. K. Silander et al., Diabetes 53, 821 (2004).
3. T. Saaristo et al., Diab Vasc Dis Res 2, 67 (2005).
4. Geneva, World Health Organization (1999).
5. M. Peltonen et al., Suomen Lääkäril (Finnish Med J) 61, 163 (2005).
6. A. Aromaa, S. Koskinen, Publications of the National Public Health Institute, Helsinki, Finland (2004).
7. J. Tuomilehto et al., Int J Epidemiol 20, 1010 (1991).
8. J. Saramies, Acta Univ. Oul., D 812 (2004).
9. M. I. Hawa, T. O. Ola, A. Gigante, J. Teng, R. D. G. Leslie, Diabetologia 49, 182 (2006).
10. K. L. Gunderson et al., Methods Enzymol 410, 359 (2006).
11. M. Barnhart et al., American Society of Human Genetics A242 (2006).
12. M. P. Epstein, W. L. Duren, M. Boehnke, Am J Hum Genet 67, 1219 (2000).
13. J. E. Wigginton, G. R. Abecasis, Bioinformatics 21, 3445 (2005).
14. A. Agresti, Categorical Data Analysis (John Wiley & Sons, ed. 2nd, 2002), pp. 710.
15. International HapMap Consortium, Nature 437, 1299 (2005).
16. S. Purcell, S. S. Cherny, P. C. Sham, Bioinformatics 19, 149 (2003).
17. N. Li, M. Stephens, Genetics 165, 2213 (2003).
18. Y. Li, P. Scheet, J. Ding, G. R. Abecasis, (Submitted for publication; manuscript available from GRA).
19. C. J. Willer et al., Genet Epidemiol 30, 180 (2006).
20. L. V. Hedges, Psychol Bull 92, 490 (1982).
21. K. Roeder, S. A. Bacanu, L. Wasserman, B. Devlin, Am J Hum Genet 78, 243 (2006).
22. Arch Intern Med 161, 397 (2001).
23. S. Henikoff, J. G. Henikoff, Proc Natl Acad Sci U S A 89, 10915 (1992).


Chapter 4

Variations in the G6PC2/ABCB11 genomic region are associated with fasting glucose levels

Journal of Clinical Investigation 2008;118(7):2620-2628


Wei-Min Chen,1,2 Michael R. Erdos,3 Anne U. Jackson,4 Richa Saxena,5 Serena Sanna,4,6

Kristi D. Silver,7 Nicholas J. Timpson,8 Torben Hansen,9 Marco Orrù,6 Maria Grazia Piras,6

Lori L. Bonnycastle,3 Cristen J. Willer,4 Valeriya Lyssenko,10 Haiqing Shen,7 Johanna Kuusisto,11

Shah Ebrahim,12 Natascia Sestu,13 William L. Duren,4 Maria Cristina Spada,6

Heather M. Stringham,4 Laura J. Scott,4 Nazario Olla,6 Amy J. Swift,3 Samer Najjar,13

Braxton D. Mitchell,7 Debbie A. Lawlor,8 George Davey Smith,8 Yoav Ben-Shlomo,14

Gitte Andersen,9 Knut Borch-Johnsen,9,15,16 Torben Jørgensen,15 Jouko Saramies,17 Timo T. Valle,18

Thomas A. Buchanan,19,20 Alan R. Shuldiner,7 Edward Lakatta,13 Richard N. Bergman,20

Manuela Uda,6 Jaakko Tuomilehto,18,21 Oluf Pedersen,9,16 Antonio Cao,6 Leif Groop,10

Karen L. Mohlke,22 Markku Laakso,11 David Schlessinger,13 Francis S. Collins,3 David Altshuler,5

Gonçalo R. Abecasis,4 Michael Boehnke,4 Angelo Scuteri,23,24 and Richard M. Watanabe20,25

1Department of Public Health Sciences and 2Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA. 3Genome Technology Branch, National Human Genome Research Institute, Bethesda, Maryland, USA. 4Center for Statistical Genetics and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA. 5Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA. 6Istituto di Neurogenetica e Neurofarmacologia, Consiglio Nazionale delle Ricerche, Cagliari, Italy. 7Division of Endocrinology, Diabetes and Nutrition, University of Maryland School of Medicine, Baltimore, Maryland, USA. 8MRC Centre for Causal Analyses in Translational Epidemiology, Department of Social Medicine, University of Bristol, Bristol, United Kingdom. 9Steno Diabetes Center, Gentofte, Denmark. 10Department of Clinical Sciences, Diabetes and Endocrinology, Lund University, University Hospital Malmö, Malmö, Sweden. 11Department of Medicine, University of Kuopio and Kuopio University Hospital, Kuopio, Finland. 12Department of Epidemiology and Population Health, Non-communicable Disease Epidemiology Unit, London School of Hygiene and Tropical Medicine, University of London, London, United Kingdom. 13Gerontology Research Center, National Institute on Aging, Baltimore, Maryland, USA. 14Social Medicine Department, University of Bristol, Bristol, United Kingdom. 15Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark. 16Faculty of Health Sciences, University of Aarhus, Aarhus, Denmark. 17Savitaipale Health Center, Savitaipale, Finland. 18Diabetes Unit, Department of Health Promotion and Chronic Disease Prevention, National Public Health Institute, and Department of Public Health, University of Helsinki, Helsinki, Finland. 19Department of Medicine, Division of Endocrinology, and 20Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, California, USA. 21South Ostrobothnia Central Hospital, Seinäjoki, Finland. 22Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA. 23Laboratory of Cardiovascular Science, National Institute on Aging, NIH, Baltimore, Maryland, USA. 24Unità Operativa Geriatria, Istituto Nazionale Ricovero E Cura Anziani, Rome, Italy. 25Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, USA.

Identifying the genetic variants that regulate fasting glucose concentrations may further our understanding of the pathogenesis of diabetes. We therefore investigated the association of fasting glucose levels with SNPs in 2 genome-wide scans including a total of 5,088 nondiabetic individuals from Finland and Sardinia. We found a significant association between the SNP rs563694 and fasting glucose concentrations (P = 3.5 × 10^-7). This association was further investigated in an additional 18,436 nondiabetic individuals of mixed European descent from 7 different studies. The combined P value for association in these follow-up samples was 6.9 × 10^-26, and combining results from all studies resulted in an overall P value for association of 6.4 × 10^-33. Across these studies, fasting glucose concentrations increased 0.01–0.16 mM with each copy of the major allele, accounting for approximately 1% of the total variation in fasting glucose. The rs563694 SNP is located between the genes glucose-6-phosphatase catalytic subunit 2 (G6PC2) and ATP-binding cassette, subfamily B (MDR/TAP), member 11 (ABCB11). Our results in combination with data reported in the literature suggest that G6PC2, a glucose-6-phosphatase almost exclusively expressed in pancreatic islet cells, may underlie variation in fasting glucose, though it is possible that ABCB11, which is expressed primarily in liver, may also contribute to such variation.

Nonstandard abbreviations used: ABCB11, ATP-binding cassette, subfamily B (MDR/TAP), member 11; BWHHS, British Women’s Heart and Health Study; DGI, Diabetes Genetics Initiative; FUSION, Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics; G6PC2, glucose-6-phosphatase catalytic subunit 2; GWA, genome-wide association; LD, linkage disequilibrium; METSIM, METabolic Syndrome in Men; T2DM, type 2 diabetes mellitus.

Conflict of interest: The authors have declared that no conflict of interest exists.

Citation for this article: J. Clin. Invest. 118:2620–2628 (2008). doi:10.1172/JCI34566.

Glucose is the major source of energy in humans, with levels in vivo determined by a balance of glucose absorption via the gut, production primarily by the liver, and utilization by both insulin-sensitive and insulin-insensitive tissues (1, 2). Homeostatic control of glucose levels involves complex interactions between humoral and neural mechanisms that work in concert to regulate tightly the balance between production and utilization to maintain a normal fasting glucose. Elevations in blood glucose are diagnostic of diabetes. Type 2 diabetes mellitus (T2DM) afflicts more than 171 million worldwide and is a leading cause of kidney failure, blindness, and lower limb amputations (3–5). Even more modest elevations in glucose concentration (so-called prediabetes) are associated with cardiovascular disease and accelerated atherosclerosis (6). In individuals progressing toward future T2DM, the fasting glucose concentration appears to change only modestly over time until the advent of β cell dysfunction, at which point the glucose concentration increases rapidly (7, 8). Many studies have shown that the lowering of glucose levels in individuals with diabetes can prevent or delay diabetes-related complications, providing further evidence for the damaging effects of chronic glucose elevations.

Both genetic and environmental factors contribute to the pathophysiology of T2DM (9–11). The contributions of environmental exposures to T2DM risk are best illustrated by results from the Diabetes Prevention Program (11) and the Finnish Diabetes Prevention Study (12), in which T2DM incidence was significantly reduced by intensive lifestyle modification. However, the contribution of genetic factors to T2DM risk is not as well understood. Recent genome-wide association (GWA) studies have identified 16 novel T2DM susceptibility loci (13–18), generating new insights into the genetic architecture underlying T2DM. In contrast to disease status, even less is known about genetic variation that alters specific T2DM-related quantitative traits such as glucose and insulin concentrations. As seen for T2DM, identification of genetic variants associated with T2DM-related quantitative traits is likely to require large sample sizes due to relatively small gene effect sizes. Fasting glucose concentrations have been shown to be heritable, with narrow-sense heritability estimates ranging from 25% to 40% (19–24). Given the central role of glucose concentration in the pathogenesis and diagnosis of T2DM and its complications, GWA for glucose concentrations provides an excellent opportunity to identify genes underlying variation in glucose concentrations that may also represent additional T2DM susceptibility loci. An example of this comes from the studies by Weedon et al., who showed by metaanalysis and large cohorts that variation in the glucokinase gene was associated with both fasting glucose and birth weight (25).

GWA studies for T2DM and adiposity were completed by the groups undertaking the Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) (14, 26, 27) and the SardiNIA Study of Aging (24, 28), respectively. Both studies assessed fasting glucose in their respective cohorts, allowing GWAs for fasting glucose in each study and combination of these results in a metaanalysis. The strongest signals from the fasting glucose GWA metaanalysis were from variants near genes for ATP-binding cassette, subfamily B (MDR/TAP), member 11 (ABCB11) and glucose-6-phosphatase catalytic subunit 2 (G6PC2). This association was replicated in a series of 7 studies involving a total of 18,436 individuals (13, 29–35), suggesting for what we believe is the first time that variation in one of these genes may play a role in the regulation of fasting glucose concentrations in humans.

Subject demographics and clinical characteristics for the FUSION and SardiNIA samples are summarized in Table 1. Because treatment for T2DM affects fasting glucose concentrations, all analyses in this report were restricted to nondiabetic subjects. Initial review of association results from both the FUSION stage 1 and SardiNIA GWA scans of a combined total of 5,088 nondiabetic individuals focused on SNPs that were genotyped in the SardiNIA study and imputed in the FUSION study. Among these, rs563694 exhibited the strongest evidence for association in both samples (SardiNIA, P = 7.6 × 10^-5; FUSION stage 1, P = 8.0 × 10^-4; Table 2), with a metaanalysis P value of 3.5 × 10^-7. Given the strength of this initial association, our follow-up efforts focused on rs563694. Additional independent associations from our fasting glucose GWA study are presented in Supplemental Table 1 (supplemental material available online with this article; doi:10.1172/JCI34566DS1).
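How two study-level P values can combine into a smaller metaanalysis P value may be sketched with a sample-size-weighted Z-score approach. The text at this point does not name the method, so this choice is an assumption, as are the function names; the sketch also assumes two-sided P values, effects in the same direction in both samples, and weights equal to the square root of each sample size (3,855 for SardiNIA and 1,233 for FUSION stage 1, which sum to the 5,088 individuals quoted above).

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_from_p(p_two_sided, direction=1):
    """Convert a two-sided P value into a signed Z score."""
    return direction * NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)


def combine_studies(studies):
    """Combine (two-sided P, effect direction, sample size) triples.

    Z = sum(sqrt(n_i) * z_i) / sqrt(sum(n_i)), i.e. weights proportional to sqrt(n_i).
    """
    weighted_sum = sum(sqrt(n) * z_from_p(p, d) for p, d, n in studies)
    total_n = sum(n for _, _, n in studies)
    z = weighted_sum / sqrt(total_n)
    p_combined = erfc(abs(z) / sqrt(2.0))  # two-sided tail probability
    return z, p_combined


# SardiNIA and FUSION stage 1 results for rs563694, same direction assumed.
studies = [(7.6e-5, +1, 3855), (8.0e-4, +1, 1233)]
z, p = combine_studies(studies)
print(f"Z = {z:.2f}, P = {p:.1e}")
```

Under these assumptions the combined P value should fall in the same range as the reported 3.5 × 10^-7; passing `direction=-1` for one study shows how discordant effects would cancel rather than reinforce each other.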

Analyses were repeated once imputation was completed in both the FUSION stage 1 and SardiNIA samples. SNP rs563694 and other SNPs in strong linkage disequilibrium (LD; defined as r^2 > 0.8 in the FUSION samples) constituted the 17 strongest association results in the combined FUSION/SardiNIA GWA for fasting glucose metaanalysis (Figure 1). In fact, 22 SNPs associated with fasting plasma glucose with P ≤ 1 × 10^-4 were located within a 63.9-kb region on chromosome 2 (Supplemental Table 2). These SNPs were located in an extended region of LD that spans 2 biologically plausible candidate genes for glucoregulation (Figure 1). The first is G6PC2, also known as islet-specific glucose-6-phosphatase–related protein (IGRP). G6PC2 is part of a larger family of enzymes involved in hydrolysis of glucose-6-phosphate in the gluconeogenic and glycogenolytic pathways (36, 37). The second is ABCB11, a member of the MDR/TAP subfamily of ATP-binding cassette transporters involved in multidrug resistance (38, 39).

Subject demographics and clinical characteristics for individuals with rs563694 genotype data

Study Phenotyped Geographic Study age BMI Fasting subjects origin (years) (kg/m2) glucose (mM)FUSION stage 1 1,233 Finland 63.0 (13.7) 26.6 (5.0) 5.36 (0.72)FUSION stage 2 655 Finland 61.0 (12.3) 26.3 (4.9) 5.48 (0.50)FUSION additional spouses/offspring 522 Finland 39.1 (12.2) 26.0 (6.4) 5.11 (0.78)SardiNIA 3,855 Sardinia, Italy 41.3 (27.1) 24.7 (6.3) 4.72 (0.77)DGI 1,411 Finland and Sweden 58.7 (15.4) 26.7 (4.78) 5.28 (0.70)Amish 1,655 USA 49.0 (23.7) 26.7 (6.6) 4.90 (0.58)METSIM 4,386 Finland 59.0 (10.0) 26.4 (4.5) 5.60 (0.70)Caerphilly 1,063 United Kingdom 56.7 (4.4) 26.2 (3.5) 4.80 (0.86)BWHHS 3,532 United Kingdom 68.5 (5.9) 27.6 (5.0) 5.80 (0.87)Inter99 5,734 Denmark 46.1 (7.9) 26.3 (4.5) 5.54 (0.80)


Among all genotyped or imputed SNPs in this region, rs560887, which was genotyped in FUSION stage 1, and imputed and followed up by genotyping in SardiNIA, showed the strongest overall evidence for association (SardiNIA, P = 4.4 × 10⁻⁸; FUSION stage 1, P = 1.7 × 10⁻³; Supplemental Table 2), with a metaanalysis P value of 2.8 × 10⁻¹⁰. In addition, rs853789 and rs853787, both located in intron 19 of ABCB11 and in perfect LD with each other (Dʹ = 1.0, r² = 1.0), showed strong evidence for association with fasting glucose concentrations with metaanalysis P values of 1.4 × 10⁻⁹ and 1.0 × 10⁻⁹, respectively (Supplemental Table 2). rs853789 is located 38.3 kb from rs560887 and 27.4 kb from rs563694 and is in strong LD with both SNPs (Dʹ = 0.98, r² = 0.81 with rs560887; and Dʹ = 0.98, r² = 0.95 with rs563694). rs560887 is 10.9 kb from rs563694, is in high LD with rs563694 (Dʹ = 0.99, r² = 0.84), and is located in intron 3 of G6PC2. In contrast, rs563694 lies between G6PC2 and ABCB11 and is in extended LD with ABCB11. In both the SardiNIA and FUSION stage 1 samples, each copy of the A allele for rs563694 was associated with small increases in fasting glucose (0.064 mM for SardiNIA and 0.051 mM for FUSION stage 1; Table 2) that are clinically insignificant and accounted for approximately 1% of the variance in fasting glucose. Similar effect sizes were observed for rs560887 (0.089 mM for SardiNIA and 0.052 mM for FUSION stage 1).
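The pairwise LD measures quoted above (Dʹ and r²) follow from haplotype and allele frequencies. The sketch below uses the standard textbook definitions; the function name `ld_stats` and the example frequencies are hypothetical and are not taken from the study data.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies for illustration only.
d, d_prime, r2 = ld_stats(p_ab=0.30, p_a=0.34, p_b=0.31)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

Dʹ near 1 with a lower r² typically indicates that one haplotype is rare or absent while the allele frequencies at the two SNPs differ, which is why both measures are reported.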

We assessed the potential contribution of population stratification by computing the genomic control parameter (40) independently for both studies. The genomic control values were 1.01 for both FUSION and SardiNIA, suggesting that population stratification and/or unmodeled relatedness did not contribute significantly to our observed association. Analyses that included BMI as a covariate did not significantly alter the association between rs563694 and fasting glucose in the FUSION stage 1 (P = 8.1 × 10⁻⁴ without BMI versus 9.1 × 10⁻⁴ with BMI) and SardiNIA samples (P = 7.6 × 10⁻⁵ without BMI versus 3.8 × 10⁻⁵ with BMI) independently or jointly (P = 3.5 × 10⁻⁷ without BMI versus 1.8 × 10⁻⁷ with BMI), suggesting the association was not a consequence of adiposity, which is known to induce insulin resistance and increase glucose concentrations (2). The association between rs563694 and fasting glucose also remained significant after individual adjustment for each of the 10 SNPs shown to be associated with T2DM in our recent GWA studies (Supplemental Table 3) (13–15) or when all 10 SNPs were included jointly in the model (P = 6.8 × 10⁻⁴ versus P = 8.0 × 10⁻⁴ for FUSION stage 1 samples and 5.1 × 10⁻⁵ versus 7.6 × 10⁻⁵ for SardiNIA samples). The 10 SNPs shown to be associated with T2DM were themselves not significantly associated with fasting glucose concentrations in the FUSION stage 1 or SardiNIA samples (Supplemental Table 4).
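For readers unfamiliar with the genomic control parameter, a common way to estimate it is as the ratio of the median observed 1-df chi-square statistic to its expected median under the null. The sketch below assumes that definition and assumes per-SNP z statistics are available; the function name is hypothetical and the input is simulated, not study data.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(z_scores):
    """Inflation factor: median observed chi-square over the expected null median."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN


# Illustration with idealized null z scores (evenly spaced normal quantiles).
n_snps = 10000
null_z = [NormalDist().inv_cdf((i + 0.5) / n_snps) for i in range(n_snps)]
print(f"lambda (null) = {genomic_control_lambda(null_z):.3f}")
```

Values close to 1, such as the 1.01 reported for both studies, indicate little genome-wide inflation of the test statistics.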

FUSION investigators genotyped rs563694 in 655 stage 2 samples and 522 additional spouses and offspring of T2DM patients included in stage 1; this SNP continued to show evidence for association with fasting glucose (FUSION stage 2, P = 2.0 × 10⁻³; FUSION stage 1 families, P = 1.9 × 10⁻⁵; see Table 2). The metaanalysis that combined results from the FUSION stage 1 and 2 and SardiNIA studies resulted in a P value of 5.3 × 10⁻⁹, surpassing standard thresholds for genome-wide significance.

We also examined the association between rs563694 and fasting glucose in 6 follow-up samples (Table 2). The characteristics of these samples are summarized in Table 1. Association between rs563694 and fasting glucose was confirmed in the Amish study (P = 4.1 × 10⁻⁵), the METabolic Syndrome In Men study (METSIM; P = 1.3 × 10⁻¹⁰), the Caerphilly study (P = 2.6 × 10⁻⁷), the British Women’s Heart and Health Study (BWHHS; P = 1.2 × 10⁻³), and Inter99 (P = 8.2 × 10⁻⁸; Table 2), with fasting glucose concentrations increasing with each copy of the A allele in all studies. While evidence for association in the Diabetes Genetics Initiative (DGI) study was not statistically significant (P = 0.19; Table 2), the results show a trend in the same direction as observed in the other samples. When the results from all follow-up studies were combined in a metaanalysis of 24,046 samples, there was strong evidence for association between rs563694 and fasting glucose in both the follow-up samples (n = 18,435, P = 6.9 × 10⁻²⁶; Table 2) and in all GWA and follow-up samples combined (n = 24,046, P = 6.4 × 10⁻³³; Table 2). In contrast, rs563694 did not show evidence for association with T2DM in the FUSION stage 1 study (P = 0.22), the DGI GWA samples (P = 0.78), or the METSIM study (P = 0.09).

Association between rs563694 and fasting glucose in nondiabetic individuals

| Study | n | C allele frequency | Mean fasting glucose (mM) (SD), CC | Mean fasting glucose (mM) (SD), AC | Mean fasting glucose (mM) (SD), AA | Effect (SE), mM | Effect (SE), standardized | P value |
|---|---|---|---|---|---|---|---|---|
| GWA samples | | | | | | | | |
| FUSION stage 1 | 1,233 | 0.34 | 5.26 (0.48) | 5.31 (0.48) | 5.33 (0.47) | 0.051 (0.019) | 0.143 (0.043) | 8.0 × 10⁻⁴ |
| SardiNIA | 3,855 | 0.46 | 4.88 (0.67) | 4.95 (0.62) | 5.00 (0.59) | 0.064 (0.018) | 0.118 (0.030) | 7.6 × 10⁻⁵ |
| | | | | | | | | 3.5 × 10⁻⁷ (A) |
| FUSION 1 families (B) | 1,755 | 0.34 | 5.20 (0.49) | 5.28 (0.50) | 5.31 (0.54) | 0.065 (0.018) | 0.155 (0.036) | 1.9 × 10⁻⁵ |
| Follow-up samples | | | | | | | | |
| FUSION stage 2 | 655 | 0.36 | 5.28 (0.43) | 5.44 (0.35) | 5.46 (0.36) | 0.068 (0.021) | 0.180 (0.058) | 2.0 × 10⁻³ |
| DGI | 1,411 | 0.34 | 5.24 (0.50) | 5.28 (0.51) | 5.29 (0.49) | 0.022 (0.021) | 0.053 (0.039) | 0.19 |
| Amish | 1,655 | 0.24 | 4.90 (0.47) | 4.89 (0.51) | 5.03 (0.53) | 0.090 (0.022) | 0.175 (0.042) | 4.1 × 10⁻⁵ |
| METSIM | 4,386 | 0.32 | 5.55 (0.49) | 5.64 (0.50) | 5.71 (0.49) | 0.074 (0.011) | 0.145 (0.023) | 1.3 × 10⁻¹⁰ |
| Caerphilly | 1,063 | 0.36 | 4.69 (0.91) | 4.87 (0.99) | 5.00 (1.19) | 0.155 (0.047) | 0.214 (0.041) | 2.6 × 10⁻⁷ |
| BWHHS | 3,532 | 0.34 | 6.01 (1.69) | 6.09 (1.81) | 6.06 (1.49) | 0.006 (0.042) | 0.079 (0.025) | 1.2 × 10⁻³ |
| Inter99 | 5,734 | 0.36 | 5.46 (0.85) | 5.52 (0.87) | 5.58 (0.70) | 0.057 (0.015) | 0.135 (0.019) | 8.2 × 10⁻⁸ |
| | | | | | | | | 6.3 × 10⁻²⁸ (C) |
| | | | | | | | | 6.1 × 10⁻³⁵ (D) |


Figure 2 shows the results of a metaanalysis based upon the effect size observed in each of the 8 studies. Overall, fasting glucose concentrations increased 0.065 mM (95% CI: 0.053–0.077 mM) with each copy of the major allele.
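A pooled per-allele effect of this kind can be obtained with a fixed-effects inverse-variance metaanalysis. The sketch below is an illustration under stated assumptions: it uses the per-study effects in mM (SE) listed in Table 2 for nine samples, which may not be the exact set of studies behind Figure 2, so the result is expected to be near, but not identical to, the reported estimate. The function name is hypothetical.

```python
from math import sqrt


def inverse_variance_meta(effects, std_errors):
    """Fixed-effects pooled estimate, its SE, and a 95% confidence interval."""
    weights = [1.0 / (se * se) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Effect (SE) in mM per copy of the A allele, as listed in Table 2.
# Order: FUSION stage 1, SardiNIA, FUSION stage 2, DGI, Amish,
# METSIM, Caerphilly, BWHHS, Inter99.
effects = [0.051, 0.064, 0.068, 0.022, 0.090, 0.074, 0.155, 0.006, 0.057]
std_errors = [0.019, 0.018, 0.021, 0.021, 0.022, 0.011, 0.047, 0.042, 0.015]

pooled, pooled_se, ci = inverse_variance_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f} mM (95% CI: {ci[0]:.3f} to {ci[1]:.3f})")
```

Because each study is weighted by the inverse of its variance, large and precise samples such as METSIM and Inter99 dominate the pooled estimate.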

We took advantage of GWA studies originally performed to identify susceptibility genes for T2DM (FUSION) and aging-related traits (SardiNIA) to also identify genes underlying variation in fasting glucose concentration. Both FUSION and SardiNIA initially identified rs563694 as being associated with fasting glucose levels. Given that both studies were performed in relatively homogeneous populations of mixed European descent, it is unlikely that population stratification accounted for the initial association. The estimated genomic control (40) values for FUSION stage 1 and SardiNIA were both 1.01, providing further evidence against the contribution of population stratification to the observed association.

In the SardiNIA sample, we genotyped SNPs rs560887 and rs853789 to validate the results based on imputation. The discrepancy rate per allele between the imputed and typed genotypes at these 2 SNPs was 1.4% and 2.4%, respectively, and the association result with the actual genotypes was stronger than with the imputed genotypes: P = 9.0 × 10⁻¹⁰ and 2.6 × 10⁻⁸, respectively.
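A per-allele discrepancy rate can be computed in several ways; the paper does not spell out its exact definition. The sketch below assumes best-guess imputed genotypes coded as allele counts (0, 1, or 2) and counts mismatched alleles over the total number of alleles compared. The function name and the example genotypes are hypothetical.

```python
def per_allele_discrepancy(imputed, typed):
    """Fraction of alleles that differ between imputed and typed genotypes.

    Genotypes are allele counts (0, 1, or 2) for the same individuals,
    in the same order. Each individual contributes 2 alleles.
    """
    if len(imputed) != len(typed):
        raise ValueError("genotype lists must have the same length")
    mismatched_alleles = sum(abs(i - t) for i, t in zip(imputed, typed))
    return mismatched_alleles / (2 * len(typed))


# Hypothetical example: one of five individuals differs by a single allele.
imputed = [0, 1, 2, 1, 0]
typed = [0, 1, 1, 1, 0]
rate = per_allele_discrepancy(imputed, typed)
print(f"discrepancy rate per allele = {rate:.1%}")
```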

Adiposity may induce insulin resistance and thus alter glucose concentrations (2) independent of the effects of the SNP on glucose concentrations per se. However, the association remained significant even when we included BMI as a covariate in the analysis, suggesting adiposity is not a major contributor to the observed association. Similarly, in the follow-up studies, the results did not change whether BMI was included or excluded as a covariate. Some known sex-specific effects, such as differences in fat distribution, could also confound our results. We found no sex-specific effect modification in the FUSION and SardiNIA samples. Also, it should be noted that we observed evidence for association between rs563694 and fasting glucose in the METSIM and Caerphilly samples, which included only men, and in the BWHHS, which comprised women only. Thus, the lack of a sex-specific effect in FUSION and SardiNIA is supported by the independent associations observed in these samples. Subsequent analyses of the GWA data revealed rs560887 as having the strongest evidence for association with fasting glucose in this region and suggested, based on the SNP location, that G6PC2 plays a role in glucoregulation. However, 2 additional SNPs in strong LD with rs560887 located in the adjacent ABCB11 also showed similar evidence for association with fasting glucose.

In the 7 follow-up studies, rs563694 continued to show association with fasting glucose, although marginal evidence for heterogeneity among studies was noted (Q = 14.6; P = 0.02; I² = 59.0%; 95% CI: 5.6–82.2%) (37). For example, the DGI samples did not exhibit a significant association and the BWHHS samples, despite being among the largest follow-up samples, showed only modest evidence for association (Table 2). These 2 studies yielded similar effect size estimates (0.053 for DGI and 0.079 for BWHHS) that were smaller than in the other studies (Table 2). Differences in both populations and sample ascertainment could be contributing to the observed heterogeneity. When these 2 studies are not considered, the heterogeneity estimate is reduced (Q = 3.7; P = 0.45; I² = 0%; 95% CI: 0–77.6%). However, despite the variability in effect size, the direction of the effect was the same in all studies.
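The relationship between the Q statistic and I² reported above can be checked with the usual definitions: Cochran's Q is the weighted sum of squared deviations from the fixed-effects pooled estimate, and I² = max(0, (Q − df)/Q) with df equal to the number of studies minus 1. The sketch below assumes those definitions; function names are hypothetical.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q: inverse-variance weighted squared deviations from the pooled effect."""
    weights = [1.0 / (se * se) for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))


def i_squared(q, n_studies):
    """Proportion of total variation attributed to heterogeneity, floored at 0."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Q values quoted in the text: 7 follow-up studies, then 5 after excluding 2.
print(f"I^2 = {i_squared(14.6, 7):.1%}")
print(f"I^2 = {i_squared(3.7, 5):.1%}")
```

With Q = 3.7 on 4 degrees of freedom, Q falls below its expectation under homogeneity, so I² is truncated at 0%.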

There are 2 biologically plausible candidate genes in the region identified by our association analyses that may affect glucose levels. Although rs560887, which is located in intron 3 of G6PC2 just 26 bp proximal to exon 4, showed the strongest evidence for association in the GWA studies, SNPs in LD with rs560887 and rs563694 that show similar levels of association with fasting glucose concentrations were located in intron 19 of ABCB11. ABCB11 is involved in ATP-dependent secretion of bile salts and is almost exclusively expressed in the liver. Mutations in ABCB11 have been shown to be associated with intrahepatic cholestasis (OMIM 603201) (38) and drug-induced hepatotoxicity (39, 41). In anti-lipid drug trials, bile acid sequestrants have been shown to lower glucose concentrations and improve insulin sensitivity, presumably through reduction of triglyceride levels (42). Based upon these observations, if ABCB11 were contributing significantly to variation in fasting glucose, one might expect to also see associations with lipids or insulin sensitivity. However, rs560887, rs563694, rs853789, and rs853787 were not associated with lipid measurements in a metaanalysis of FUSION stage 1 and SardiNIA samples (P > 0.16). Also, none of these SNPs were associated with minimal model-derived insulin sensitivity in FUSION samples (P > 0.30). Thus, our data do not support a role for ABCB11 in glucoregulation, and other evidence directly linking ABCB11 to regulation of glucose concentrations is scarce.

In contrast, G6PC2, the β cell–specific isoform of glucose-6-phosphatase, is a highly relevant candidate gene for glucoregulation. The mouse homolog G6pc2 has been previously implicated as an autoantigen in the NOD mouse model of type 1 diabetes (43). Wang et al. recently generated G6pc2-null mice and noted that at 16 weeks of age, fasting glucose concentrations had decreased approximately 13% in both male and female G6pc2-null mice when compared with wild-type mice (44). This modest decrease in glucose concentration was observed despite the absence of any differences in body weight, fasting insulin, or fasting glucagon concentrations. The characteristics of these G6pc2-null mice closely paralleled our observations that rs560887 and rs563694 were associated with modest changes in fasting glucose but not in BMI or fasting insulin, which is consistent with the hypothesis that presence of a C allele results in lower G6PC2 expression and therefore lower glucose concentrations. Interestingly, G6pc2 mRNA levels appear to increase with increasing glucose concentration in isolated mouse islets (36).

Molecular cloning of G6pc2 identified 2 splice forms that differ by the presence or absence of exon 4 in BALB/C and ob/ob mice and in insulinoma tissue (45). The longer cDNA including exon 4 has approximately 50% homology with glucose-6-phosphatase catalytic subunit (G6pc) across a variety of species including humans and is membrane bound in the endoplasmic reticulum (46). The corresponding G6PC2 splice forms have been observed in human pancreas (47). rs560887 is located in intron 3, just 26 bp proximal to exon 4, raising the possibility that this variant may play a role in whether the full-length transcript is formed.

G6PC hydrolyzes glucose-6-phosphate to form glucose and release a phosphate group. Despite its similarity to G6PC, G6PC2 is reported to have little to no hydrolase activity in humans (36, 37, 45, 46). In normal and genetically obese mice, the splice form lacking exon 4 appears to be the predominant form observed in islets (45) and lacks sequences that may be critical for hydrolytic activity (45, 48), suggesting the full-length form of G6pc2 may have implications for activity of G6pc2 and its potential role in glucoregulation. Greater hydrolase activity has been reported in cell lines overexpressing the full-length form of G6PC2 (36). Also, in islets from streptozotocin-treated mice, glucose cycling, an indicator of G6pc2 activity, was approximately 3-fold higher compared with islets from untreated mice (49), and even greater increases were observed in islets from ob/ob mice (50, 51). The conversion of glucose to glucose-6-phosphate is the critical step in stimulus-secretion coupling for insulin secretion. Variation in G6PC2 may increase glucose cycling in β cells, resulting in altered generation of ATP, which would have implications for insulin secretion. In addition, G6PC2-induced alterations in β cell glucose metabolism would also have downstream effects on phosphoinositide 3-kinase activity, which regulates pancreas duodenum homeobox-1 (PDX1) binding to the insulin gene and subsequent insulin gene transcription (52).

The possible role for G6PC2 in altering glucose concentrations raises the question of whether this gene also confers susceptibility to T2DM. We observed no association between fasting glucose and rs563694 and rs560887 in individuals with T2DM from the FUSION, DGI, and METSIM studies (P > 0.50). However, the analysis of fasting glucose concentration in individuals with T2DM is confounded by diabetes pathology, treatment, and differential response to therapy. Therefore the lack of association with fasting glucose in individuals with T2DM does not preclude G6PC2 as contributing to susceptibility to T2DM. Similarly, when we tested these SNPs for association with T2DM in the FUSION, DGI, and METSIM samples, we observed no evidence for association (P > 0.08). Further, the modest effect on glucose concentrations observed in our analysis of nondiabetic individuals suggests we may lack sufficient power to detect association with T2DM. Whereas the cumulative evidence would suggest that G6PC2 may regulate fasting glucose concentrations and does not contribute significantly to susceptibility to T2DM, larger studies may be required to elucidate the role of this gene in T2DM susceptibility.

Variation in the promoter region of glucokinase (GCK, rs1799884) has been shown to be associated with fasting glucose and impaired insulin secretion (53–55) and may play a role in altering birth weight (56). These initial findings were confirmed in a comprehensive metaanalysis performed by Weedon et al., demonstrating that rs1799884 was associated with fasting glucose (meta P = 1.0 × 10⁻⁹) and that the presence of a maternal A allele for rs1799884 was associated with increased birth weight of the child (P = 0.02) (25). GCK, an enzyme that works counter to G6PC2, converts glucose to glucose-6-phosphate, forming the critical step in stimulus-secretion coupling in pancreatic β cells. In addition, the recent GWA study from the DGI identified variation in glucokinase regulatory protein (GCKR) (rs780094) to be associated with triglyceride levels (13). GCKR is an allosteric regulator of GCK in both liver and pancreatic islets whose inhibitory effect is enhanced by fructose-6-phosphate and suppressed by fructose-1-phosphate (57). We found modest evidence for association between fasting glucose and rs1799884 (FUSION stage 1, P = 1.6 × 10⁻²; SardiNIA, P = 2.0 × 10⁻³; meta P = 1.1 × 10⁻⁴) and no evidence for association between fasting glucose and rs780094 (FUSION stage 1, P = 0.44; SardiNIA, P = 0.11; meta P = 0.077). While these results provide evidence for association between variation in GCK and fasting glucose but not between GCKR and fasting glucose in our studies, we cannot exclude the possibility that a complex interaction among GCK, GCKR, and G6PC2 may regulate fasting glucose levels. This will require further study.

In conclusion, we used GWA to identify variation in both ABCB11 and G6PC2, genes that potentially contribute to variation in fasting glucose concentrations in nondiabetic subjects of mixed European descent. There is more literature with data supporting a role for G6PC2, but in the absence of functional data, we cannot discount the possibility that ABCB11 may also contribute significantly to variation in fasting glucose concentration. Heritability for fasting glucose has been estimated to be 25%–40% (19–24), yet the variants we identified account for approximately 1% of the variance in fasting glucose, indicating that the majority of the variability in fasting glucose remains unexplained. The remaining variability is likely due to the effects of additional common genetic variants of modest effect, less common genetic variants of moderate effect, and a variety of gene-gene and gene-environmental interaction effects. It should also be noted that the magnitude of the effect observed in our study is consistent with other reports of quantitative trait associations (58–60).
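The figure of approximately 1% of trait variance can be roughly reproduced from the allele frequencies and standardized effects in Table 2. The sketch below assumes an additive model under Hardy-Weinberg equilibrium, in which a SNP with allele frequency p and per-allele effect β (in SD units) explains 2p(1 − p)β² of the variance. This is a textbook approximation and not necessarily the calculation used by the authors; the function name is hypothetical.

```python
def variance_explained(allele_freq, beta_std):
    """Approximate fraction of trait variance explained by an additive SNP.

    allele_freq: frequency of either allele (the formula is symmetric in p and 1 - p)
    beta_std: per-allele effect in standard deviation units
    """
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_std ** 2


# C allele frequency and standardized effect for rs563694 from Table 2.
fusion = variance_explained(0.34, 0.143)    # FUSION stage 1
sardinia = variance_explained(0.46, 0.118)  # SardiNIA
print(f"FUSION stage 1: {fusion:.2%}, SardiNIA: {sardinia:.2%}")
```

Both values fall just under 1%, in line with the approximate figure quoted in the text.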

Additional studies, likely with larger sample sizes, will be required to identify additional genetic variants contributing to variation in fasting glucose. The variants identified in our study are not likely to be functional themselves but are likely in LD with the functional variant(s). Additional fine mapping, sequencing, and functional studies will be required to define the molecular mechanisms underlying our observed association.

The FUSION and SardiNIA study samples and GWA genotyping have been described in detail (14, 24, 26–28). Here, we briefly review the study cohorts and genotyping methods. We also describe briefly each of the 7 follow-up samples. Subject demographics and basic clinical characteristics for individuals genotyped for rs563694 for each sample are described below and summarized in Table 1. All protocols were approved by the institutional review boards or research ethics committees at the respective institutions, and informed consent was obtained from all subjects.

FUSION GWA study. The goal of the FUSION study is to identify genetic variants that predispose to T2DM or that determine the variability in T2DM-related quantitative traits. The study began as an affected sibling-pair family study (26, 27), later augmented by large numbers of cases and controls for association analysis (14). The FUSION GWA study was performed using a 2-stage case-control design (14). Cases and controls were approximately frequency matched on 5-year age category, sex, and birth province. All stage 1 DNA samples were genotyped using the Illumina HumanHap300 BeadChip version 1.0, resulting in data on 315,635 SNPs that passed quality control filters (14). Genotype data for an additional 2.09 million SNPs were estimated using an imputation procedure (61). The genotype imputation method uses stretches of chromosome shared between individuals genotyped at relatively low density in our studies and individuals genotyped in greater density by the International HapMap Consortium (61) to estimate the missing genotypes. Comparison of imputed and measured genotypes yielded estimated error rates of 1.46% (Illumina) to 2.14% (Affymetrix) per allele with an average concordance of 98.5%, consistent with expectations from HapMap data (61). SNPs showing promising association with fasting plasma glucose in the stage 1 samples were genotyped in the stage 2 DNA samples by homogeneous MassEXTEND reaction using the MassARRAY System (Sequenom) (14). Because treatment for T2DM affects fasting glucose concentrations, all analyses in this report were restricted to nondiabetic subjects. Diabetes status was confirmed by WHO criteria (62) or confirmation of treatment for diabetes by medical record review. Fasting plasma glucose concentrations were available for 1,233 stage 1 and 655 stage 2 samples. Additional FUSION samples included nondiabetic spouses or offspring from FUSION stage 1 families; fasting plasma glucose data were available for 578 individuals. These 578 samples were genotyped using the Applied Biosystems TaqMan allelic discrimination assays (63) and yielded 522 samples with both genotype and fasting glucose data. These samples were integrated into the FUSION stage 1 samples and independently analyzed to assess whether the additional family members improved the evidence for association. We have denoted this analysis FUSION 1 families.

SardiNIA GWA study. The SardiNIA study is a longitudinal study of aging-related quantitative traits and comprises a cohort of 6,148 individuals 14 years or older recruited from 4 towns in the Lanusei Valley in Sardinia. Data from 4,350 individuals with fasting serum glucose measurements from this cohort were used for the GWA study; 3,331 were genotyped using the Affymetrix 10K SNP Mapping Array, and an additional 1,412 were genotyped using the Affymetrix 500K SNP Mapping Array (28). 356,359 SNPs passed quality control and were tested for association with fasting serum glucose. We first used the genotyped SNPs in the 1,412 individuals to estimate genotypes for all the polymorphic SNPs genotyped by the HapMap Consortium. Taking advantage of the relatedness among individuals in the SardiNIA sample, we then conducted a second round of computational analysis to impute genotypes for analysis in the 2,938 individuals not genotyped with the 500K SNP Array. In this second round, we identified large stretches of chromosome shared within each family and probabilistically “filled-in” genotypes within each stretch whenever 1 or more of its carriers was genotyped with the 500K Array Set (64, 65). For these analyses, 37 non-Sardinians and 281 of their family members (n = 318) and 177 individuals with known diabetes were excluded from the analysis, resulting in a final sample size of 3,855.

Follow-up samples. The initial association identified in the metaanalysis of the FUSION and SardiNIA GWA studies was also tested in a series of follow-up samples (Table 1), 1 from FUSION described above and 6 others, which are described briefly below.

DGI. The DGI case-control GWA sample consists of 1,464 cases with T2DM and 1,467 normoglycemic controls from Finland and Sweden and has been previously described in detail (13). Fasting glucose measurements were available for 1,455 nondiabetic control subjects (1,305 unrelated subjects and 150 siblings). Among these, fasting plasma glucose was measured in 537 subjects and fasting whole blood glucose was measured in 918 subjects. Whole-blood glucose concentrations were converted to equivalent plasma values using a conversion factor of 1.13 (66). All samples were genotyped using the Affymetrix GeneChip Human Mapping 500K Array set; results of GWA of 389,878 SNPs with fasting glucose levels (including SNP rs563694) are publicly available at www.broad.mit.edu/diabetes/scandinavs/index.html. Both rs563694 genotype and fasting glucose data were available for 1,411 individuals.

Old Order Amish subjects. The Old Order Amish study participants reported here were 1,655 nondiabetic subjects from Lancaster, Pennsylvania, USA, for whom fasting plasma glucose measurements were available. These subjects were enrolled in ongoing family studies of complex diseases and traits (29–31). Genotyping for rs563694 was performed using the TaqMan allelic discrimination assay (63).


METSIM study. Subjects were selected from the ongoing METSIM study, which includes 7,000 men, aged 50 to 70 years, randomly selected from the population of the town of Kuopio, Eastern Finland, Finland (population 95,000). The present analysis is based on the first 4,386 nondiabetic subjects examined for METSIM with available fasting plasma glucose values. Genotyping was performed using the TaqMan allelic discrimination assay (63).

Caerphilly study. The Caerphilly study is a cohort study of white, European men (n = 1,069; 97.4% born in the United Kingdom), aged 45–59 years at entry in 1979–1983 (32), recruited from the town of Caerphilly, United Kingdom, and 5 adjacent villages. Men were selected using the electoral roll and general practitioner records. DNA and fasting plasma glucose measurements used in this study relate to the first phase of data collection.

BWHHS. The BWHHS consists of female participants, aged 60 to 79 years and recruited between April 1999 and March 2001. Initially, 4,286 women were randomly selected from 23 British towns and were interviewed and clinically examined. They also completed medical questionnaires (33).

Genotyping for the Caerphilly study and BWHHS was performed by KBioscience using their fluorescence-based competitive allele-specific PCR (KASPar) technology.

The Inter99 Study. rs563694 was genotyped in 5,734 Danes for whom fasting plasma glucose values were available. This sample comprises part of the population-based Inter99 sample of middle-aged people sampled at Research Centre for Prevention and Health (Glostrup, Denmark; refs. 34, 35). Genotyping was performed using TaqMan allelic discrimination (KBioscience).

Statistics. Association between fasting glucose and genotypes in the FUSION and SardiNIA studies was tested using a regression framework in which regression coefficients were estimated in the context of a variance component model to account for relatedness among individuals (65). For FUSION samples, plasma glucose concentration was adjusted for sex, age, age², birth province, and study group. Analyses were carried out in nondiabetic individuals excluding those known to be taking medications that directly affect glucose concentration. Similarly, SardiNIA serum glucose values were adjusted for sex, age, and age². Because diabetes-based exclusions were based only on medical records and SardiNIA only measured fasting serum glucose, a small number of undiagnosed new-onset diabetes cases may have been included in the analysis. For both studies, analyses were repeated including BMI as an additional covariate to assess whether adiposity significantly contributed to the evidence for association. Covariate-adjusted trait values were transformed to approximate univariate normality by applying an inverse normal scores transformation; the scores were ranked, ranks were transformed into quantiles, and quantiles were converted to normal deviates.
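The inverse normal scores transformation described above (rank, convert to quantile, convert to normal deviate) can be sketched as follows. The quantile offset (rank − 0.5)/n and the averaging of tied ranks are assumptions made for illustration, since the text does not state which conventions were used; the function name and example values are hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to normal deviates via ranks, preserving their order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Tied observations share the average of their 1-based ranks.
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    # Quantile (rank - 0.5) / n lies strictly between 0 and 1.
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical covariate-adjusted glucose residuals.
residuals = [5.1, 4.8, 6.0, 5.5, 5.3]
deviates = inverse_normal_transform(residuals)
print([round(d, 3) for d in deviates])
```

The transformed values depend only on ranks, so outlying glucose values cannot dominate the regression.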

A weighted z score–based fixed effects metaanalysis method was used to combine results from the FUSION and SardiNIA studies. In brief, for each SNP, a reference allele was identified and a z statistic summarizing the magnitude of the P value for association and direction of effect was generated for each study. An overall z statistic was then computed as a weighted average of the individual statistics, and a corresponding P value for that statistic was computed. The weights were proportional to the square root of the number of individuals in each study and scaled such that the squared weights summed to 1. For the metaanalysis of the effect size, the inverse variance was used as weights for each study. For the FUSION 1 families (FUSION stage 1 plus additional FUSION spouses and offspring) a regression-based analysis under a variance components framework was used to appropriately account for relationships among individuals (65). Because we did not have birth province information for the additional spouses and offspring, these analyses were carried out adjusting for age, age², sex, and study group only.
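The weighted z score method described above can be sketched in a few lines. The code assumes two-sided P values and a known direction of effect for the reference allele in each study; function names are hypothetical. Applied to the FUSION stage 1 and SardiNIA results for rs563694 from Table 2, it should give a combined P value close to the reported 3.5 × 10⁻⁷.

```python
from math import erfc, sqrt
from statistics import NormalDist


def signed_z(p_value, effect_sign):
    """Convert a two-sided P value and a direction (+1 or -1) into a z statistic."""
    return effect_sign * abs(NormalDist().inv_cdf(p_value / 2.0))


def weighted_z_meta(p_values, effect_signs, sample_sizes):
    """Combine studies with weights proportional to sqrt(n); squared weights sum to 1."""
    total_n = sum(sample_sizes)
    weights = [sqrt(n / total_n) for n in sample_sizes]
    z_meta = sum(w * signed_z(p, s)
                 for w, p, s in zip(weights, p_values, effect_signs))
    # Two-sided P value for the combined z statistic.
    p_meta = erfc(abs(z_meta) / sqrt(2.0))
    return z_meta, p_meta


# rs563694: FUSION stage 1 (n = 1,233) and SardiNIA (n = 3,855), same direction.
z_meta, p_meta = weighted_z_meta([8.0e-4, 7.6e-5], [+1, +1], [1233, 3855])
print(f"z = {z_meta:.2f}, P = {p_meta:.1e}")
```

Using `erfc` for the tail probability avoids the loss of precision that comes from subtracting a cumulative probability very close to 1.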

Given the different sampling schemes, statistical analyses for the follow-up samples varied by study. The Old Order Amish samples consisted of large Amish pedigrees, so the evidence for association between genotype and fasting plasma glucose was evaluated using variance components analysis implemented in SOLAR to adjust for the relatedness of study subjects (67, 68). Plasma glucose levels were natural logarithm transformed for analysis, and covariates included sex, age, and age². For the DGI study, glucose values were converted to z scores separately by sex, and tests for association were carried out using a regression framework with age and log(BMI) included as covariates; genomic control was applied to account for relatedness (13). For the METSIM study, analyses were carried out identically as in FUSION, with the exception that birth province was not included as a covariate. For the Caerphilly and BWHHS studies, association was assessed using a regression framework with age, age², and BMI as covariates. For the Inter99 study, association was assessed using a regression framework with age and sex as covariates. Individuals with known diabetes at the time of examination were excluded from the analyses. Results from all follow-up studies were combined in a metaanalysis as described above. Finally, a metaanalysis that combined results from all GWA and follow-up studies was performed as described above.

We would like to thank the many research volunteers who generously participated in the various studies represented in this study. For the FUSION study, we also thank Peter S. Chines, Narisu Narisu, Andrew G. Sprau, and Li Qin for informatics and genotyping support and the Center for Inherited Disease Research for the FUSION GWA genotyping. For the SardiNIA study, we thank the mayors of Lanusei, Ilbono, Arzana, and Elini, the head of local Public Health Unit ASL4, and the residents of the towns for their volunteerism and cooperation. In addition, we are grateful to the mayor and the administration in Lanusei for providing and furnishing the clinic site. We thank the team of physicians (Maria Grazia Pilia, Danilo Fois, Liana Ferreli, Marcello Argiolas, Francesco Loi, and Pietro Figus) and the nurses Paola Loi, Monica Lai, and Anna Cau, who carried out the physical examinations and made the observations.

We thank the former Medical Research Council (MRC) Epidemiology Unit (South Wales) who undertook the Caerphilly study. The Department of Social Medicine, University of Bristol, now acts as custodian for the Caerphilly database. We are grateful to all of the men who participated in this study. For the BWHHS, we thank all of the general practitioners and their staff who supported data collection and the women who participated in the study.

For the Amish studies, we thank members of the Amish community for the generous donation of time to participate in these studies and our field nurses, Amish liaisons, and clinic staff for their extraordinary efforts. We also acknowledge Sandy Ott and John Shelton for genotyping of Amish DNA samples.

Support for this study was provided by the following: American Diabetes Association (ADA) (1-05-RA-140 to R.M. Watanabe; 7-04-RA-111 to A.R. Shuldiner; and postdoctoral fellowships to C.J. Willer and H.M. Stringham); and NIH grants (DK069922 and U54 DA021519 to R.M. Watanabe; DK062370 to M. Boehnke; DK072193 to K.L. Mohlke; DK062418 to W-M. Chen; R01 DK54361, U01 HL72515, and R01 AG18728 to A.R. Shuldiner; R01 HL69313 to B.D. Mitchell; and R01 DK068495 to K.D. Silver). D.A. Lawlor is funded by a UK Department of Health career scientist award, and N. Timpson is funded by a studentship from the MRC of the United Kingdom.

The Inter99 Study was supported by the European Union (EUGENE2, LSHM-CT-2004-512013); the Lundbeck Foundation Centre of Applied Medical Genomics in Personalized Disease Prediction, Prevention and Care; the FOOD Study Group/the Danish Ministry of Food, Agriculture and Fisheries and Ministry of Family and Consumer Affairs (2101-05-0044); and the Danish Medical Research Council.

This research was supported in part by the Intramural Research Program of the NIH, National Institute on Aging, and the NIDDK. Additional support came from contract N01-AG-1-2109 from the NIA intramural research program for the SardiNIA (ProgeNIA) team; National Human Genome Research Institute intramural project number 1 Z01 HG000024 (to F.S. Collins); University of Maryland General Clinical Research Center (M01 RR 16500); Johns Hopkins University General Clinical Research Center (M01 RR 000052); the NIDDK Clinical Nutrition Research Unit of Maryland (P30 DK072488); and the Department of Veterans Affairs and Veterans Affairs Medical Center Baltimore Geriatric Research, Education and Clinical Center (GRECC). The BWHHS receives core funding from the United Kingdom Department of Health policy research program. The DNA extraction and genotyping for BWHHS were funded by the British Heart Foundation. The Caerphilly study was funded by the MRC of the United Kingdom. Funding for the Caerphilly DNA Bank was from an MRC grant (G9824960). The United Kingdom MRC supports work undertaken in the Centre for Causal Analyses in Translational Epidemiology.

The views expressed in this paper are those of the authors and not necessarily those of any funding body or others whose support is acknowledged. Those providing funding had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Received for publication November 26, 2007, and accepted in revised form April 23, 2008.

Address correspondence to: Angelo Scuteri, Unità Operativa Geriatria, Istituto Nazionale Ricovero e Cura Anziani, Rome, Italy. Phone: 39-3334564136; Fax: 39-06-30362896; E-mail: angeloelefante@interfree.it. Or to: Richard M. Watanabe, Keck School of Medicine of USC, Department of Preventive Medicine, 1540 Alcazar St., CHP-220, Los Angeles, California 90089-9011, USA. Phone: (323) 442-2053; Fax: (323) 442-2349; E-mail: [email protected].

Wei-Min Chen and Michael R. Erdos are co–first authors.

1. Reaven, G.M. 1988. Role of insulin resistance in human disease. Diabetes. 37:1595–1607.

2. DeFronzo, R.A. 1987. The triumvirate: B-cell, muscle, liver. A collusion responsible for NIDDM. Diabetes. 37:667–687.

3. National Diabetes Data Group. 1979. Classification and diagnosis of diabetes mellitus and other categories of glucose intolerance. Diabetes. 28:1039–1057.

4. [No authors listed]. 1985. Diabetes Mellitus: Report of a WHO Study Group. World Health Organ. Tech. Rep. Ser. 727:1–113.

5. The Expert Committee on the Diagnosis and Classification of Diabetes Mellitus. 1997. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care. 20:1183–1197.

6. DeFronzo, R.A., and Ferrannini, E. 1991. Insulin resistance: A multifaceted syndrome responsible for NIDDM, obesity, hypertension, dyslipidemia, and atherosclerotic cardiovascular disease. Diabetes Care. 14:173–194.

7. Xiang, A.H., et al. 2006. Coordinate changes in plasma glucose and pancreatic β-cell function in Latino women at high risk for type 2 diabetes. Diabetes. 55:1074–1079.

8. Mason, C.C., Hanson, R.L., and Knowler, W.C. 2007. Progression to type 2 diabetes characterized by moderate then rapid glucose increases. Diabetes. 56:2054–2061.

9. Rich, S.S. 1990. Mapping genes in diabetes. Diabetes. 39:1315–1319.

10. Ghosh, S., and Schork, N.J. 1996. Genetic analysis of NIDDM: the study of quantitative traits. Diabetes. 45:1–14.

11. Diabetes Prevention Program Research Group. 2002. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 346:393–403.

12. Tuomilehto, J., et al. 2001. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N. Engl. J. Med. 344:1343–1350.

13. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, et al. 2007. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 316:1331–1336.

14. Scott, L.J., et al. 2007. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 316:1341–1345.

15. Zeggini, E., et al. 2007. Replication of genome-wide association signals in U.K. samples reveals risk loci for type 2 diabetes. Science. 316:1336–1341.

16. Sladek, R., et al. 2007. A genome-wide association study identified novel risk loci for type 2 diabetes. Nature. 445:881–885.

17. Steinthorsdottir, V., et al. 2007. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat. Genet. 39:770–775.

18. Zeggini, E., et al. 2008. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat. Genet. 40:638–645.

19. Beaty, T.H., and Fajans, S.S. 1982. Estimating genetic and non-genetic components of variance for fasting glucose levels in pedigrees ascertained through non-insulin dependent diabetes. Ann. Hum. Genet. 46:355–362.

20. Boehnke, M., Moll, P.P., Kottke, B.A., and Weidman, W.H. 1987. Partitioning the variability of fasting plasma glucose levels in pedigrees. Am. J. Epidemiol. 125:679–689.

21. Sakul, H., et al. 1997. Familiality of physical and metabolic characteristics that predict the development of non-insulin-dependent diabetes mellitus in Pima Indians. Am. J. Hum. Genet. 60:651–656.

22. Watanabe, R.M., et al. 1999. Familiality of quantitative metabolic traits in Finnish families with non-insulin-dependent diabetes mellitus. Hum. Hered. 49:159–168.

23. Henkin, L., et al. 2003. Genetic epidemiology of insulin resistance and visceral adiposity. The IRAS family study design and methods. Ann. Epidemiol. 13:211–217.

24. Pilia, G., et al. 2006. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet. 2:e132.

25. Weedon, M.N., et al. 2006. A common haplotype of the glucokinase gene alters fasting glucose and birth weight: Association in six studies and population-genetics analyses. Am. J. Hum. Genet. 79:991–1001.

26. Valle, T., et al. 1998. Mapping genes for NIDDM. Design of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Diabetes Care. 21:949–958.

27. Silander, K., et al. 2004. A large set of Finnish affected sibling pair families with type 2 diabetes suggests susceptibility loci on chromosomes 6, 11, and 14. Diabetes. 53:821–829.

28. Scuteri, A., et al. 2007. Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet. 3:1200–1210.

29. Hsueh, W.-C., et al. 2001. Genome-wide scan of obesity in the Old Order Amish. J. Clin. Endocrinol. Metab. 86:1199–1205.

30. Sorkin, J., et al. 2005. Exploring the genetics of longevity in the Old Order Amish. Mech. Ageing Dev. 126:347–350.

31. Post, W., et al. 2007. Associations between genetic variants in the NOS1AP (CAPON) gene and cardiac repolarization in the Old Order Amish. Hum. Hered. 64:214–219.

32. The Caerphilly and Speedwell Collaborative Group. 1984. Caerphilly and Speedwell collaborative heart disease studies. J. Epidemiol. Community Health. 38:259–262.

33. Lawlor, D., Bedford, C., Taylor, M., and Ebrahim, S. 2003. Geographical variation in cardiovascular disease, risk factors, and their control in older women: British Women’s Heart and Health Study. J. Epidemiol. Community Health. 57:134–140.

34. Jørgensen, M.E., et al. 2003. Obesity and central fat pattern among Greenland Inuit and a general population of Denmark (Inter99): relationship to metabolic risk factors. Int. J. Obes. Relat. Metab. Disord. 27:1507–1515.

35. Glümer, C., Jørgensen, T., Borch-Johnsen, K., and Inter99 study. 2003. Prevalences of diabetes and impaired glucose regulation in a Danish population: the Inter99 study. Diabetes Care. 26:2335–2340.

36. Petrolonis, A.J., et al. 2004. Enzymatic characterization of the pancreatic islet-specific glucose-6-phosphatase-related protein (IGRP). J. Biol. Chem. 279:13976–13983.

37. Shieh, J.-J., Pan, C.-J., Mansfield, B.C., and Chou, J.Y. 2005. In islet-specific glucose-6-phosphatase-related protein, the beta cell antigenic sequence that is targeted in diabetes is not responsible for the loss of phosphohydrolase activity. Diabetologia. 48:1851–1859.

38. van Mil, S.W.C., et al. 2004. Benign recurrent intrahepatic cholestasis type 2 is caused by mutations in ABCB11. Gastroenterology. 127:379–384.

39. Lang, C., et al. 2007. Mutations and polymorphisms in the bile salt export pump and the multidrug resistance protein 3 associated with drug-induced liver injury. Pharmacogenet. Genomics. 17:47–60.

40. Devlin, B., and Roeder, K. 1999. Genomic control for association studies. Biometrics. 55:997–1004.

41. Funk, C., Ponelle, C., Scheuermann, G., and Pantze, M. 2001. Cholestatic potential of troglitazone as a possible factor contributing to troglitazone-induced hepatotoxicity: in vivo and in vitro interaction at the canalicular bile salt export pump (Bsep) in the rat. Mol. Pharmacol. 59:627–635.

42. Staels, B., and Kuipers, F. 2007. Bile acid sequestrants and the treatment of type 2 diabetes mellitus. Drugs. 67:1383–1392.

43. Mukherjee, R., Wagar, D., Stephens, T.A., Lee-Chan, E., and Singh, B. 2005. Identification of CD4+ T cell-specific epitopes of islet-specific glucose-6-phosphatase catalytic subunit-related protein: a novel beta cell autoantigen in type 1 diabetes. J. Immunol. 174:5306–5315.

44. Wang, Y., et al. 2007. Deletion of the gene encoding the islet-specific glucose-6-phosphatase catalytic subunit-related protein autoantigen results in a mild metabolic phenotype. Diabetologia. 50:774–778.

45. Arden, S.D., et al. 1999. Molecular cloning of a pancreatic islet-specific glucose-6-phosphatase catalytic subunit-related protein. Diabetes. 48:531–542.

46. Shieh, J.-J., Pan, C.-J., Mansfield, B.C., and Chou, J.Y. 2004. The islet-specific glucose-6-phosphatase-related protein, implicated in diabetes, is a glycoprotein embedded in the endoplasmic reticulum membrane. FEBS Lett. 562:160–164.

47. Dogra, R.S., et al. 2006. Alternative splicing of G6PC2, the gene coding for the islet-specific glucose-6-phosphatase catalytic subunit-related protein (IGRP), results in differential expression in human thymus and spleen compared with pancreas. Diabetologia. 49:953–957.

48. Pan, C.-J., Lei, K.-J., Annabi, B., Hemrika, W., and Chou, J.Y. 1998. Transmembrane topology of glucose-6-phosphatase. J. Biol. Chem. 273:6144–6148.

49. Khan, A., et al. 1990. Glucose cycling in islets from healthy and diabetic rats. Diabetes. 39:456–459.

50. Khan, A., et al. 1989. Evidence for the presence of glucose cycling in pancreatic islets of the ob/ob mouse. J. Biol. Chem. 264:9732–9733.

51. Khan, A., et al. 1990. Glucose cycling is markedly enhanced in pancreatic islets of obese hyperglycemic mice. Endocrinology. 126:2413–2416.

52. Vaulont, S., Vasseur-Cognet, M., and Kahn, A. 2000. Glucose regulation of gene transcription. J. Biol. Chem. 275:31555–31558.

53. Stone, L.M., Kahn, S.E., Deeb, S.S., Fujimoto, W.Y., and Porte, D., Jr. 1994. Glucokinase gene variations in Japanese-Americans with a family history of NIDDM. Diabetes Care. 17:1480–1483.

54. Stone, L.M., Kahn, S.E., Fujimoto, W.Y., Deeb, S.S., and Porte, D., Jr. 1996. A variation at position -30 of the β-cell glucokinase gene promoter is associated with reduced β-cell function in middle-aged Japanese-American men. Diabetes. 45:422–428.

55. Rose, C.S., et al. 2005. A -30G>A polymorphism of the beta-cell-specific glucokinase promoter associates with hyperglycemia in the general population of whites. Diabetes. 54:3026–3031.

56. Weedon, M.N., et al. 2005. Genetic regulation of birth weight and fasting glucose by a common polymorphism in the islet promoter of the glucokinase gene. Diabetes. 54:576–581.

57. Malaisse, W.J., Malaisse-Lagae, F., Davies, D.R., Vandercammen, A., and Van Schaftingen, E. 1990. Regulation of glucokinase by a fructose-1-phosphate-sensitive protein in pancreatic islets. Eur. J. Biochem. 190:539–545.

58. Frayling, T.M., et al. 2007. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 316:889–894.

59. Sanna, S., et al. 2008. Common variants in the GDF5-UQCC region are associated with variation in human height. Nat. Genet. 40:198–203.

60. Willer, C.J., et al. 2008. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat. Genet. 40:161–169.

61. Li, Y., Willer, C.J., Ding, J., Scheet, P., and Abecasis, G.R. 2008. Markov model for rapid haplotyping and genotype imputation in genome wide studies. Nat. Genet. In press.

62. [Anonymous]. 1999. Definition, diagnosis and classification of diabetes mellitus and its complications. Report of a WHO Consultation. WHO. Geneva, Switzerland. www.diabetes.com.au/pdf/who_report.pdf.

63. Livak, K.J. 1999. Allelic discrimination using fluorogenic probes and the 5ʹ nuclease assay. Genet. Anal. 14:143–149.

64. Burdick, J.T., Chen, W.M., Abecasis, G.R., and Cheung, V.G. 2006. In silico method for inferring genotypes in pedigrees. Nat. Genet. 38:1002–1004.

65. Chen, W.-M., and Abecasis, G.R. 2007. Family based association tests for genome wide association scans. Am. J. Hum. Genet. 81:913–926.

66. D’Orazio, P., et al. 2005. Approved IFCC recommendations on reporting results for blood glucose (abbreviated). Clin. Chem. 51:1573–1576.

67. Blangero, J., and Almasy, L. 1997. Multipoint oligogenic linkage analysis of quantitative traits. Genet. Epidemiol. 14:959–964.

68. Almasy, L., and Blangero, J. 1998. Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62:1198–1211.

SUPPLEMENTARY MATERIAL

Supplemental Table I. Top 10 Independent* Genome-wide Associations with Fasting Glucose.

SNP          Chromosome  Position    FUSION Stage 1 p-value  SardiNIA p-value  GWA Meta-analysis p-value†
rs560887     2           169588655   1.7×10⁻³                4.4×10⁻⁸          2.8×10⁻¹⁰
rs9981885    21          19252508    2.7×10⁻²                1.2×10⁻⁶          3.2×10⁻⁷
rs1387153    11          92313476    8.5×10⁻³                3.6×10⁻⁵          1.0×10⁻⁶
rs2027281    10          29424184    3.0×10⁻²                2.1×10⁻⁵          1.8×10⁻⁶
rs420510     21          19226244    3.2×10⁻²                2.5×10⁻⁵          2.3×10⁻⁶
rs693793     6           124830968   8.3×10⁻³                1.1×10⁻⁴          3.1×10⁻⁶
rs2214108    7           11428183    4.9×10⁻²                2.8×10⁻⁵          3.9×10⁻⁶
rs7251204    19          20558498    3.9×10⁻²                3.7×10⁻⁵          4.0×10⁻⁶
rs11122355   1           228319928   0.78                    3.9×10⁻⁷          5.3×10⁻⁶
rs6462079    7           28189067    8.6×10⁻⁶                6.8×10⁻³          5.5×10⁻⁶

* Defined as a pair-wise D' < 0.8
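The independence criterion in the footnote (pair-wise D' < 0.8) can be illustrated with a minimal sketch of Lewontin's D' for two biallelic SNPs. The function names and the example frequencies are hypothetical and are not taken from the study; the study's own software for computing D' is not described in this section.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # Monomorphic loci give d_max == 0; report 0 rather than dividing by zero.
    return abs(d) / d_max if d_max > 0 else 0.0


def is_independent(p_ab: float, p_a: float, p_b: float, threshold: float = 0.8) -> bool:
    """Apply the table's criterion: two SNPs count as independent if D' < threshold."""
    return d_prime(p_ab, p_a, p_b) < threshold


# Hypothetical frequencies: allele A always occurs on a B-carrying haplotype.
complete_ld = d_prime(p_ab=0.3, p_a=0.3, p_b=0.5)
# Hypothetical frequencies: haplotype frequency equals the product of allele frequencies.
no_ld = d_prime(p_ab=0.15, p_a=0.3, p_b=0.5)
```

Under this criterion, a pair of SNPs in complete disequilibrium would be treated as a single signal, whereas a pair in equilibrium would be listed as two independent associations.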


Supplemental Table II. GWA for Fasting Glucose in Non-diabetic Individuals With Meta-analysis p-value < 1.0×10⁻⁴ on Chromosome 2.

                          FUSION Stage 1                     SardiNIA                           GWA Meta-analysis
SNP          Position**   Minor Allele  MAF      p-value     Minor Allele  MAF      p-value     p-value†
rs477224     169575990    C             0.184*   8.9×10⁻²    C             0.361    3.4×10⁻⁴    7.7×10⁻⁵
rs13431652   169578922    C             0.301    6.9×10⁻³    C             0.412    2.4×10⁻⁵    5.6×10⁻⁷
rs573225     169583048    G             0.307    2.9×10⁻³    G             0.436    4.9×10⁻⁵    5.7×10⁻⁷
rs560887     169588655    T             0.305*   1.7×10⁻³    T             0.372*   4.4×10⁻⁸    2.8×10⁻¹⁰
rs563694     169599578    C             0.339*   8.1×10⁻⁴    C             0.455*   7.6×10⁻⁵    3.5×10⁻⁷
rs537183     169600153    C             0.340    8.6×10⁻⁴    C             0.455    6.5×10⁻⁵    3.1×10⁻⁷
rs502570     169600466    A             0.340    9.3×10⁻⁴    A             0.455    6.5×10⁻⁵    3.3×10⁻⁷
rs475612     169602253    T             0.343    1.1×10⁻³    T             0.451    5.7×10⁻⁵    3.3×10⁻⁷
rs557462     169603102    T             0.343    1.1×10⁻³    C             0.455    6.4×10⁻⁵    3.6×10⁻⁷
rs486981     169607656    A             0.343    1.6×10⁻³    A             0.469    1.0×10⁻⁴    7.9×10⁻⁷
rs484066     169607988    A             0.379    1.9×10⁻²    A             0.370    9.0×10⁻⁵    9.0×10⁻⁷
rs569805     169608387    A             0.342    1.6×10⁻³    A             0.469    1.0×10⁻⁴    7.9×10⁻⁷
rs579060     169608546    G             0.342    1.6×10⁻³    G             0.469    1.0×10⁻⁴    7.9×10⁻⁷
rs508506     169610462    A             0.342    1.6×10⁻³    A             0.469*   1.0×10⁻⁴    7.8×10⁻⁷
rs494874     169614813    T             0.342    1.6×10⁻³    T             0.470*   1.1×10⁻⁴    8.7×10⁻⁷
rs552976     169616945    A             0.342    1.6×10⁻³    A             0.472*   3.5×10⁻⁵    2.5×10⁻⁷
rs567074     169619938    T             0.442    8.0×10⁻²    C             0.446*   4.1×10⁻⁴    8.2×10⁻⁵
rs853789     169626995    A             0.350    1.4×10⁻³    A             0.412*   2.5×10⁻⁷    1.4×10⁻⁹
rs853787     169627759    G             0.352    1.1×10⁻³    G             0.412    2.3×10⁻⁷    1.0×10⁻⁹
rs862662     169627836    C             0.438    4.5×10⁻²    A             0.449    2.4×10⁻⁴    2.9×10⁻⁵
rs853781     169631828    A             0.459    2.9×10⁻²    G             0.449    2.4×10⁻⁴    1.9×10⁻⁵
rs853773     169639854    A             0.482    2.7×10⁻²    G             0.480    3.8×10⁻⁶    3.2×10⁻⁷

* Based on genotyped data

** Based on NCBI build 35

† Meta-analysis for FUSION stage 1 and SardiNIA
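The meta-analysis p-values in the tables above combine the FUSION stage 1 and SardiNIA results. This section does not state which combination method was used, so the following is only a sketch of one widely used option, a sample-size-weighted Z-score (Stouffer) combination. The function names, the effect directions, and the sample sizes below are hypothetical placeholders; the sketch is not expected to reproduce the tabulated values.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def signed_z(p_value: float, direction: int) -> float:
    """Convert a two-sided p-value and an effect direction (+1 or -1) to a Z score."""
    # inv_cdf(p/2) is the lower-tail quantile (negative), so negate it for the magnitude.
    return direction * -_STD_NORMAL.inv_cdf(p_value / 2.0)


def weighted_z_meta(studies):
    """Combine studies given as (two-sided p-value, direction, sample size) tuples.

    Each study's Z score is weighted by the square root of its sample size.
    Returns the two-sided p-value of the combined Z score.
    """
    studies = list(studies)
    numerator = sum(sqrt(n) * signed_z(p, d) for p, d, n in studies)
    denominator = sqrt(sum(n for _, _, n in studies))
    z_combined = numerator / denominator
    return 2.0 * _STD_NORMAL.cdf(-abs(z_combined))


# Two studies with the same effect direction; sample sizes are hypothetical.
p_concordant = weighted_z_meta([(1.7e-3, +1, 1000), (4.4e-8, +1, 4000)])
# Two equally sized studies with identical p-values but opposite directions cancel out.
p_discordant = weighted_z_meta([(1.0e-3, +1, 1000), (1.0e-3, -1, 1000)])
```

The design point the sketch makes is that direction matters: concordant effects reinforce each other, while opposite effects of equal weight cancel.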


Supplemental Table III. Association Between rs563694 and Fasting Glucose in Non-diabetic Individuals Adjusting for SNPs Associated with Type 2 Diabetes from Previous GWA Studies*.

Chr   Covariate SNP   Gene                FUSION p-value   SardiNIA p-value   Meta p-value
      None                                8.0×10⁻⁴         7.6×10⁻⁵           3.5×10⁻⁷
3     rs4402960       IGF2BP2             6.7×10⁻⁴         8.1×10⁻⁵           3.3×10⁻⁷
3     rs1801282       PPARG               8.2×10⁻⁴         7.5×10⁻⁵           3.5×10⁻⁷
6     rs7754840       CDKAL1              8.1×10⁻⁴         7.8×10⁻⁵           3.6×10⁻⁷
8     rs13266634      SLC30A8             1.1×10⁻³         6.3×10⁻⁵           3.6×10⁻⁷
9     rs10811661      CDKN2A/2B           1.1×10⁻³         6.4×10⁻⁵           3.7×10⁻⁷
10    rs1111875       HHEX                1.1×10⁻³         7.1×10⁻⁵           4.1×10⁻⁷
10    rs7903146       TCF7L2              1.1×10⁻³         6.8×10⁻⁵           3.9×10⁻⁷
11    rs9300039       Chr 11 intragenic   6.1×10⁻⁴         7.5×10⁻⁵           2.8×10⁻⁷
11    rs5215          KCNJ11              7.1×10⁻⁴         7.1×10⁻⁵           3.0×10⁻⁷
16    rs8050136       FTO                 1.1×10⁻³         7.3×10⁻⁵           4.2×10⁻⁷

* The direction of the effect of rs563694 on fasting glucose was not altered by the addition of these SNPs as covariates.


Supplemental Table IV. Association Between Fasting Glucose in Non-diabetic Individuals and SNPs Associated with Type 2 Diabetes from Previous GWA Studies.

Chr   SNP          Gene                FUSION p-value   SardiNIA p-value   Meta p-value
3     rs4402960    IGF2BP2             0.453            0.690              0.751
3     rs1801282    PPARG               0.713            0.983              0.160
6     rs7754840    CDKAL1              0.780            0.666              0.805
8     rs13266634   SLC30A8             0.728            0.738              0.029
9     rs10811661   CDKN2A/2B           0.180            0.794              0.060
10    rs1111875    HHEX                0.360            0.538              0.582
10    rs7903146    TCF7L2              0.816            0.530              0.597
11    rs9300039    Chr 11 intragenic   0.298            0.901              0.851
11    rs5215       KCNJ11              0.187            0.596              0.148
16    rs8050136    FTO                 0.810            0.541              0.582


Chapter 5

Common variant in MTNR1B associated with increased risk of type 2 diabetes and impaired early insulin secretion

Nature Genetics 2009;41(1):82-8


Common variant in MTNR1B associated with increased risk of type 2 diabetes and impaired early insulin secretion

Valeriya Lyssenko1, Cecilia L F Nagorny2, Michael R Erdos3, Nils Wierup4, Anna Jonsson1, Peter Spegel2, Marco Bugliani5, Richa Saxena6,7, Malin Fex8, Nicolo Pulizzi5, Bo Isomaa9, Tiinamaija Tuomi9,10, Peter Nilsson11, Johanna Kuusisto12, Jaakko Tuomilehto13–15, Michael Boehnke16, David Altshuler6,7, Frank Sundler4, Johan G Eriksson17,18, Anne U Jackson16, Markku Laakso12, Piero Marchetti5, Richard M Watanabe19,20, Hindrik Mulder2 & Leif Groop1,10

Genome-wide association studies have shown that variation in MTNR1B (melatonin receptor 1B) is associated with insulin and glucose concentrations. Here we show that the risk genotype of the MTNR1B SNP rs10830963 predicts future type 2 diabetes (T2D) in two large prospective studies. Specifically, the risk genotype was associated with impairment of early insulin response to both oral and intravenous glucose and with faster deterioration of insulin secretion over time. We also show that the MTNR1B mRNA is expressed in human islets, and immunocytochemistry confirms that it is primarily localized in β cells in islets. Nondiabetic individuals carrying the risk allele and individuals with T2D showed increased expression of the receptor in islets. Insulin release from clonal β cells in response to glucose was inhibited in the presence of melatonin. These data suggest that the circulating hormone melatonin, which is predominantly released from the pineal gland in the brain, is involved in the pathogenesis of T2D. Given the increased expression of MTNR1B in individuals at risk of T2D, the pathogenic effects are likely exerted via a direct inhibitory effect on β cells. In view of these results, blocking the melatonin ligand-receptor system could be a therapeutic avenue in T2D.

Received 11 July; accepted 27 October; published online 7 December 2008; doi:10.1038/ng.288

1Unit of Diabetes and Endocrinology, Department of Clinical Sciences in Malmoe, Lund University Diabetes Centre, University Hospital, Malmoe 20520, Sweden. 2Unit of Molecular Metabolism, Department of Clinical Sciences in Malmoe, Lund University Diabetes Centre, Malmoe 20502, Sweden. 3Genome Technology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA. 4Unit of Neuroendocrine Cell Biology, Department of Experimental Medical Science, Lund University, Lund 22184, Sweden. 5Department of Endocrinology and Metabolism, University of Pisa, Pisa 56124, Italy. 6Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA. 7Massachusetts General Hospital, Boston, Massachusetts 02114, USA. 8Unit for Diabetes and Celiac Disease, Department of Clinical Sciences in Malmoe, Lund University Diabetes Centre, Malmoe 20502, Sweden. 9Folkhalsan Research Centre, Helsinki 00251, Finland. 10Department of Medicine, Helsinki University Central Hospital, and Research Program of Molecular Medicine, University of Helsinki, Helsinki 00140, Finland. 11Department of Clinical Sciences, Medicine, Lund University, Malmoe 20502, Sweden. 12Department of Medicine, University of Kuopio and Kuopio University Hospital, Kuopio 70210, Finland. 13Diabetes Unit, Department of Health Promotion and Chronic Disease Prevention, National Public Health Institute, Helsinki 00300, Finland. 14Department of Public Health, University of Helsinki, Helsinki 00014, Finland. 15South Ostrobothnia Central Hospital, Senajoki 60220, Finland. 16Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA. 17National Public Health Institute, Helsinki 00300, Finland. 18Department of General Practice and Primary Health Care, University of Helsinki, Helsinki 00014, Finland. 19Department of Preventive Medicine and 20Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA. Correspondence should be addressed to L.G. ([email protected]).

T2D incidence and prevalence are increasing at an alarming rate worldwide. It is well established that T2D is multifactorial and that multiple genes and environmental and behavioral factors combine to cause the disease. Recent genome-wide association studies (GWAS) have provided new insights into the nature of these genetic factors1–5. Many of the T2D-associated variants identified in these studies seem to influence the capacity of β cells to cope with increased insulin demands imposed by insulin resistance. One of the GWAS (Diabetes Genetics Initiative; DGI) also provided information on association with 18 quantitative traits1, including measures of insulin secretion and action. One of the strongest signals for glucose-stimulated insulin secretion in the DGI scan emanated from a SNP (rs10830963) in MTNR1B on chromosome 11 (P = 7 × 10⁻⁴, rank order 595). Given that the melatonin pathway had previously been suggested to be involved in pathogenesis of T2D, the MTNR1B gene was a prime candidate gene for T2D. This SNP was also strongly associated (P = 3.2 × 10⁻⁵⁰) with elevated fasting glucose concentrations in a meta-analysis of the recent GWAS of T2D6.

Melatonin is a circulating hormone predominantly secreted from the pineal gland, although other endocrine cell systems may also synthesize and release this hormone7, which then could exert hitherto unknown autocrine and paracrine effects8. Melatonin is an indoleamine formed from tryptophan via acetylation and subsequent methylation of the neurotransmitter serotonin. It has primarily been implicated in the regulation of circadian rhythms, and circulating levels of the hormone are high during the night and drop during daylight7. In fact, it has been proposed that melatonin could be involved in a circadian lowering of nocturnal insulin levels9. Effects of melatonin are mediated by two distinct receptors, MTNR1A and MTNR1B10, which are members of the G protein–coupled receptor family, specifically inhibitory G proteins (Gi). Both receptors have been found to be expressed in human and rodent islets11, with MTNR1A predominating, especially in glucagon-producing α cells12. There is some evidence that melatonin may exert an effect on insulin secretion, in that acute effects exerted by cAMP-elevating agents are inhibited by melatonin, whereas prolonged effects of the hormone may be stimulatory7. Here we provide new evidence that the common variant rs10830963 in the MTNR1B gene (or a variant or variants in linkage disequilibrium with it) increases risk of future T2D by causing impaired early insulin secretion. Further, we present functional data that suggest a potential role of the melatonin system, in particular the MTNR1B receptor, in regulation of glucose homeostasis in man.

First, we studied whether the MTNR1B rs10830963 SNP predicts future T2D in 16,061 Swedish (from the Malmoe Preventive Project, MPP) and 2,770 Finnish (from the Botnia study) subjects, 2,201 (2,063 + 138) of whom developed diabetes during a median follow-up period of 23.5 years (Table 1). The frequency of the risk G allele of SNP rs10830963 was higher in individuals from the MPP study who converted to T2D compared to nonconverters (30.2% versus 28.0%, P = 0.002). This yielded a modestly increased risk of 1.12 (95% CI = 1.04–1.20, P = 0.002). There was no significant difference between converters and nonconverters in the Botnia study, but here only 138 individuals developed T2D during a 7-year follow-up period (31.0% versus 29.3%; OR = 1.09, 95% CI = 0.82–1.43, P = 0.56). In the combined analysis of the two cohorts, the risk allele was associated with a 1.11-fold increased risk of future T2D (95% CI = 1.03–1.18, P = 0.004). This relatively modest risk for future T2D probably explains why this SNP was not identified as being associated with T2D in previous GWAS (OR = 1.12 (95% CI = 1.04–1.20), P = 0.003 in DIAGRAM). However, the effect on glucose levels seems much stronger; in nondiabetic individuals from the MPP study, rs10830963[G] carriers had a higher fasting plasma glucose concentration at baseline (CC: 5.38 ± 0.54 mmol/l, CG: 5.44 ± 0.55 mmol/l, GG: 5.50 ± 0.55 mmol/l, P = 3 × 10⁻¹⁹), which remained elevated throughout the 25-year follow-up period (CC: 5.41 ± 0.54 mmol/l, CG: 5.49 ± 0.54 mmol/l, GG: 5.55 ± 0.54 mmol/l, P = 2 × 10⁻³¹) (Fig. 1a).

Next, we examined insulin secretion in 3,300 nondiabetic participants from the population-based Botnia PPP study. We observed a dose-dependent decrease in insulin secretion (corrected early insulin response to glucose (CIR): beta = −0.170 ± 0.021, P = 5 × 10⁻¹⁶; disposition index (DI): beta = −0.241 ± 0.022, P = 1 × 10⁻²⁶) with increasing number of G alleles of rs10830963 (Table 2 and Fig. 1b,c). These findings were replicated in the METabolic Syndrome In Men (METSIM) study, where both CIR (beta = −0.143 ± 0.022, P = 1 × 10⁻¹⁰) and DI (beta = −0.128 ± 0.022, P = 9 × 10⁻⁹) were associated with rs10830963 in 4,257 subjects.

In the Botnia prospective study, 2,328 nondiabetic carriers of rs10830963[G] showed lower insulin secretion at baseline (CIR: beta = −0.160 ± 0.026, P = 6 × 10⁻¹⁰; DI: beta = −0.171 ± 0.026, P = 9 × 10⁻¹¹), which remained lower throughout the 7-year follow-up period (CIR: beta = −0.188 ± 0.026, P = 1 × 10⁻¹²; DI: beta = −0.179 ± 0.029, P = 8 × 10⁻¹⁰) (Fig. 1d). Further, rs10830963[G] was also associated with impaired insulin secretion during an intravenous glucose tolerance test in 505 nondiabetic individuals from the Botnia study (FPIR: beta = −0.065 ± 0.023, P = 0.004; Fig. 1e). rs10830963[G] was also associated with reduced acute insulin response to glucose (AIR: P = 2.2 × 10⁻⁶; DI: P = 5.0 × 10⁻³) in 522 nondiabetic individuals from the FUSION study13 (Table 2).
50

100

150

200

250

CIR

(m

U ×

l/m

mol

2 )

300 P < 0.0001

CC CG GG

b

10,000

20,000

30,000

40,000

DI (

mU

3 /L3 )

50,000 P < 0.0001

CC CG GG

c

5.35Baseline Follow-up

5.40

5.45

5.50

CCCGGG5.55

5.60

Fast

ing

gluc

ose

(mm

ol/l)

P < 0.0001a

120

150

180

210

240

270

300

330

PF

IR (

mU

/l)

P = 0.004

CC CG GG

e

0.50

0.51

0.52

0.53

0.54

0.55

0.56

0.57

Inta

ct p

roin

sulin

/insu

lin(m

U/l)

P = 0.005

CC CG GG

f

DI (

mU

3 /l3 )

30,000P < 0.0001

28,000

26,000

24,000

22,000

20,000

18,000

16,000

14,000Baseline Follow-up

dCCCGGG

Table 1 Samples used in this study

Study                             N (with diabetes)   Geographic origin   Age (y)       BMI (kg/m²)
Malmoe Preventive Project (MPP)   16,061 (2,063)      Sweden              45.5 ± 6.9    24.3 ± 3.3
Botnia PPP                        3,300               Finland             48.5 ± 15.9   26.1 ± 4.2
Botnia prospective cohort         2,770 (138)         Finland             44.9 ± 14.2   25.6 ± 4.1
Helsinki Birth Cohort             1,600               Finland             61.6 ± 3.0    27.1 ± 4.3
FUSION                            522                 Finland             39.1 ± 12.2   26.0 ± 6.4
METSIM                            4,369               Finland             59.3 ± 2.8    26.9 ± 3.8

Data are shown as mean ± s.d.

Figure 1 Insulin secretion according to different MTNR1B rs10830963 genotypes. (a) Change in fasting plasma glucose concentrations during 24-year follow-up in nondiabetic subjects (Malmoe study, N = 13,674). (b) Corrected early insulin response to glucose (CIR) during OGTT (Botnia PPP cohort; N = 3,300). (c) Disposition index (DI) represents early insulin response to glucose corrected for insulin sensitivity by the Matsuda index (CIR × ISI, Botnia PPP cohort; N = 3,300). (d) Change in insulin secretion (disposition index) over time in nondiabetic subjects (Botnia prospective cohort, N = 2,328). (e) Insulin secretion measured as first-phase insulin response during an IVGTT (Botnia cohort; N = 505). (f) Intact proinsulin-to-insulin ratio in the fasting state (Helsinki Birth Cohort, N = 1,600). Bars represent mean ± s.e.m. Blue lines represent nonrisk and red lines risk genotype carriers of rs10830963 in MTNR1B.
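The caption defines the disposition index as CIR × ISI. A minimal sketch of that calculation follows. The product itself is taken from the caption; the formulas for CIR and for the Matsuda ISI are the commonly used definitions and are assumptions here, since this section does not spell them out. The glucose conversion factor, the function names, and the example values are likewise hypothetical.

```python
from math import sqrt

# Approximate factor for converting glucose from mmol/l to mg/dl (assumption).
MMOL_TO_MGDL = 18.0


def corrected_insulin_response(insulin_30: float, glucose_30: float) -> float:
    """CIR from 30-min OGTT insulin (mU/l) and glucose (mmol/l).

    Assumed definition: 100 * I30 / (G30 * (G30 - 3.89)).
    """
    if glucose_30 <= 3.89:
        raise ValueError("CIR is undefined for 30-min glucose <= 3.89 mmol/l")
    return 100.0 * insulin_30 / (glucose_30 * (glucose_30 - 3.89))


def matsuda_isi(glucose_0: float, insulin_0: float,
                glucose_mean: float, insulin_mean: float) -> float:
    """Matsuda insulin sensitivity index.

    Assumed definition: 10,000 / sqrt(G0 * I0 * Gmean * Imean), with glucose
    converted from mmol/l to mg/dl and insulin given in mU/l.
    """
    g0 = glucose_0 * MMOL_TO_MGDL
    g_mean = glucose_mean * MMOL_TO_MGDL
    return 10000.0 / sqrt(g0 * insulin_0 * g_mean * insulin_mean)


def disposition_index(cir: float, isi: float) -> float:
    """DI as defined in the figure caption: CIR x ISI."""
    return cir * isi


# Hypothetical OGTT values for one participant.
cir = corrected_insulin_response(insulin_30=50.0, glucose_30=8.0)
isi = matsuda_isi(glucose_0=5.0, insulin_0=10.0, glucose_mean=7.0, insulin_mean=40.0)
di = disposition_index(cir, isi)
```

Multiplying secretion by sensitivity is what lets DI distinguish a β cell that secretes less because less is needed from one that fails to compensate for insulin resistance.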


Finally, we examined whether the SNP would influence proinsulin processing as reflected in the ratio between proinsulin and insulin in 1,600 nondiabetic participants of the Helsinki Birth Cohort Study14. Here too, carriers of the MTNR1B risk genotype had impaired early insulin response to oral glucose (CIR: beta = −0.109 ± 0.027, P = 5 × 10⁻⁵; DI: beta = −0.122 ± 0.027, P = 8 × 10⁻⁶; Table 2). In addition, risk allele carriers had an elevated intact proinsulin-to-insulin ratio (P = 0.005; Table 2 and Fig. 1f). However, an increased proinsulin-to-insulin ratio does not a priori imply a specific defect in proinsulin processing, as proinsulin concentrations rise under most conditions of stressed β cells.
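The beta values quoted above are per-allele effects from an additive genetic model (Table 2 labels them "Additive model"). A minimal sketch of how such a slope is obtained: genotypes are coded by the number of risk G alleles and the trait is regressed on that count. The function names and data below are hypothetical, and any trait transformation or covariate adjustment used in the study is omitted because it is not described in this section.

```python
# Number of risk (G) alleles carried by each rs10830963 genotype.
RISK_ALLELE_COUNT = {"CC": 0, "CG": 1, "GG": 2}


def additive_beta(genotypes, trait_values):
    """Ordinary least-squares slope of a trait on risk-allele count (0, 1 or 2)."""
    counts = [RISK_ALLELE_COUNT[g] for g in genotypes]
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(trait_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, trait_values))
    sxx = sum((x - mean_x) ** 2 for x in counts)
    if sxx == 0:
        raise ValueError("all individuals share one genotype; slope is undefined")
    return sxy / sxx


# Hypothetical trait values constructed to fall by 0.17 units per G allele.
example_genotypes = ["CC", "CC", "CG", "CG", "GG", "GG"]
example_trait = [1.00, 1.00, 0.83, 0.83, 0.66, 0.66]
beta = additive_beta(example_genotypes, example_trait)
```

A negative beta under this coding corresponds to the dose-dependent decrease described in the text: each additional G allele lowers the trait by roughly the same amount.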

The melatonin receptor 1B (MTNR1B) is expressed in human islets and in β cells. Using quantitative RT-PCR (TaqMan), we observed that both MTNR1A and MTNR1B were expressed in human islets as well as in clonal β cells. In contrast to previous findings11,12, both receptors were expressed at nearly equal levels in human islets. Moreover, islet expression of MTNR1B was confirmed by immunocytochemistry (Fig. 2). Again, in contrast to a previous report, where single-cell PCR identified MTNR1A mRNA primarily in α cells12, we observed expression of MTNR1B predominantly in β cells in both human and rodent islets (Fig. 2). MTNR1A was also observed in islets; its expression was

Table 2 Effect of the MTNR1B rs10830963 on insulin secretion in the studied cohorts

Study | Phenotype | Genotypes: CC | CG | GG | Additive model: RA | BETA | s.e.m. | P value
DGI WGAS (OGTT n = 1,020) | Age (y) | 59 ± 10 | 59 ± 10 | 58 ± 10 | | – | | 0.74
 | BMI (kg/m²) | 26.5 ± 3.6 | 26.7 ± 4.0 | 27.3 ± 3.7 | | – | | 0.14
 | Fasting P-glucose (mmol/l) | 5.28 ± 0.53 | 5.32 ± 0.52 | 5.38 ± 0.60 | 0.31 | 0.045 | 0.022 | 0.039
 | CIR (mU × l/mmol²) | 180 ± 360 | 165 ± 1,912 | 144 ± 163 | | –0.166 | 0.048 | 7 × 10⁻⁴
 | DI (mU³/l³) | 24,036 ± 29,445 | 20,285 ± 27,763 | 16,555 ± 22,974 | | –0.173 | 0.046 | 2 × 10⁻⁴
Botnia PPP (OGTT n = 3,300) | Age (y) | 48.3 ± 16.0 | 48.5 ± 15.9 | 49.6 ± 15.9 | | – | | 0.38
 | BMI (kg/m²) | 26.12 ± 4.21 | 26.22 ± 4.22 | 26.19 ± 3.82 | | – | | 0.79
 | Fasting P-glucose (mmol/l) | 5.06 ± 0.54 | 5.25 ± 0.55 | 5.28 ± 0.55 | 0.30 | 0.134 | 0.014 | 2 × 10⁻²²
 | CIR (mU × l/mmol²) | 271 ± 415 | 205 ± 245 | 175 ± 134 | | –0.170 | 0.021 | 5 × 10⁻¹⁶
 | DI (mU³/l³) | 44,631 ± 87,537 | 30,499 ± 49,947 | 24,316 ± 21,582 | | –0.241 | 0.022 | 1 × 10⁻²⁶
Botnia prospective (OGTT n = 2,328), baseline | Age (y) | 45.8 ± 13.2 | 45.1 ± 13.8 | 45.6 ± 14.2 | | – | | 0.52
 | BMI (kg/m²) | 25.5 ± 4.1 | 25.7 ± 3.7 | 25.7 ± 3.8 | | – | | 0.48
 | CIR (mU × l/mmol²) | 176 ± 183 | 150 ± 164 | 129 ± 137 | | –0.160 | 0.026 | 6 × 10⁻¹⁰
 | DI (mU³/l³) | 26,958 ± 34,304 | 22,340 ± 31,320 | 18,375 ± 17,416 | | –0.171 | 0.026 | 9 × 10⁻¹¹
Botnia prospective, follow-up | Age (y) | 53.8 ± 13.8 | 52.7 ± 14.3 | 53.3 ± 14.9 | | – | | 0.25
 | BMI (kg/m²) | 26.5 ± 4.1 | 26.7 ± 4.2 | 26.7 ± 4.2 | | – | | 0.41
 | Fasting P-glucose (mmol/l) | 5.25 ± 0.56 | 5.34 ± 0.56 | 5.41 ± 0.61 | | 0.086 | 0.019 | 5 × 10⁻⁶
 | CIR (mU × l/mmol²) | 234 ± 238 | 188 ± 192 | 145 ± 125 | | –0.188 | 0.026 | 1 × 10⁻¹²
 | DI (mU³/l³) | 27,508 ± 40,934 | 20,888 ± 27,012 | 16,502 ± 16,261 | | –0.179 | 0.029 | 8 × 10⁻¹⁰
Helsinki Birth Cohort (OGTT n = 1,600) | Age (y) | 61.6 ± 3.0 | 61.5 ± 3.0 | 61.6 ± 3.1 | | – | – | 0.96
 | BMI (kg/m²) | 27.0 ± 4.2 | 27.2 ± 4.4 | 27.1 ± 4.2 | | – | – | 0.53
 | CIR (mU × l/mmol²) | 209 ± 196 | 175 ± 150 | 177 ± 188 | | –0.109 | 0.027 | 5 × 10⁻⁵
 | DI (mU³/l³) | 19,646 ± 21,504 | 15,552 ± 15,063 | 15,699 ± 17,881 | | –0.122 | 0.027 | 8 × 10⁻⁶
 | Intact proinsulin/insulin | 0.51 ± 0.26 | 0.52 ± 0.26 | 0.55 ± 0.24 | | 0.024 | 0.009 | 0.005
METSIM (n = 4,257) | Age (y) | 59.3 ± 5.8 | 59.4 ± 5.8 | 59.1 ± 5.7 | 0.36 | – | – | –
 | BMI (kg/m²) | 26.9 ± 3.9 | 26.9 ± 3.7 | 26.5 ± 3.7 | | –0.058 | 0.020 | 4.3 × 10⁻³
 | Fasting P-glucose (mmol/l) | 5.6 ± 0.5 | 5.7 ± 0.5 | 5.8 ± 0.5 | | 0.165 | 0.022 | 9.4 × 10⁻¹⁴
 | CIR (mU × l/mmol²) | 196 ± 212 | 168 ± 165 | 152 ± 143 | | –0.143 | 0.022 | 1.3 × 10⁻¹⁰
 | DI (mU³/l³) | 21,554 ± 28,426 | 17,878 ± 18,235 | 16,798 ± 16,461 | | –0.128 | 0.022 | 9.8 × 10⁻⁹
Botnia (IVGTT n = 505) | FPIR | 297 ± 195 | 259 ± 194 | 237 ± 139 | 0.27 | –0.065 | 0.023 | 0.004
FUSION (FSIGT n = 522) | AIR (pM × 8 min) | 2,632 ± 1,731 | 2,064 ± 1,468 | 1,554 ± 1,092 | 0.35 | –0.316 | 0.067 | 2 × 10⁻⁶

Data are shown as means ± s.d. CIR, corrected early insulin response to glucose during OGTT; DI, disposition index; FPIR, first-phase insulin response during IVGTT; AIR, acute insulin response during frequently sampled intravenous glucose tolerance test (FSIGT); RA, risk allele.


less abundant and seemed to be restricted to a population of peripherally located β cells in human, mouse and rat islets.

Next, we analyzed whether islet expression of MTNR1B, which we now had established in β cells, correlated with presence of rs10830963[G] in the MTNR1B gene as well as with T2D. To this end, we used both quantitative RT-PCR and microarray. Using RT-PCR, we found that individuals carrying the G allele showed higher expression of MTNR1B as compared with carriers of the C allele (age-adjusted P = 0.01, Fig. 3a). Notably, this effect was almost exclusively seen in individuals older than 45 years (P = 0.001, Fig. 3a insert). The microarray experiments (Affymetrix HU 133) were done on islets isolated from four nondiabetic and four T2D islet donors¹⁵. There was a trend toward higher expression of MTNR1B in T2D than in nondiabetic islets (P = 0.20, Supplementary Fig. 1a online), and expression correlated inversely with glucose-stimulated insulin secretion (Supplementary Fig. 1b).

To determine the effects of melatonin on insulin secretion, we acutely incubated clonal β cells (832/13) at low and high glucose concentrations in the presence of 0.1 μM melatonin. Addition of melatonin exerted a clear inhibitory effect on insulin secretion provoked by glucose (Fig. 3b).

The present findings provide strong support for a role of melatonin and its receptor MTNR1B in the pathogenesis of T2D. A common variant in the MTNR1B gene was associated with an increase in fasting glucose over time and predicted future T2D, most likely through impairment of insulin secretion from pancreatic β cells⁷. Notably, this effect became more pronounced with increasing age, most likely as a consequence of the increased demands imposed by increased age-related insulin resistance. This effect can be understood in light of what is known about the function of melatonin in islets based on previous studies as well as our present results. The MTNR1B is coupled to an inhibitory G protein¹⁰. Activation of MTNR1B by melatonin would therefore block activation of adenylate cyclase, which is the predominant mode of action for incretin hormones, such as GLP-1 and glucose-dependent insulinotropic polypeptide (GIP), both of which raise intracellular cAMP. There is also evidence that glucose stimulation of the β cell by itself leads to a rise in intracellular cAMP. Indeed, it has previously been observed that addition of melatonin blocks cAMP formation in β cells¹⁶. Here, we confirmed previous observations, although discrepant results have been reported¹², that melatonin acutely blocks glucose-induced insulin secretion⁷. Thus, in a situation where expression of MTNR1B is increased, it could be anticipated that cellular cAMP levels will be lower. Hence, the potentiating effect that this nucleotide exerts on insulin secretion, via mechanisms both dependent on and independent of protein kinase A, would be diminished, leading to impaired insulin secretion. This potential pathogenic situation would be further aggravated if melatonin levels are elevated. In fact, this seems to be the case: studies have reported that the circadian rhythm in melatonin secretion is perturbed in T2D¹⁷. It has been suggested that secretion of the hormone is elevated during the day, when it normally should be low, which could lead to reduced insulin secretion.

There are therapeutic implications of our findings. First, if melatonin has a negative role in the development of T2D, antagonists of the receptors targeted to β cells could be of utility. Second, individuals with the risk profile conferred by the MTNR1B rs10830963 SNP may be less responsive to treatment with GLP-1 analogs as well as inhibitors of GLP-1 degradation (DPP-IV inhibitors). Identifying these individuals may allow tailoring of a more precise therapy in T2D.

Our findings lend support to earlier reports of a role of the melatonin system for islet function and also provide new insights into the mechanisms by which the system may play a role in the pathogenesis of T2D. Interfering with its action may be a new therapeutic avenue in T2D.

Figure 2 Colocalization of MTNR1B and insulin protein in mouse, rat and human pancreatic islets. Scale bar, 50 μm.

Figure 3 Expression of MTNR1B in human pancreatic islets. (a) The MTNR1B mRNA levels were higher in risk GG genotype carriers (total n = 51, CC = 21, CG = 25, GG = 5; nonadjusted P = 0.25, age-adjusted P = 0.01). The insert graph shows MTNR1B mRNA levels in individuals above the mean age of 45 years (total n = 25, CC = 10, CG = 13, GG = 2; P = 0.001): the MTNR1B mRNA levels were higher in risk GG genotype carriers. (b) Insulin secretion in INS-1 832/13 clonal β cells in response to stimulation with 2.8 mM (gray bar) and 16.7 mM glucose (white bar) in the presence and absence of 0.1 μM melatonin (black bar). Individual experiments were done in triplicate (n = 7, *P < 0.037). Bars represent mean ± s.e.m.

METHODS

Study populations. In the Malmoe Preventive Project (MPP), 33,346 Swedish subjects (22,444 men and 10,902 women; mean age 49 years, 24.5% with impaired fasting glucose (IFG) and/or impaired glucose tolerance (IGT)) from the city of Malmoe in southern Sweden participated in a health screening during 1974–1992 (ref. 18). All individuals underwent a physical examination and blood was drawn for measurements of fasting blood glucose and lipid concentrations. In addition, 18,900 consecutively enrolled persons also had an oral glucose tolerance test (OGTT). Information on lifestyle factors and medical history was obtained by questionnaire. Of the individuals participating in the initial screening, 4,931 are deceased and 551 are lost to follow-up. Of the eligible individuals, 25,000 were invited to a rescreening visit during 2002–2006, which included a physical examination and fasting blood samples for measurements of plasma glucose and lipids. Of the invited subjects, 17,284 persons participated in the rescreening. Of these, 1,223 were excluded because of lacking information or DNA (or T2D at baseline)¹⁹. Thereby, 16,061 nondiabetic subjects, 2,063 of whom developed T2D, were included in the current analyses. Diagnosis of diabetes was confirmed from subject records or on the basis of a fasting plasma glucose concentration greater than 7.0 mmol/l.

The Botnia study started in 1990 on the west coast of Finland, aiming at identification of genes increasing susceptibility to T2D in members from families with T2D. The prospective part included 2,770 nondiabetic family members and/or their spouses (1,263 men and 1,507 women, mean age 45 years), 138 of whom developed T2D during a 7.7-year (median) follow-up period¹⁹⁻²¹. All subjects were given information about exercise and healthy diet and underwent a new OGTT at 2- to 3-year intervals.

The Prevalence, Prediction and Prevention of T2D (PPP Botnia) study is a population-based study in the Botnia region that included approximately 10% of the population aged 18–74 years (mean age 51 ± 17 years). Diagnosis of diabetes was confirmed from subject records or on the basis of a fasting plasma glucose concentration greater than 7.0 mmol/l and/or 2 h glucose greater than 11.1 mmol/l. Of the nondiabetic individuals, 2,328 also had serum insulin concentrations measured at baseline and during follow-up.

The Finland–United States Investigation of Non-insulin-dependent Diabetes Mellitus Genetics (FUSION) study has been described in detail²,¹³. For the present analysis, 578 nondiabetic spouses or offspring were included in the study of insulin response to intravenous glucose using tolbutamide-modified frequently sampled intravenous glucose tolerance tests (FSIGTs)²²,²³, which were analyzed by the Minimal Model method²⁴ to derive quantitative measures of insulin sensitivity (SI) and glucose effectiveness (SG). Insulin secretion was assessed as the acute insulin response to glucose (AIR) as described by Ward et al., and beta-cell function was assessed using the disposition index (DI = SI × AIR)²⁵.

The Helsinki Birth Cohort Study (HBCS) has been previously described. In the present study, 1,600 nondiabetic subjects (698 men and 902 women, mean age 62 ± 3 years) were included¹⁴. In 2001–2004 all subjects participated in a clinical examination, including a standard 75 g OGTT. Intact proinsulin concentration was measured at 0 min and the fasting proinsulin/insulin ratio (PI/I) was calculated.

The METabolic Syndrome In Men (METSIM) study includes men aged 45–70 years, randomly selected from the population of the town of Kuopio, Eastern Finland (population 95,000). The present analysis is based on the first 4,386 nondiabetic subjects examined for METSIM with available OGTT data. Samples for the OGTT were obtained at fasting and at 30 and 120 min postload. The CIR and ISI were calculated from OGTT glucose and insulin data as described below.

All participants from the different studies gave informed consent and the local ethics committees approved the protocols.

Measurements. Weight, height and waist and hip circumferences were measured as previously reported¹⁸,¹⁹. In the MPP cohort at baseline, blood samples were drawn at 0, 40 and 120 min of the 75 g OGTT for measurements of blood glucose and serum insulin concentrations, and fasting samples were drawn at the follow-up visit for measurement of plasma glucose and lipid concentrations using standard techniques. In the Botnia study, blood samples were drawn at –10, 0, 30, 60 and 120 min of the OGTT. Insulin sensitivity index (ISI) from the OGTT was calculated as 10,000/√((fasting plasma glucose × fasting plasma insulin) × (mean OGTT glucose × mean OGTT insulin))²⁶. The basal insulin resistance index (HOMA) was calculated from fasting insulin and glucose concentrations (see URLs section below). β-cell function was assessed as corrected incremental insulin response during OGTT (CIR = (100 × insulin at 30 min)/(glucose at 30 min × (glucose at 30 min – 3.89)), with the 40-min values used in MPP)²⁷ or as disposition index, that is, insulin secretion adjusted for insulin sensitivity (CIR × ISI).
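As a rough illustration of the indices defined in this paragraph, the following Python sketch computes ISI, CIR and the disposition index from OGTT values. The function names and the example numbers are hypothetical and are not taken from the study. The sketch assumes that inputs are already expressed in the units required by the cited index definitions and performs no unit conversion.

```python
import math


def insulin_sensitivity_index(fasting_glucose, fasting_insulin,
                              mean_ogtt_glucose, mean_ogtt_insulin):
    """ISI = 10,000 / sqrt((fasting glucose x fasting insulin)
    x (mean OGTT glucose x mean OGTT insulin))."""
    product = (fasting_glucose * fasting_insulin) * (mean_ogtt_glucose * mean_ogtt_insulin)
    return 10_000 / math.sqrt(product)


def corrected_insulin_response(insulin_30, glucose_30):
    """CIR = (100 x insulin) / (glucose x (glucose - 3.89)).

    Both values come from the same time point (30 min, or 40 min in MPP).
    The index is not meaningful when glucose is at or below 3.89 mmol/l,
    so this sketch refuses such inputs (an illustrative choice)."""
    if glucose_30 <= 3.89:
        raise ValueError("CIR is undefined for glucose <= 3.89 mmol/l")
    return 100 * insulin_30 / (glucose_30 * (glucose_30 - 3.89))


def disposition_index(cir, isi):
    """Disposition index: insulin secretion adjusted for insulin sensitivity."""
    return cir * isi


# Hypothetical values for one participant (illustration only).
isi = insulin_sensitivity_index(5.0, 8.0, 7.0, 40.0)
cir = corrected_insulin_response(insulin_30=50.0, glucose_30=8.0)
di = disposition_index(cir, isi)
```

Keeping the three indices as separate functions mirrors the text, where the disposition index is simply the product of the other two.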

Plasma glucose was measured by hexokinase (MPP, FUSION) or glucose oxidase (Botnia, FUSION, METSIM) methods. Plasma insulin concentrations were measured by ELISA (Dako, Cambridgeshire; Botnia study), by a local radioimmunoassay (MPP), by radioimmunoassay using dextran-charcoal separation (FUSION) or by a commercial double-antibody solid-phase radioimmunoassay (METSIM).

Genotyping. In the DGI and FUSION GWAS, genotyping was done using the Affymetrix 500K chip array¹ and the Illumina HumanHap300 BeadChip Version 1.0 (ref. 2), respectively. In the FUSION and METSIM studies, SNP rs10830963 was genotyped by Sequenom iPlex gold SBE (Sequenom); in all other replication studies rs10830963 was genotyped by an allelic discrimination assay-by-design method on ABI 7900 (Applied Biosystems). Genotypes were in Hardy-Weinberg equilibrium. In MPP and Botnia, we obtained an average genotyping success rate of >95% and the concordance rate was 98.7%, using two different methods (allelic discrimination on ABI 7900 and Affymetrix). Replication genotyping for FUSION and METSIM studies was done using Sequenom iPlex gold SBE (Sequenom).
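The text states that genotypes were in Hardy-Weinberg equilibrium but does not say which test was used. The Python sketch below shows one common option, a chi-square goodness-of-fit test with one degree of freedom. The function name is hypothetical, and the genotype counts in the usage line are the islet-donor counts from the Figure 3 legend, used here purely as illustrative input.

```python
import math


def hardy_weinberg_chi2(n_cc, n_cg, n_gg):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value, expected_counts) for a biallelic SNP with
    alleles C and G. Assumes both alleles are observed at least once."""
    n = n_cc + n_cg + n_gg
    p = (2 * n_cc + n_cg) / (2 * n)  # frequency of the C allele
    q = 1.0 - p                      # frequency of the G allele
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_cc, n_cg, n_gg)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, expected


# Illustrative input: CC = 21, CG = 25, GG = 5 (Figure 3 legend).
chi2, p_value, expected = hardy_weinberg_chi2(21, 25, 5)
```

For small samples such as this one an exact test is often preferred; the chi-square form is shown only because it is the simplest to follow.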

Immunocytochemistry. For histochemical analysis, pancreatic specimens were dissected, fixed overnight in Stefanini’s solution (2% paraformaldehyde and 0.2% picric acid in 0.1 M phosphate buffered saline, pH 7.2), rinsed thoroughly in Tyrode solution containing 10% sucrose and frozen on dry ice. Sections (10 μm thickness) were cut and thaw-mounted on slides. Antibodies were diluted in PBS (pH 7.2) containing 0.25% BSA and 0.25% Triton X-100. Sections were incubated with primary antibodies (goat antibody to melatonin receptor 1B (code sc-13177; dilution 1:400; Santa Cruz Biotechnology), goat antibody to melatonin receptor 1A (code sc-13186; dilution 1:400; Santa Cruz Biotechnology) and guinea pig antibody to proinsulin (code 9003; dilution 1:2,560; EuroDiagnostica)) overnight at 4 °C in moisturizing chambers. The sections were rinsed in PBS with Triton X-100 for 2 × 10 min. Thereafter, secondary antibodies with specificity for goat or guinea pig IgG, and coupled to either fluorescein isothiocyanate (FITC) or Texas-Red (Jackson), were applied on the sections. Incubation was for 1 h at room temperature in moisturizing chambers. The sections were again rinsed in PBS with Triton X-100 for 2 × 10 min and then mounted in PBS:glycerol, 1:1. The specificity of immunostaining was tested using primary antisera pre-absorbed with homologous antigen (100 μg of peptide per ml antiserum at working dilution). Immunofluorescence was examined in an epifluorescence microscope (Olympus, BX60). By changing filters the location of the different secondary antibodies in double staining was determined. Images were captured with a digital camera (Nikon DS-2Mv)²⁸.

Gene expression using real-time PCR. Total RNA was isolated with the AllPrep DNA/RNA Mini Kit (Qiagen) at the Human Tissue Facility of Lund University Diabetes Center (LUDC); or by RNeasy protect mini kit (Qiagen) as previously described¹⁵ at the Joslin Islet Cell Resource Center (Joslin); or by Trizol (Invitrogen) and further purification using RNeasy mini kit (Qiagen) at the National Human Genome Research Institute (NHGR). RNA quantity was determined by evaluating the absorbance at 260 and 280 nm in a Perkin-Elmer spectrophotometer (Waltham), and quality was assessed by running samples on Agilent 2100 Bioanalyzer (Agilent Technologies) at Joslin. cDNA was synthesized from 0.4 μg total RNA using RevertAid First Strand cDNA Synthesis Kit (Fermentas Life Sciences) (at LUDC); 0.5 μg total RNA using the High Capacity RNA-to-cDNA Kit (Applied Biosystems) (at NHGR); and 1 μg total RNA using iScript cDNA synthesis kit (Biorad) (at Joslin). TaqMan gene expression assays were purchased from Applied Biosystems for the various target genes: Hs00173794_m1 directed against human MTNR1B and HPRT (hypoxanthine-guanine phosphoribosyl transferase) (at LUDC and NHGR) and PPIA (cyclophilin) (at Joslin), which served as endogenous control genes. Q-PCR reactions were done on the ABI 7900HT (Applied Biosystems) at LUDC and NHGR by mixing 2× TaqMan Universal Master Mix, 20× TaqMan Gene Expression Assays, nuclease-free water and cDNA for a final reaction volume of 10 μl (at LUDC), or as described earlier²⁹ (at Joslin). The relative quantity of MTNR1B mRNA was calculated using the comparative threshold method (Ct-method) (at LUDC and NHGR). All experiments were performed in triplicate.
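The comparative threshold calculation can be sketched as follows. This is a minimal Python illustration of the widely used 2^(–ΔΔCt) form, assuming roughly 100% amplification efficiency for both assays and averaging of triplicate Ct values. The function name, the choice of calibrator sample and all Ct values are hypothetical; the exact variant used in the study is not specified in the text.

```python
from statistics import mean


def relative_quantity(target_ct, control_ct,
                      calibrator_target_ct, calibrator_control_ct):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a list of replicate Ct values. target_* refers to
    the gene of interest (e.g. MTNR1B) and control_* to the endogenous
    control gene (e.g. HPRT); calibrator_* are the same measurements in
    the reference sample that all others are compared against."""
    delta_ct_sample = mean(target_ct) - mean(control_ct)
    delta_ct_calibrator = mean(calibrator_target_ct) - mean(calibrator_control_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Assumes the amount of product doubles every cycle.
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values (illustration only).
fold = relative_quantity(
    target_ct=[30.1, 29.9, 30.0],
    control_ct=[20.0, 20.1, 19.9],
    calibrator_target_ct=[31.0, 31.1, 30.9],
    calibrator_control_ct=[20.0, 20.0, 20.0],
)
```

A lower Ct means more starting template, so a sample whose target Ct is one cycle lower than the calibrator's, relative to the control gene, comes out at about twofold higher expression.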

For microarray experiments, 100 ng total RNA was subjected to two rounds of amplification (GeneChip Two-Cycle Kit, Affymetrix), and biotinylated RNA was generated using GeneChip IVT Labeling Kit (Affymetrix). RNA products were fragmented and hybridized to GeneChip Human HG U 133A Array (Affymetrix). The array data were normalized and analyzed using DNA-Chip Analyzer (dChip) software (see URLs section below, last accessed in January 2008) that assesses the standard errors for the expression indexes and calculates confidence intervals for fold changes (Joslin, NHGR).

Effect of melatonin on insulin secretion. To determine the effects of melatonin on insulin secretion, we incubated clonal β cells from the line 832/13 with 0.1 μM melatonin for 1 h. Then, the amount of insulin released into the buffer was determined by radioimmunoassay.

Statistical analyses. Differences in expression levels were tested by analysis of variance or nonparametric Mann-Whitney tests. The odds ratios for risk of developing T2D were calculated using logistic regression analyses adjusted for age at participation and time to last follow-up, body mass index and sex. Multivariate linear regression analyses were used to test genotype–phenotype correlations adjusted for age, sex, body mass index (apart from body mass index phenotype) and for within-family dependence. Non-normally distributed variables were log-transformed before analysis. Analysis of FUSION FSIGT and METSIM OGTT data was carried out using a regression framework in which regression coefficients were estimated in the context of a variance component model to account for relatedness among individuals³⁰. Trait values for both studies were adjusted for age and age squared. For FUSION data sex was included as an additional covariate. Analyses were carried out in nondiabetic individuals excluding those known to be taking medications that directly affect glucose or insulin concentrations. Covariate-adjusted trait values were transformed to approximate univariate normality by applying an inverse normal scores transformation; the scores were ranked, ranks were transformed into quantiles and quantiles were converted to normal deviates.
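The rank, quantile and normal-deviate steps of the inverse normal scores transformation can be sketched in a few lines of Python. The function name is hypothetical, and two details are assumptions not stated in the text: quantiles are taken as (rank – 0.5)/n, and tied values receive the average of their ranks.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard normal deviates via their ranks.

    Steps: rank the values (ties get the average rank), convert each
    rank r to a quantile (r - offset) / n, then apply the inverse
    standard normal CDF. The offset of 0.5 is an assumed convention."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical covariate-adjusted trait values (illustration only).
deviates = inverse_normal_transform([10.0, 30.0, 20.0])
```

Because only the ranks enter the calculation, the result is unaffected by outliers or by any monotone rescaling of the trait.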

All statistical analyses were performed using SPSS version 14.0, PLINK, Stata (StataCorp) or MERLIN³⁰.

URLs. Diabetes Trial Unit, http://www.dtu.ox.ac.uk/; dChip software, http://biosun1.harvard.edu/complab/dchip/; PLINK, http://pngu.mgh.harvard.edu/~purcell/plink/.

Note: Supplementary information is available on the Nature Genetics website.

ACKNOWLEDGMENTS

The DGI study was supported by a grant from Novartis. Studies in Malmoe were supported by grants from the Swedish Research Council, including a Linne grant (No. 31475113580), the Diabetes Programme at Lund University, the Pahlsson Foundation, the Heart and Lung Foundation, the Wallenberg Foundation, the Swedish Diabetes Research Society, the Crafoord Foundation, Swedish Medical Society, Swedish Royal Physiographic Society, a Nordic Centre of Excellence Grant in Disease Genetics, the Finnish Diabetes Research Society, the Sigrid Juselius Foundation, Folkhalsan Research Foundation, Novo Nordisk Foundation, the European Network of Genomic and Genetic Epidemiology (ENGAGE), the Wallenberg Foundation, the European Foundation for the Study of Diabetes (EFSD) and the Human Tissue facility at the Lund University Diabetes Center. Studies in human islets were supported in part by the Italian Ministry of University and Research (PRIN 2007-2008) and the European Community (LSHM-CT-2006-518153).

Pancreatic islets at US National Institutes of Health were obtained through the ICR Basic Science Islet Distribution Program (City of Hope Hospital, Joslin Diabetes Center, Northwestern University, Southern California Islet Consortium, University of Alabama Birmingham, University of Illinois, University of Miami, University of Minnesota, University of Pennsylvania, University of Wisconsin and Washington University), the Juvenile Diabetes Research Foundation Islet Resources (Washington University) and the National Disease Resource Interchange (NDRI).

The FUSION study would like to thank the many research volunteers who generously participated in the various studies represented in FUSION. We also thank A.J. Swift, M. Morken, P.S. Chines and N. Narisu for genotyping and informatics support. Support for FUSION was provided by the following: NIH grant DK062370 (M. Boehnke), American Diabetes Association research grant 1-05-RA-140 (R.M.W.), DK072193 (K.L. Mohlke) and National Human Genome Research Institute intramural project number 1 Z01 HG000024 (F.S. Collins). The METSIM study was supported by Academy of Finland grant 124243 (M.L.).

AUTHOR CONTRIBUTIONS

V.L.: DGI GWAS, data analysis and draft of the report. C.L.F.N., M.R.E.: in vitro expression experiments and analysis, and draft of the report. N.W.: immunocytochemistry. A.J.: genotyping and data analysis. P.S.: in vitro expression experiments. M. Bugliani: microarray and human islets experiments. R.S.: DGI GWAS analysis. M.F.: in vitro physiology. N.P.: genotyping. B.I., T.T.: phenotyping in the Botnia study. P.N.: phenotyping in the Malmoe study. J.K.: data analysis in METSIM study. J.T.: phenotyping in the FUSION study. M. Boehnke: PI of the FUSION study. D.A.: PI of the DGI study. F.S.: immunocytochemistry. J.G.E.: phenotyping in the Helsinki Birth Cohort Study. A.U.J.: FUSION GWAS and data analysis. M.L.: PI of the METSIM study. P.M.: microarray and human islets experiments. R.M.W.: FUSION GWAS analysis. H.M.: design and supervision of in vitro study experiments and draft of the report. L.G. designed and supervised all parts of the study and drafted the report. All researchers took part in the revision of the report and approved the final version.

Published online at http://www.nature.com/naturegenetics/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University & Novartis Institutes of BioMedical Research, et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316, 1331–1336 (2007).

2. Scott, L.J. et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316, 1341–1345 (2007).

3. Sladek, R. et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445, 881–885 (2007).

4. Zeggini, E. et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat. Genet. 40, 638–645 (2008).

5. Zeggini, E. et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316, 1336–1341 (2007).

6. Prokopenko, I. et al. Variants in MTNR1B influence fasting glucose levels and risk of type 2 diabetes. Nat. Genet. advance online publication, doi:10.1038/ng.290 (7 December 2008).

7. Peschke, E. Melatonin, endocrine pancreas and diabetes. J. Pineal Res. 44, 26–40 (2008).

8. Kvetnoy, I.M. Extrapineal melatonin: location and role within diffuse neuroendocrine system. Histochem. J. 31, 1–12 (1999).

9. Boden, G., Ruiz, J., Urbain, J.L. & Chen, X. Evidence for a circadian rhythm of insulin secretion. Am. J. Physiol. 271, E246–E252 (1996).

10. Pandi-Perumal, S.R. et al. Physiological effects of melatonin: role of melatonin receptors and signal transduction pathways. Prog. Neurobiol. 85, 335–353 (2008).

11. Muhlbauer, E. & Peschke, E. Evidence for the expression of both the MT1- and in addition, the MT2-melatonin receptor, in the rat pancreas, islet and beta-cell. J. Pineal Res. 42, 105–106 (2007).

12. Ramracheya, R.D. et al. Function and expression of melatonin receptors on human pancreatic islets. J. Pineal Res. 44, 273–279 (2008).

13. Valle, T. et al. Mapping genes for NIDDM. Design of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Diabetes Care 21, 949–958 (1998).

14. Eriksson, J.G., Osmond, C., Kajantie, E., Forsen, T.J. & Barker, D.J. Patterns of growth among children who later develop type 2 diabetes or its risk factors. Diabetologia 49, 2853–2858 (2006).

15. Marselli, L. et al. Gene expression of purified beta-cell tissue obtained from human pancreas with laser capture microdissection. J. Clin. Endocrinol. Metab. 93, 1046–1053 (2008).

16. Peschke, E., Bach, A.G. & Muhlbauer, E. Parallel signaling pathways of melatonin in the pancreatic beta-cell. J. Pineal Res. 40, 184–191 (2006).

17. Peschke, E. et al. Melatonin and type 2 diabetes - a possible link? J. Pineal Res. 42, 350–358 (2007).

18. Berglund, G. et al. Long-term outcome of the Malmo preventive project: mortality and cardiovascular morbidity. J. Intern. Med. 247, 19–29 (2000).

19. Lyssenko, V. et al. Clinical risk factors, DNA variants, and the development of type 2 diabetes. N. Engl. J. Med. 359, 2220–2232 (2008).

20. Lyssenko, V. et al. Genetic prediction of future type 2 diabetes. PLoS Med. 2, e345 (2005).

21. Lyssenko, V. et al. Predictors of and longitudinal changes in insulin sensitivity and secretion preceding onset of type 2 diabetes. Diabetes 54, 166–174 (2005).

22. Steil, G.M., Volund, A., Kahn, S.E. & Bergman, R.N. Reduced sample number for calculation of insulin sensitivity and glucose effectiveness from the minimal model. Suitability for use in population studies. Diabetes 42, 250–256 (1993).

23. Yang, Y.J., Youn, J.H. & Bergman, R.N. Modified protocols improve insulin sensitivity estimation using the minimal model. Am. J. Physiol. 253, E595–E602 (1987).

24. Bergman, R.N., Ider, Y.Z., Bowden, C.R. & Cobelli, C. Quantitative estimation of insulin sensitivity. Am. J. Physiol. 236, E667–E677 (1979).

25. Ward, W.K., Bolgiano, D.C., McKnight, B., Halter, J.B. & Porte, D. Jr. Diminished B cell secretory capacity in patients with noninsulin-dependent diabetes mellitus. J. Clin. Invest. 74, 1318–1328 (1984).

26. Matsuda, M. & DeFronzo, R.A. Insulin sensitivity indices obtained from oral glucose tolerance testing: comparison with the euglycemic insulin clamp. Diabetes Care 22, 1462–1470 (1999).

27. Hanson, R.L. et al. Evaluation of simple indices of insulin sensitivity and insulin secretion for use in epidemiologic studies. Am. J. Epidemiol. 151, 190–198 (2000).

28. Wierup, N., Bjorkqvist, M., Kuhar, M.J., Mulder, H. & Sundler, F. CART regulates islet hormone secretion and is expressed in the beta-cells of type 2 diabetic rats. Diabetes 55, 305–311 (2006).

29. Del Guerra, S. et al. Functional and molecular defects of pancreatic islets in human type 2 diabetes. Diabetes 54, 727–735 (2005).

30. Chen, W.M. & Abecasis, G.R. Family-based association tests for genomewide association scans. Am. J. Hum. Genet. 81, 913–926 (2007).


Supplementary Information

A common variant in the melatonin receptor gene (MTNR1B) is associated with increased risk of future type 2 diabetes and impaired early insulin secretion

Valeriya Lyssenko, Cecilia L.F. Nagorny, Michael R. Erdos, Nils Wierup, Anna Jonsson, Peter Spégel, Marco Bugliani, Richa Saxena, Malin Fex, Nicolo Pulizzi, Bo Isomaa, Tiinamaija Tuomi, Peter Nilsson, Johanna Kuusisto, Jaakko Tuomilehto, Michael Boehnke, David Altshuler, Frank Sundler, Johan G. Eriksson, Anne U. Jackson, Markku Laakso, Piero Marchetti, Richard M. Watanabe, Hindrik Mulder and Leif Groop


Fig. S1 Expression of MTNR1B in human pancreatic islets. (A) The MTNR1B mRNA levels in human pancreatic islets were 50% higher in T2D (n=4, black bar) compared to controls in the microarray studies¹⁶ (n=4, white bar). Bars represent mean ± SE. (B) Correlation between the MTNR1B mRNA levels and insulin release at 16.7 mM glucose¹⁵.

Chapter 6

Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci

Cell Metabolism 2010;12(5):443-55


Global Epigenomic Analysis of Primary Human Pancreatic Islets Provides Insights into Type 2 Diabetes Susceptibility Loci

Michael L. Stitzel,1,6 Praveen Sethupathy,1,6 Daniel S. Pearson,1 Peter S. Chines,1 Lingyun Song,3 Michael R. Erdos,1 Ryan Welch,5 Stephen C.J. Parker,1 Alan P. Boyle,3 Laura J. Scott,5 NISC Comparative Sequencing Program,1,2 Elliott H. Margulies,1 Michael Boehnke,5 Terrence S. Furey,3 Gregory E. Crawford,3,4 and Francis S. Collins1,*

1Genome Technology Branch
2NIH Intramural Sequencing Center (NISC)
National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
3Institute for Genome Sciences & Policy
4Department of Pediatrics, Division of Medical Genetics
Duke University, Durham, NC 27708, USA
5Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
6These authors contributed equally to this work
*Correspondence: [email protected]
DOI 10.1016/j.cmet.2010.09.012

SUMMARY

Identifying cis-regulatory elements is important to understanding how human pancreatic islets modulate gene expression in physiologic or pathophysiologic (e.g., diabetic) conditions. We conducted genome-wide analysis of DNase I hypersensitive sites, histone H3 lysine methylation modifications (K4me1, K4me3, K79me2), and CCCTC factor (CTCF) binding in human islets. This identified ~18,000 putative promoters (several hundred unannotated and islet-active). Surprisingly, active promoter modifications were absent at genes encoding islet-specific hormones, suggesting a distinct regulatory mechanism. Of 34,039 distal (nonpromoter) regulatory elements, 47% are islet unique and 22% are CTCF bound. In the 18 type 2 diabetes (T2D)-associated loci, we identified 118 putative regulatory elements and confirmed enhancer activity for 12 of 33 tested. Among six regulatory elements harboring T2D-associated variants, two exhibit significant allele-specific differences in activity. These findings present a global snapshot of the human islet epigenome and should provide functional context for noncoding variants emerging from genetic studies of T2D and other islet disorders.

INTRODUCTION

Type 2 diabetes (T2D) is a complex metabolic disorder that

accounts for 85%–95% of all cases of diabetes and afflicts

hundreds of millions of people worldwide (http://www.

diabetesatlas.org/content/diabetes). It is a leading cause of

substantial morbidity and is characterized by defects in insulin

sensitivity and secretion resulting from the progressive dysfunction

and loss of β cells in the pancreatic islets of Langerhans

(Butler et al., 2007; Muoio and Newgard, 2008). Both genetic

predisposition and environmental factors contribute to these islet

defects. Islets constitute 1%–2% of human pancreatic mass

(Joslin and Kahn, 2005) and are composed of five endocrine

cell types that secrete different hormones: α cells (glucagon),

β cells (insulin), δ cells (somatostatin), PP cells (pancreatic polypeptide Y),

and ε cells (ghrelin). These cells sense changes in

blood glucose concentration and respond by modulating the

activity of multiple pathways, including insulin and glucagon

secretion, to maintain glucose homeostasis (Joslin and Kahn,

2005). Several key transcription factors (TFs) that regulate these

responses are known (Oliver-Krasinski and Stoffers, 2008).

However, efforts to identify cis-regulatory elements upon which

these and other factors act have been restricted primarily to

promoter regions at specific loci (e.g., INS, PDX1) (Brink, 2003;

Ohneda et al., 2000).

Results from genome-wide association studies (GWAS) of

type 1 diabetes (T1D) (Barrett et al., 2009), T2D (reviewed in

Prokopenko et al., 2008), and related metabolic traits (Dupuis

et al., 2010; Ingelsson et al., 2010; Prokopenko et al., 2009)

suggest that genetic variation in cis-regulatory elements may

play an important role in β cell (dys)function and diabetes

susceptibility (De Silva and Frayling, 2010). Of the 18 most

strongly associated single-nucleotide polymorphisms (SNPs) in

each of the T2D-associated loci, only 3 are missense variants;

the remaining are noncoding (Prokopenko et al., 2008). Further-

more, there is evidence for allele-specific effects of two T2D-

associated SNPs on the islet expression level of nearby genes

(TCF7L2 [Lyssenko et al., 2007] and MTNR1B [Lyssenko et al.,

2009]). However, the dearth of annotation of functional regula-

tory elements has limited the capacity to investigate the role of

regulatory variation in complex diseases such as T2D.

Recent characterization of histone modifications and DNase

hypersensitivity in cultured cells has identified chromatin signa-

tures predictive of regulatory elements and actively transcribed

regions (Boyle et al., 2008; Guenther et al., 2007; Heintzman

et al., 2007). The data generated so far suggest that regulatory

[Figure 1 graphic not reproduced; only plot labels survive extraction. (A) DHS peaks (n = 101,326) by genomic annotation set: RefSeq TSS, Promoter, Intergenic, Exonic, Intronic. (B) Average peak length (nt) and average peak intensity (-log[p value]) per annotation set. (C) Fraction of DHS peaks overlapping sequence-based phastCons elements and topography-informed Chai elements. (D) Fraction of peaks unique to islet relative to GM12878, K562, HeLa-S3, HepG2, and the union of all 4. (E) Fraction overlap of stringent islet-FAIRE peaks (n = 9887) with islet DHS peaks. (F) Density versus log10 [distance to nearest d-DHS peak (nt)]. Legend groups: all DHS peaks (n = 101,326), DHS peaks at RefSeq TSSs (n = 11,829), DHS peaks not at RefSeq TSSs (n = 89,497).]

Figure 1. Analysis of DNase I Hypersensitive Sites in the Islet Genome

(A) Distribution of DNase I-hypersensitive (DHS) peaks across five genomic annotation sets. ‘‘Promoter’’ denotes proximal regions 5 kb upstream of RefSeq tran-

scription start sites (TSSs) that do not overlap the TSS. ‘‘Exonic’’ represents regions that overlap at least one base with an exon.


element location and usage vary substantially among cell types

(Heintzman et al., 2009; Xi et al., 2007). Also, extensive chromatin

profiling has been conducted in very few human primary

tissues to date (Bhandare et al., 2010). In this study, we describe

a comprehensive genome-wide epigenomic map of unstimu-

lated human pancreatic islets. Using DNase- and ChIP-seq

approaches, we identified DNase I-hypersensitive sites that

mark regions of open chromatin, loci enriched for active histone

H3 lysine methylation modifications (H3K4me1, H3K4me3,

and H3K79me2), and binding sites for the insulator CCCTC-

binding factor (CTCF). These profiles provide a detailed

chromatin snapshot of regulatory elements and actively tran-

scribed units in the islet. Moreover, they identify regulatory

elements harboring T2D-associated variants in 6/18 loci. These

data provide a valuable resource for understanding and investi-

gating cis-regulation in the human islet and for discovering regu-

latory elements that may play an important role in diabetes

susceptibility.

RESULTS

Genome-wide Characterization of Open Chromatin in the Human Pancreatic Islet

Active regulatory elements reside in open chromatin regions

hypersensitive to DNase I digestion (ENCODE Project Consor-

tium, 2007; Boyle et al., 2008; Crawford et al., 2004; Hesselberth

et al., 2009; Sabo et al., 2004). To identify all DNase-hypersensi-

tive sites (DHS) in the human pancreatic islet, we performed

DNase-seq (Boyle et al., 2008) and identified regions of the

genome with significant enrichment of sequence reads using

the MACS algorithm (Zhang et al., 2008) (Experimental Proce-

dures). This approach identified 101,326 human islet DHS peaks

(Table S1) covering ~27 million bases (~1% of the human

genome). Consistent with observations in CD4+ T cells (Boyle

et al., 2008), a substantive fraction of islet DHS peaks (23%,

n = 23,408) span annotated RefSeq transcription start sites

(TSS) or are within regions 5 kb upstream (Promoter), but the

majority reside within currently unannotated genomic regions

that may harbor functional distal regulatory elements

(Figure 1A). Peaks at TSSs are significantly longer and more

intense than those at all other loci (Figure 1B). This observation

supports the view that regions around TSSs are generally more

susceptible to DNase I digestion than putative non-TSS regula-

tory elements (Boyle et al., 2008).

Approximately 48% (n = 48,777) of all DHS peaks overlap

phastCons vertebrate conserved elements (Siepel et al., 2005)

(Figure 1C). Notably, ~87% (10,348/11,829) of peaks at TSSs

overlap phastCons elements, compared to ~43% (38,429/89,497)

at non-TSS loci (Figure 1C). This difference remains

even after accounting for the longer peaks at TSSs (data not

shown), supporting the model that TSS-proximal regions evolve

under stronger sequence constraint than distal regulatory

elements (Boyle et al., 2008). A recent study developed an algo-

rithm (Chai) for topography-informed conservation analysis,

which identified ~2-fold more bases in the human genome under

evolutionary constraint compared to sequence-based methods

(Parker et al., 2009). Accordingly, ~1.5 times as many (~76%)

islet DHS peaks overlap these structurally constrained regions

(Figure 1C).
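The overlap fractions reported above reduce to an interval-intersection count. A minimal sketch follows, assuming peaks and conserved elements are given as half-open (chrom, start, end) tuples and that sharing a single base counts as overlap. Figure 1A states that criterion for exons; whether the same rule was applied to phastCons and Chai elements is an assumption here. The function name and toy coordinates are illustrative and not taken from the study.

```python
def fraction_overlapping(peaks, elements):
    """Fraction of peaks that share at least one base with any element.

    Both inputs are lists of (chrom, start, end) tuples with half-open
    coordinates, so (100, 200) and (200, 300) do not overlap.
    """
    # Group elements by chromosome so each peak is compared only
    # against elements on its own chromosome.
    by_chrom = {}
    for chrom, start, end in elements:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks:
        # Two half-open intervals intersect when each starts before
        # the other ends.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks) if peaks else 0.0


# Toy example (hypothetical coordinates): one of three peaks overlaps.
toy_peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
toy_elements = [("chr1", 150, 160), ("chr1", 400, 500)]
toy_fraction = fraction_overlapping(toy_peaks, toy_elements)
```

The pairwise scan is quadratic and is kept only for readability; at the scale of ~100,000 peaks a sorted sweep or an interval tree would be the practical choice.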

To determine the extent of cell-type specificity of our islet DHS

peaks, we obtained DNase-seq data generated for four different

human cell lines: GM12878, K562, HeLa-S3, and HepG2 (Duke

DNase, ENCODE Project Consortium, 2007). We identified

DHS peaks for these cell lines (Experimental Procedures) and

found that roughly half the islet peaks are shared with each individual

nonislet cell type. Notably, ~35% (n = 34,273) are

completely unique to the islet (Figure 1D). Almost all (~99%) of

these islet-unique peaks do not overlap RefSeq TSSs, which is

consistent with the model that tissue-specific gene expression

patterns are governed largely by distal cis-regulatory elements

(Heintzman et al., 2009).

An independent method to map open chromatin is formalde-

hyde-assisted isolation of regulatory elements (FAIRE) (Giresi

et al., 2007). Recently, this approach was used for human islets

to identify three sets of candidate peaks, including ‘‘stringent’’

(n = 9887) and ‘‘liberal’’ (n = 100,715) peaks (Gaulton et al.,

2010). Approximately 75% of the ‘‘stringent’’ islet FAIRE peaks

overlap DHS peaks. However, this corresponds to only 7360

peaks, which is far fewer than the predicted number of functional

regulatory elements genome-wide (ENCODE Project Consor-

tium, 2007). The overlap is significantly greater at TSSs com-

pared to non-TSSs (97% versus 65%) (Figure 1E). Comparing

DHS peaks to the set of ‘‘liberal’’ islet FAIRE peaks, the overlap

drops to ~29%. Therefore, the two approaches seem to identify

distinct sets of non-TSS regulatory elements. Because it is diffi-

cult to assess the extent to which the dissimilarity between DHS

and FAIRE data is explained by differences in islet sample purity,

preparation methods, false positive signals, or population

(B) Average length (teal) and intensity (yellow) of DHS peaks across five genomic annotation sets. Peaks at RefSeq transcription start sites (TSSs) are significantly longer and more intense than those elsewhere (**, two-tailed paired Student’s t test, p value < 10^-100). Error bars represent SD (SD measurements were often greater than the sample average due to highly skewed distributions, but error bars were cut off at zero for visualization).

(C) Sequence and structure constraint at DHS. DHS peaks at RefSeq TSSs are under substantially greater sequence constraint (assessed by phastCons vertebrate conservation scores) than intronic and intergenic DHS peaks. A large majority of DHS peaks within all genomic annotation sets are under strong structural constraint (assessed by the Chai algorithm) (Parker et al., 2009).

(D) Comparison of islet DHS peaks with peaks from four different human cell lines. Each data point represents the fraction of total peaks (n = 101,326) unique to the human islet relative to each of the other four human cell types or all of them combined (Union of all 4). Roughly 35% are unique to the islet, and 99% of these are not located at RefSeq TSSs. Varying levels of similarity across cell types may be at least partially explained by differences in the stage of cellular differentiation and/or sequencing depth.

(E) Overlap between DHS peaks and formaldehyde-assisted isolation of regulatory elements (FAIRE) peaks. The overlap is significantly greater at RefSeq TSSs than elsewhere (**, Fisher’s exact test, p value < 10^-100).

(F) Logarithm-based distribution of the distance to the nearest distal DHS (d-DHS) peak among all d-DHS peaks. The blue box indicates an increased representation of peaks in the ~100–1000 bp range (clustered) relative to Gaussian expectation (red curve). This range is significantly enriched for islet-unique peaks (Fisher’s exact test, p = 2.7 × 10^-9). Comparison of d-DHS, FAIRE, and GLITR locations is found in Figure S1.


diversity (McDaniell et al., 2010), more controlled comparisons of

these techniques will be necessary to elucidate inherent prefer-

ences of each for specific classes of open chromatin.

Though many of the mechanistic details are not clear, it is

widely accepted that distal and promoter regulatory elements

can exert coordinated control of gene transcription via physical

interactions (Dekker, 2003; Miele and Dekker, 2008). Therefore,

it has been hypothesized that distal cis-regulatory elements

may cluster together to form functional modules (Blanchette

et al., 2006). To assess the clustering of putative islet-active

distal cis-regulatory elements, we filtered from the islet DHS

peaks (n = 101,326) the regions that may represent promoters

to identify a set of high-confidence distal peaks (d-DHS, n =

34,039) (Table S2 and Figure S1 and Experimental Procedures).

For each d-DHS peak, we computed the distance to the nearest

d-DHS peak and observed an increased representation in the

~100–1000 bp range (n = 7652) relative to the expectation

from a normal distribution (Figure 1F). Furthermore, this set is

significantly enriched for islet-unique peaks (p = 2.7 × 10^-9).
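The clustering analysis described above can be sketched as a nearest-neighbor distance calculation followed by binning on a log10 scale. The version below is only an illustration under stated assumptions: distances are measured edge to edge between peaks on the same chromosome, peaks are (chrom, start, end) tuples, and the 0.5 log-unit bin width is arbitrary. The study's exact distance definition is given in its Experimental Procedures and may differ.

```python
import math
from collections import defaultdict


def nearest_neighbor_distances(peaks):
    """Edge-to-edge distance (nt) from each peak to its nearest neighbor.

    peaks: list of (chrom, start, end). Peaks that are alone on their
    chromosome are skipped. Abutting or overlapping peaks are assigned
    a distance of 1 so that the logarithm stays defined.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    distances = []
    for intervals in by_chrom.values():
        intervals.sort()
        for i, (start, end) in enumerate(intervals):
            gaps = []
            if i > 0:
                gaps.append(start - intervals[i - 1][1])   # gap to the left
            if i + 1 < len(intervals):
                gaps.append(intervals[i + 1][0] - end)      # gap to the right
            if gaps:
                distances.append(max(min(gaps), 1))
    return distances


def log10_histogram(distances, bin_width=0.5):
    """Count distances per log10 bin; keys are the lower bin edges."""
    counts = defaultdict(int)
    for d in distances:
        lower_edge = math.floor(math.log10(d) / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))
```

An excess of counts in the bins spanning 2 to 3 (that is, 100 to 1000 nt) relative to a fitted normal curve would correspond to the clustered set highlighted in Figure 1F.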

Genome-wide Characterization of TSSs in the Islet Genome via H3K4me3 ChIP-Seq

To characterize human islet TSSs, we conducted ChIP-seq analysis

of histone 3 lysine 4 trimethylation (H3K4me3) in four

different human islet samples. H3K4me3 is enriched at CpG

islands (Bernstein et al., 2007), TSSs (Li et al., 2007), and sites

of active transcription (Kouzarides, 2007). Enriched regions

present in all four islet samples, but absent from three mock-IP

(anti-GFP) experiments, were designated as ‘‘H3K4me3 peaks.’’

This method identified 18,163 human islet H3K4me3 peaks

(Table S3) covering ~1% of the genome.

As expected, approximately two-thirds (n = 11,973) of

H3K4me3 peaks overlap RefSeq TSSs (Figure 2A). Greater

than 70% of the remaining, unannotated peaks (n = 6190) over-

lap computationally predicted TSSs and/or CpG islands.

However, the significantly lower average length and intensity of

unannotated H3K4me3 peaks compared to those at RefSeq

TSSs (Figure 2B) suggests that at least some of these peaks

may indicate weakly active TSSs, inactive but poised TSSs

(Barski et al., 2007; Guenther et al., 2007; Mikkelsen et al.,

2007), remnants of transcriptional activity from the develop-

mental past or prior environmental stimulation (Barski et al.,

2009), or chromatin looping with distal regulatory regions. While

a subset of peaks could be false-positive signals, this is unlikely,

as it would require a technical artifact that is consistent across all

four islet samples.

Previous genome-wide profiling studies have reported a posi-

tive correlation between the intensity of H3K4me3 signal and

gene expression level (Barski et al., 2007; Guenther et al.,

2007). To test this observation in islets, we downloaded human

islet gene expression data from http://T1Dbase.org (Kutlu

et al., 2009), partitioned gene expression into quintiles, and

computed the average H3K4me3 signal length and intensity at

the TSSs of genes within each bin. Although the average

H3K4me3 peak length and intensity monotonically increase

with gene expression, there is great variability within each

expression bin (Figure 2C). Surprisingly, of the 245 most highly

islet-expressed genes in this data set, 18% (n = 45) have either

no or extremely low associated H3K4me3 signal. Notably, 71%

(32/45) also lacked a DHS peak (data not shown). Gene ontology

(GO) analysis revealed that these 45 genes are most significantly

enriched for the molecular function of hormone activity (p = 0.029

after Bonferroni correction for multiple testing) (Experimental

Procedures). These genes include insulin (INS), glucagon

(GCG), islet amyloid polypeptide (IAPP), pancreatic polypeptide

preprotein (PPY), somatostatin (SST), and transthyretin (TTR).

We confirmed by quantitative RT-PCR that INS, GCG, and SST

are robustly expressed (Figure S2), so it is unlikely that low

H3K4me3 at these TSSs is due to technical artifacts or adverse

effects of the islet shipment or handling process. Because these

genes are <10 kb in length, we considered the possibility that

weak H3K4me3 signal is simply associated with short genes.

However, the proportion of short genes (<10 kb in length) within

the set of ‘‘most highly expressed with no/low H3K4me3 signal’’

(66.7%, 30/45) is not statistically different from the proportion of

short genes within the entire set of most highly expressed

(69.8%, 171/245). This result suggests that the transcriptional

regulation of islet hormones and other related, highly islet-

expressed genes occurs through a distinct mechanism as

compared to most other genes.
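The quintile comparison in this section amounts to ranking genes by expression, splitting the ranking into five equal bins, and averaging the H3K4me3 signal at the TSSs in each bin. The sketch below assumes expression and signal are available as plain dictionaries keyed by gene symbol and that a gene without a peak contributes a signal of zero; both choices, and all names, are illustrative rather than the study's actual procedure.

```python
def quintile_bins(expression):
    """Map each gene to a bin 0..4 (lowest to highest expression) by rank."""
    ranked = sorted(expression, key=expression.get)
    n = len(ranked)
    # rank * 5 // n is 0 for the lowest fifth of ranks and 4 for the top fifth.
    return {gene: (rank * 5) // n for rank, gene in enumerate(ranked)}


def mean_signal_by_bin(expression, signal):
    """Average TSS signal per expression quintile.

    signal: dict gene -> peak length or intensity; genes missing from
    the dict are treated as having no peak (signal 0.0).
    Returns a list of five means (None for an empty bin).
    """
    totals = [0.0] * 5
    counts = [0] * 5
    for gene, b in quintile_bins(expression).items():
        totals[b] += signal.get(gene, 0.0)
        counts[b] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]
```

Listing the genes in the top bin whose signal is zero or near zero would reproduce the kind of subset discussed above (highly expressed genes with little or no H3K4me3).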

H3K4me3 ChIP-chip (human embryonic stem cells, hepato-

cytes, REH cells [Guenther et al., 2007]) or ChIP-seq (human

CD4+ T cells [Barski et al., 2007] and GM12878, HUVEC,

NHEK, K562, and HeLa cell lines [Broad Institute ChIP-seq,

Bernstein lab, ENCODE Project Consortium, 2007]) data are

available for nine different human cell types. Comparisons

between islet and each other cell type indicated that, on average,

10%–30% of the islet peaks are unique (Figure 2D). Not surprisingly,

this value drops to ~1.5% (n = 256) when compared with all

nine cell types together. Only 34 of the 256 islet-unique peaks

correspond to TSSs of annotated RefSeq genes, and these are

enriched for known pancreatic b cell functions such as secretion

(p = 9.3 × 10^-3) and Ca2+-dependent exocytosis (p = 6.6 × 10^-3)

(Table 1). Furthermore, several of the genes (SLC30A8, GCK)

harbor genetic variants that confer significant risk for T2D and

elevated plasma fasting glucose levels (Dupuis et al., 2010;

Ingelsson et al., 2010; Prokopenko et al., 2008, 2009). The

remaining 222 islet-unique peaks may represent alternative

TSSs of genes with function in developing and/or mature islets

or TSSs of unannotated coding or noncoding transcription units.

Identification of Unannotated Islet-Active TSSs

H3K4me3 peaks in unannotated genomic space (n = 6190) are

TSS candidates. Because H3K4me3 may also be enriched at

inactive TSSs (Guenther et al., 2007), we adopted a two-step

approach to identify the subset of these 6190 peaks that are

likely to be active in the human islet (Figure S3A). First, we devel-

oped an algorithm that uses DHS peaks to assign directionality

to H3K4me3 peaks (Experimental Procedures). DHS peaks

tend to be sharply focused around the TSS, while H3K4me3

peaks are broader and extend well into the body of the transcrip-

tion unit. We hypothesized that the location of the DHS peak

relative to the H3K4me3 peak could predict the directionality

of the underlying gene. Using the strongest DHS peak within

an H3K4me3 peak, this simple algorithm performed at ~90%

accuracy on annotated RefSeq genes known to be expressed

in the human islet (Experimental Procedures). Interestingly, the

majority (~80%) of the incorrectly assigned TSSs (based on


current annotation) harbored multiple DHS peaks, positioned on

either end of the H3K4me3 peak. These H3K4me3 peaks are

slightly (~200 nt) longer than those for which the orientation

was correctly assigned, increasing the likelihood of overlapping

non-TSS-related DHS peaks, which can confound the prediction

algorithm. Many of these non-TSS DHS peaks may correspond

to CTCF-binding sites that are located on the opposite side of

the DHS with respect to the TSS (Boyle et al., 2008) and RNA

polymerase (Pol) III-bound loci found in chromatin domains

occupied by Pol II and associated with enhancer-binding factors

(Oler et al., 2010). We observe examples of each case in our data

set (Figure S4).
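The directionality idea described above can be illustrated with a short sketch. It is not the authors' algorithm, whose details are in the Experimental Procedures. It assumes that the strongest DHS peak inside an H3K4me3 peak marks the TSS and that the gene body lies on the side where the H3K4me3 signal extends, so a DHS peak left of the H3K4me3 midpoint implies the plus strand. The midpoint rule, the tuple layout, and the function name are all assumptions.

```python
def predict_strand(h3k4me3_peak, dhs_peaks):
    """Guess transcription direction for one H3K4me3 peak.

    h3k4me3_peak: (start, end) on a single chromosome.
    dhs_peaks: list of (start, end, intensity) on the same chromosome.
    Returns '+', '-', or None when no call can be made.
    """
    k_start, k_end = h3k4me3_peak
    # Keep only DHS peaks that intersect the H3K4me3 peak.
    inside = [p for p in dhs_peaks if p[0] < k_end and k_start < p[1]]
    if not inside:
        return None

    # Use the most intense overlapping DHS peak as the TSS proxy.
    d_start, d_end, _ = max(inside, key=lambda p: p[2])
    dhs_center = (d_start + d_end) / 2
    k_center = (k_start + k_end) / 2

    if dhs_center == k_center:
        return None  # symmetric case: no directional information
    # DHS toward the left end means the mark extends rightward into
    # the transcription unit, i.e. the plus strand.
    return "+" if dhs_center < k_center else "-"
```

As the text notes, an H3K4me3 peak with strong DHS peaks at both ends would defeat a rule of this kind, since the strongest peak need not be the one at the TSS.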

Second, we performed ChIP-seq to profile genome-wide

histone 3 lysine 79 dimethylation (H3K79me2), which is enriched

in actively transcribed regions (Guenther et al., 2007). If the rela-

tive density of H3K79me2 reads on either side of an H3K4me3

peak was consistent with its predicted directionality (as deter-

mined from the pattern of the DHS and H3K4me3 signal), then

[Figure 2 graphic not reproduced; only plot labels survive extraction. (A) H3K4me3 peaks (n = 18,163) by genomic annotation set (66% at RefSeq TSSs); of the non-RefSeq peaks, 74% overlap computationally predicted TSSs and/or CpG islands and 26% are potential novel TSSs. (B) Average peak length (nt) and average peak intensity (-log[p value]) for All, TSS, Promoter, Intergenic, Exonic, and Intronic sets. (C) Average peak length (nt) and average peak intensity (-log[p value]) by percentile bins of expression in the human pancreatic islet (0-20, 20-40, 40-60, 60-80, 80-100). (D) Fraction of peaks unique to the islet relative to GM12878, hepatocytes, ESC, CD4+ T, REH, K562, HUVEC, NHEK, HeLa, and the union of all 9 (n = 256).]

Figure 2. Analysis of Histone 3 Lysine 4 Trimethylation Loci in the Islet Genome

(A) Distribution of H3K4me3 peaks across five genomic annotation sets as described in Figure 1A. Two-thirds of the peaks span RefSeq transcription start sites

(TSSs, left pie chart). Non-RefSeq H3K4me3 peaks are enriched for computationally predicted TSS and/or CpG islands (right pie chart). Additional information is

provided in Figure S3.

(B) Average length (purple) and intensity (blue) of H3K4me3 peaks across five genomic annotation sets as described in Figure 1B. The average length and intensity

of peaks is significantly higher at TSSs (**, two-tailed paired Student’s t test, p value < 10^-100). Error bars represent SD.

(C) Relationship between average H3K4me3 peak length (yellow)/intensity (purple) and average gene expression level. Error bars represent SD.

(D) Comparison of islet H3K4me3 peaks with peaks from nine different human cell types. Each data point represents the fraction of total peaks (n = 18,163) unique

to the human islet relative to each of the other nine human cell types or all of them combined (Union of all 9). ~1.5% of the peaks are unique to the islet. Varying

levels of similarity across cell types may be at least partially explained by differences in the stage of cellular differentiation and/or sequencing depth.


the underlying TSS was classified as islet active. Intragenic TSSs

are difficult to assess using this method, because the H3K79me2

signal may be due to transcription from an upstream TSS.

Restricting the analysis to intergenic space, we identified 263

candidates for unannotated, islet-active TSSs (Table S4), of

which 75% (n = 196) overlap CpG islands and/or computationally

predicted TSSs (Figure S3A). These candidates include islet-

active TSSs for noncoding RNAs such as the let-7a-1 cluster

of microRNAs (Figure 3A) and the miR-1179/miR-7-2 cluster

(Figure S3B). We also identified putative alternative TSSs for

genes with important islet function such as pancreatic peptidylglycine

α-amidating mono-oxygenase (PAM), which encodes

an islet secretory granule membrane protein (Figure 3B). Finally,

we identified an active promoter locus that is contained within

a recently reported T1D-associated region on chromosome 12

(index SNP rs1701704). This promoter could underlie an unanno-

tated transcript or could be an alternative promoter for the down-

stream gene Ikaros family zinc finger 4 (IKZF4) (Figure S3C),

which is considered a strong functional candidate for T1D (Hako-

narson et al., 2008).

Identification of Distal cis-Regulatory Elements

Sites bound by CTCF are an important class of cis-regulatory

elements that can mediate insulator or other regulatory activities

(Phillips and Corces, 2009). To generate a genome-wide CTCF-

binding site profile in the human islet, we performed ChIP-seq

and designated enriched regions as ‘‘CTCF peaks’’ (n =

21,304) (Table S5 and Experimental Procedures). We assessed

the genomic distribution of peaks (Figure 4A), computed the

average peak intensity/length across various genomic cate-

gories (Figure 4B), and identified the most significantly overrep-

resented motif within the peaks using MEME (Figure 4C and

Supplemental Experimental Procedures). The results corrobo-

rate those from previously described studies in other cell types

(Kim et al., 2007; Jothi et al., 2008; Cuddapah et al., 2009).

Further, only 0.6% (n = 123) of CTCF peaks were islet unique

(Figure 4D). Finally, we observed that among the 77% of CTCF

peaks that overlap 22% of DHS peaks, the CTCF peaks are

positioned near the center of the DHS peak with a slight 5′ shift (Figure 4E).

Previous studies have observed depletion of monomethylated

histone 3 lysine 4 (H3K4me1) at TSSs and enrichment at putative

enhancers such as distal STAT1 and EP300 sites (ENCODE

Project Consortium, 2007; Heintzman et al., 2007, 2009; Robert-

son et al., 2008) and nonpromoter DHS (Barski et al., 2007;

Robertson et al., 2008; Wang et al., 2008). To profile H3K4me1

across the human islet genome, we repeated the ChIP-seq

strategy described above for three islet samples. We computed

the average ratio of the density of extended H3K4me1 sequence

reads in DHS peaks at RefSeq TSSs (t-DHS, n = 11,829) and

d-DHS peaks (n = 34,039) (Experimental Procedures) to the

density in flanking control regions that do not harbor DHS signal

(Experimental Procedures). t-DHS peaks are significantly

depleted for H3K4me1, whereas d-DHS peaks are significantly

enriched (Figure 5). Further, there was no significant difference

in H3K4me1 enrichment between CTCF-positive and CTCF-

negative d-DHS. Although we detected depletion of H3K4me1

at t-FAIRE peaks, there was no enrichment at d-FAIRE peaks

(Figure 5).
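The enrichment measure used here is a ratio of read density inside a peak to read density in flanking control regions. A minimal sketch follows, assuming reads are reduced to sorted single positions on one chromosome, that control windows of a fixed size sit immediately on either side of the peak, and that the two flanks are averaged. The 1000 nt flank size and the helper names are placeholders; the study defines its own control regions (free of DHS signal) in the Experimental Procedures.

```python
from bisect import bisect_left


def read_density(sorted_positions, start, end):
    """Reads per base in the half-open window [start, end)."""
    n_reads = bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)
    return n_reads / (end - start)


def enrichment_ratio(sorted_positions, peak, flank=1000):
    """Density in the peak divided by the mean density of both flanks.

    Returns None when the flanks contain no reads, since the ratio is
    then undefined. Values above 1 indicate enrichment, below 1 depletion.
    """
    start, end = peak
    inside = read_density(sorted_positions, start, end)
    left = read_density(sorted_positions, start - flank, start)
    right = read_density(sorted_positions, end, end + flank)
    control = (left + right) / 2
    if control == 0:
        return None
    return inside / control
```

Averaging this ratio over all t-DHS peaks and, separately, over all d-DHS peaks gives the two summary values being contrasted in the text.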

We did not detect dramatically different H3K4me1 enrichment

levels between intergenic and intragenic d-DHS peaks (Fig-

ure S5). Interestingly, although the average H3K4me3 read

density in d-DHS peaks was ~3-fold less than that of

H3K4me1, d-DHS peaks were still enriched for H3K4me3 signal

relative to flanking control regions (Figure S5). These observa-

tions are consistent with the previous finding that although

H3K4me1 often marks distal regulatory regions, a substantial

portion is also associated with H3K4me3 signal (Robertson

et al., 2008). Overall, the enrichment of active histone modifica-

tions suggests that islet d-DHS peaks are strong candidates for

putative regulatory elements. Fifty published index SNPs (http://

www.genome.gov/gwastudies/) and their linkage disequilibrium

partners (r² > 0.6) for diabetes (T1D, T2D) and related quantitative

tive traits (fasting glucose, fasting insulin) are found within

500 bp of nonpromoter d-DHS peaks (Table S9 and Experi-

mental Procedures), suggesting that these SNPs may contribute

to diabetes or altered islet physiology by modulating regulatory

element activity.
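The proximity filter described above (SNPs lying within 500 bp of a d-DHS peak) can be sketched as a windowed interval test. The example assumes SNPs are given as (id, chrom, position) and peaks as half-open (chrom, start, end) tuples, and that a SNP inside a peak also counts. The SNP identifiers and coordinates below are made up for illustration and do not correspond to any variant in Table S9.

```python
def snps_near_peaks(snps, peaks, window=500):
    """Return ids of SNPs inside, or within `window` bp of, any peak.

    snps: list of (snp_id, chrom, pos)
    peaks: list of (chrom, start, end), half-open coordinates
    """
    hits = []
    for snp_id, chrom, pos in snps:
        for p_chrom, start, end in peaks:
            # Extend the peak by the window on both sides before testing.
            if p_chrom == chrom and start - window <= pos < end + window:
                hits.append(snp_id)
                break  # one nearby peak is enough
    return hits


# Hypothetical data: only snpA and snpC fall within 500 bp of the peak.
toy_peaks = [("chr1", 1000, 1200)]
toy_snps = [
    ("snpA", "chr1", 600),
    ("snpB", "chr1", 499),
    ("snpC", "chr1", 1699),
    ("snpD", "chr1", 1700),
    ("snpE", "chr2", 1100),
]
nearby = snps_near_peaks(toy_snps, toy_peaks)
```

In practice the SNP list would first be expanded from each index SNP to its linkage disequilibrium partners above the chosen r² threshold, which this sketch does not attempt.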

Application of Chromatin Profiles to T2D Susceptibility Loci

To identify regulatory elements and transcripts that may underlie

molecular mechanisms of T2D, we analyzed the chromatin

profiles in the 18 GWAS-derived genomic loci conferring risk

for T2D (Prokopenko et al., 2008). The genomic boundaries

of each association signal (Table S6) were defined by the Spotter

algorithm (Experimental Procedures). The chromatin profiles do

not predict any alternative promoters or unannotated/noncoding

Table 1. Examples of Islet-Unique H3K4me3 Peaks

Gene Symbol   Relevance to Islet Biology
GCK           Involved in glucose metabolism
              T2D GWAS locus (Dupuis et al., 2010)
              Harbors an islet-specific promoter (Magnuson, 1990)
SLC30A8       Involved in cation (Zn+) transport important for insulin secretion (Chimienti et al., 2004)
              T2D GWAS locus (Prokopenko et al., 2008)
              Exhibits islet-specific expression (Chimienti et al., 2004)
REG1A         Derived from regenerating islets (Terazono et al., 1988)
FFAR1         Exhibits islet-specific expression (Bartoov-Shifman et al., 2007)
              Regulates insulin secretion (Itoh et al., 2003)
SYT4          Involved in Ca2+-dependent trafficking and exocytosis of secretory vesicles (Tsuboi and Rutter, 2003)
KCNK16        Exhibits pancreas-specific expression (Girard et al., 2001)
ELAVL4        Regulates cell proliferation (Joseph et al., 1998)
UCN3          Regulates glucose-stimulated insulin secretion (Li et al., 2007)
PRSS1         Harbors mutations that underlie hereditary pancreatitis and pancreatic cancer (Teich et al., 1998)

Nine examples among the 34 islet-unique peaks that are at RefSeq transcription start sites (TSSs). The corresponding genes have known pancreatic islet function (such as insulin secretion), and some harbor genetic variants that confer significant risk for type 2 diabetes (SLC30A8 and GCK).


transcripts in these regions. However, they do identify 118

d-DHS peaks, which represent putative distal regulatory ele-

ments (Table S7 and Experimental Procedures). About one-

quarter of these elements (n = 28) are bound by CTCF in the islet.

Six of the 118 elements contain one or more T2D-associated

SNPs (index SNP or SNP with r² > 0.6) (Table S8). These six

include a previously identified element containing the index

SNP rs7903146 in the TCF7L2 locus (Gaulton et al., 2010). The

remaining five map to the IGF2BP2, KCNQ1, WFS1, FTO, and

CDC123/CAMK1D loci. Only the CDC123/CAMK1D element is

bound by CTCF in the islet.

Validation of Putative Islet Regulatory Elements in T2D Loci

To determine whether predicted regulatory elements in the islet

can function as enhancers, we cloned two classes of elements

[Figure 3 graphic not reproduced; only genome browser labels survive extraction. (A) chr9:95,970,000-96,010,000 (scale bar 20 kb) showing hsa-let-7a-1, hsa-let-7f-1, and hsa-let-7d, the let-7 miRNA cluster, and ESTs BG326593, BI459078, and BG724094; annotation: Pri-let-7 promoter. (B) chr5:102,150,000-102,350,000 (scale bar 100 kb) showing PAM isoforms; annotations: unannotated islet-active promoter, unannotated islet-unique promoter, annotated islet-active promoter. Tracks in both panels: DHS, H3K4me1, H3K4me3, H3K79me2, Eponine TSS, SwitchGear TSS, mammalian conservation with species alignments, and mappability (Duke Uniq 20/24/35, Umass Uniq 15).]

Figure 3. Identifying Unannotated Islet-Active Transcription Start Sites

(A) Candidate islet-active TSS for the primary transcript of the ubiquitous let-7a-1/7d/7f-1 microRNA cluster. The TSS (red box; DHS+, H3K4me3+, H3K4me1−) is

~10 kb upstream of the 5′-most microRNA in the cluster, and the full-length primary transcript (H3K79me2+) of ~35 kb matches a known EST (BG326593). This

EST likely represents a noncoding RNA primary transcript from which the let-7 cluster of miRNAs is processed (Marson et al., 2008). The strategy for predicting

TSSs is shown in Figure S3A.

(B) Two candidate islet-active alternative TSSs (red boxes) for the gene PAM, which encodes an islet secretory granule membrane protein. One of the candidate

TSSs is also islet unique and occurs between the annotated TSS and an unannotated islet-active TSS. Examples of confounding factors for predicting islet-active

TSSs are shown in Figure S4.


containing d-DHS peaks into luciferase reporter vectors

(Figure 6): those bound by CTCF (‘‘C,’’ n = 11) and those that

are not (‘‘P,’’ n = 33). We also cloned a number of non-DHS,

non-CTCF controls (‘‘N,’’ n = 15). Because human islet cell lines

are not available, we tested these elements for enhancer activity

in murine pancreatic MIN6 (Figure 6A) and HeLa (Figure 6B) cell

lines. Only ~15% (4/26) of the negative controls exhibited

enhancer activity in any orientation or cell type (~9% [1/11] of

[Figure 4 graphic not reproduced; only plot labels survive extraction. (A) CTCF peaks (n = 21,304) by genomic annotation set. (B) Average peak length (nt) and average peak intensity (-log[p value]) for All, TSS, Promoter, Intergenic, Exonic, and Intronic sets. (C) Sequence logo of the enriched motif (bits, positions 1-14). (D) Fraction of peaks unique to islet relative to GM12878, K562, HUVEC, NHEK, CD4+ T, and the union of all 5 (n = 123). (E) Fraction of CTCF peaks versus position relative to DHS peak (kb), binned in 0.1 kb steps from -1.0 to 0.9.]

Figure 4. Profiling of Binding Sites for the CCCTC-Binding Factor

(A) Distribution of CTCF peaks across five genomic annotation sets as described in Figure 1A.

(B) Average length (orange) and intensity (green) of CTCF peaks across five genomic annotation sets is fairly uniform. Error bars represent SD.

(C) Motif determined by MEME (Bailey and Elkan, 1994) using the top 10% of CTCF peaks.

(D) Comparison of islet CTCF peaks with peaks from five different cell types. Each data point represents the fraction of total peaks (n = 21,304) unique to the

human islet relative to each of the other five human cell types or all of them combined (Union of all 5). Less than 1% of the peaks are unique to the islet (n =

123). Varying levels of similarity across cell types may be at least partially explained by differences in the stage of cellular differentiation and/or sequencing depth.

(E) Positioning of CTCF peaks relative to the center of overlapping DHS peaks (red line). Almost all CTCF peaks that overlap DHS peaks are within 200 bp of the

DHS peak center.


“C” elements and 20% [3/15] of “N” elements) (Figures 6A and
6B). In contrast, ~2.5-fold more “P” elements demonstrated
enhancer activity (12/33). This positive rate (36.4%) is comparable
to that of predicted HeLa enhancers (Heintzman et al.,
2009) that exhibited increased luciferase activity in our HeLa
reporter assays (38.5%, 5/13).
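The proportions quoted above can be re-derived from the counts given in the text. The short Python sketch below does only that arithmetic; the dictionary keys and variable names are labels chosen here for illustration, and the fold change is computed against the pooled negative controls, which is an assumption about how the comparison was made.

```python
from fractions import Fraction

# (elements with enhancer activity, elements tested), as reported in the text.
counts = {
    "C (d-DHS+/CTCF+)": (1, 11),
    "N (non-DHS, non-CTCF)": (3, 15),
    "P (d-DHS+/CTCF-)": (12, 33),
    "Predicted HeLa enhancers": (5, 13),
}

def rate(active, tested):
    """Return the exact fraction of tested elements that showed activity."""
    return Fraction(active, tested)

# Negative controls are the pooled C and N classes (4/26 in the text).
neg_active = counts["C (d-DHS+/CTCF+)"][0] + counts["N (non-DHS, non-CTCF)"][0]
neg_tested = counts["C (d-DHS+/CTCF+)"][1] + counts["N (non-DHS, non-CTCF)"][1]
negative_rate = rate(neg_active, neg_tested)
p_rate = rate(*counts["P (d-DHS+/CTCF-)"])
fold = p_rate / negative_rate  # assumption: fold change versus pooled controls

for label, (active, tested) in counts.items():
    print(f"{label}: {active}/{tested} = {float(rate(active, tested)):.1%}")
print(f"Pooled negative controls: {neg_active}/{neg_tested} = {float(negative_rate):.1%}")
print(f"P versus pooled negative controls: {float(fold):.2f}-fold")
```

Computed this way, the ratio comes to roughly 2.4, close to the approximate 2.5-fold figure given in the text.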

Four of the 12 “P” elements exhibiting enhancer activity (P4,
KCNJ11/ABCC8; P12, TCF7L2; P17, WFS1; P20, HHEX/IDE)
are unique to the islet; one of these (P17, WFS1) is also undetected
by at least three other methods for the prediction of
regulatory element potential: PReMod (Ferretti et al., 2007),
phastCons (Siepel et al., 2005), and islet-FAIRE (Gaulton
et al., 2010). The average H3K4me1 enrichment among the 12
d-DHS peaks in the elements exhibiting enhancer activity was
similar to that computed for all d-DHS (~1.3-fold) (Figure 6C).
However, there was large variation in H3K4me1 enrichment
among individual elements (0.6- to 3.4-fold), with only 3/12
enriched above baseline (1.0) (Figure 6C).

Allele-Specific Analysis of Five Regulatory Elements Containing T2D-Associated SNPs

Five “P” elements tested contain T2D-associated SNPs (P9,

IGF2BP2; P12, TCF7L2; P17, WFS1; P21, KCNQ1; P23, FTO)

(Figures 6A and 6B). Notably, four out of the five elements (all

except P9) exhibited enhancer activity in at least one orientation

and cell type tested. To assess allele- or haplotype-specific

effect(s) of T2D-associated variants on enhancer activity, we

cloned these four regions from the genomic DNA of individuals

with risk and nonrisk genotypes/haplotypes and compared lucif-

erase reporter activity (Figures 6D and S6A). We confirmed

significantly stronger enhancer activity for the TCF7L2 element

(P12) containing the rs7903146 risk allele relative to the nonrisk

allele (~3-fold) (Figure 6D) (Gaulton et al., 2010). TCF7L2 allelic

enhancer effects were specific to the MIN6 cell line (Figure 6D,

compare MIN6 and HeLa). Sequencing of the TCF7L2 inserts

from each haplotype revealed two variant bases, a novel variant

(C/G at Chr10:114,747,977; hg18) and rs7903146; only

rs7903146 mediated allele-specific effects on enhancer activity

(Figure 6D, compare Risk to Nonrisk and Nonrisk(m)) (Fig-

ure S6B). We also identified a haplotypic effect on enhancer

activity for the WFS1 element (P17), which contains four SNPs

(rs4689397, rs6823148, rs881796, and rs4234731). The risk

haplotype exhibited ~30% lower activity than the nonrisk haplotype in HeLa

cells (Figure 6D).
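The Figure 6 legend states that allelic comparisons used a two-tailed unpaired Student's t test. As a minimal sketch of that statistic, the Python below computes the pooled-variance t value and its degrees of freedom using only the standard library. The luciferase readings are invented placeholders, not data from this study, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def unpaired_t(sample_a, sample_b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df

# HYPOTHETICAL normalized luciferase readings (arbitrary units), six per allele,
# mirroring a "three replicates each from two clones" design. Not study data.
risk = [14.8, 15.6, 16.1, 15.2, 14.9, 15.9]
nonrisk = [5.1, 4.8, 5.6, 5.3, 4.9, 5.2]

t_stat, df = unpaired_t(risk, nonrisk)
print(f"t = {t_stat:.2f}, df = {df}")
```

Converting the statistic to a two-tailed p value needs the t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) covers that step.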

DISCUSSION

In this study, we describe the most comprehensive characterization
to date of the epigenomic profile of unstimulated human

pancreatic islets. Using DNase- and ChIP-seq techniques, we

profiled open chromatin, CTCF-binding sites, H3K4me3,

H3K4me1, and H3K79me2 across the entire genome in human

islets. Integrated analysis of these large-scale data sets identified
~18,000 putative TSSs, ~30% of which were previously

unannotated by RefSeq. Further computational genomic anal-

yses revealed that at least several hundred of these are

islet-active TSSs, including those for major islet miRNAs previ-

ously implicated in the control of glucose homeostasis (Lynn,

2009). Interestingly, active chromatin marks (H3K4me3, DHS,

H3K79me2) were absent from a subset of highly islet-expressed

genes, including those encoding islet-specific hormones (INS,

GCG, SST, IAPP, PPY, and TTR). This observation suggests

that some genes critical for islet function have an unconventional

promoter chromatin signature, indicative of a unique transcrip-

tional control mechanism. Mutskov and Felsenfeld (2009) have

proposed such a model based on detailed analysis of the INS

locus in human islets.

We also identified ~34,000 candidate distal regulatory

elements in human islets. A substantial number of these putative

elements were clustered (<1000 bp from each other). Compari-

sons with other cell types indicated that these clustered

elements are significantly enriched for islet-unique sites and

thus may represent islet-specific regulatory modules worthy of

more extensive future investigation. Based on CTCF-binding

profiles, ~22% of the ~34,000 candidate distal regulatory

elements are predicted insulator sites. Previous studies have

reported that the H3K4me1 signal is enriched in distal regulatory

elements (Heintzman et al., 2007, 2009). Though our analyses

confirm this finding in aggregate, we show that H3K4me1 enrich-

ment may not be a reliable predictor of regulatory activity for

individual elements.
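The notion of clustered elements (less than 1000 bp from each other) can be illustrated with a small grouping routine. This is a sketch under stated assumptions, not the analysis code used in the study: the gap is measured from the end of one element to the start of the next, and the function name and coordinates are hypothetical.

```python
def cluster_elements(elements, max_gap=1000):
    """Group (chrom, start, end) intervals separated by fewer than max_gap bp."""
    clusters = []
    for chrom, start, end in sorted(elements):
        last = clusters[-1] if clusters else None
        # Join the current cluster when on the same chromosome and close enough
        # to the furthest end seen so far in that cluster.
        if last and last["chrom"] == chrom and start - last["end"] < max_gap:
            last["members"].append((chrom, start, end))
            last["end"] = max(last["end"], end)
        else:
            clusters.append({"chrom": chrom, "end": end, "members": [(chrom, start, end)]})
    return [c["members"] for c in clusters]

# HYPOTHETICAL coordinates for illustration only.
peaks = [
    ("chr1", 10_000, 10_300),
    ("chr1", 10_900, 11_200),   # 600 bp after the previous peak: same cluster
    ("chr1", 15_000, 15_250),   # 3,800 bp gap: new cluster
    ("chr2", 10_950, 11_100),   # different chromosome: new cluster
]
groups = cluster_elements(peaks)
clustered = [g for g in groups if len(g) > 1]
print(len(groups), "groups;", len(clustered), "with more than one element")
```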

Fifty SNPs associated with islet-related diseases and traits

map to within 500 bp of a candidate nonpromoter regulatory

element. Focusing on T2D, 4 of the 12 elements that function

as enhancers in vitro (FTO, KCNQ1, TCF7L2, and WFS1 loci)

harbor T2D-associated SNPs, including two (TCF7L2 and

WFS1 loci) that exhibit significant allele-specific differences in

activity. These results suggest that altered enhancer activity

plays a role in the molecular mechanism underlying at least

a subset of T2D genetic association signals.

[Figure 5 graphic. Bar chart of H3K4me1 fold enrichment (y axis, 0 to 1.8) for t-FAIRE, t-DHS, d-FAIRE, d-DHS, and non-CTCF d-DHS peaks, with ** and n.s. significance marks.]

Figure 5. Representation Analysis of Histone H3 Lysine 4 Monomethylation in Candidate Regulatory Regions

DNase I-hypersensitive site (DHS) and formaldehyde-assisted isolation of

regulatory elements (FAIRE) peaks at RefSeq TSSs (t-DHS and t-FAIRE,

respectively) are significantly depleted for H3K4me1 signal (**, two-tailed

paired Student’s t test, p < 0.005), and DHS peaks at distal candidate regula-

tory elements (d-DHS) are enriched for H3K4me1 signal (*, two-tailed paired

Student’s t test, p < 0.01). Error bars represent SD among three islet samples.

FAIRE data were obtained from Gaulton et al. (2010). Representation analysis

of additional histone modifications is shown in Figure S5.


[Figure 6 graphic; panels A to D. Panels A and B: relative luciferase activity (a.u.), forward and reverse orientations, in MIN6 and HeLa cells for dDHS+/CTCF+ ("C", elements 1 to 11), dDHS- ("N", elements 1 to 15), and dDHS+/CTCF- ("P", elements 1 to 33) constructs; # marks P9, P12, P17, P21, and P23; values printed above off-scale bars: 9.2/9.5, 10.6/21.8, 9.6, 12.3, 29.9, 21.8, 11.7, 10.6/18.5. Panel C: H3K4me1 fold enrichment for the elements exhibiting enhancer activity (P32, P21, P31, P23, P4, P12, P17, P15, P20, P26, P8, P27), with Average and Baseline lines. Panel D: relative luciferase activity (a.u.) of Risk, Non-risk, and Non-risk (m) constructs, forward and reverse, for TCF7L2 (P12) and WFS1 (P17) in MIN6 and HeLa cells.]


These data sets should provide functional context for noncod-

ing variants identified through additional association, targeted

resequencing, or whole-genome sequencing studies. Further

analysis of the repertoire of regulatory elements in the human

islet will enhance the understanding of gene regulation in the islet

and should offer additional insight into the molecular mecha-

nisms that underlie diabetes susceptibility.

EXPERIMENTAL PROCEDURES

Human Islets

Fresh human pancreatic islets were obtained from the ICR Basic Science Islet

Distribution Program and National Disease Research Interchange (NDRI). Islet

viability and purity were assessed by the distribution centers and are shown

along with phenotypic/clinical information of each donor in Table S10. Islets

were warmed to 37°C and washed with calcium- and magnesium-free Dulbecco’s
phosphate-buffered saline (Invitrogen; Carlsbad, CA) prior to crosslinking.

For chromatin immunoprecipitation (ChIP) studies, cells were crosslinked for

20 min in 1% formaldehyde at room temperature, frozen in liquid nitrogen,

and stored at −80°C.

DNase-Seq and DHS Peak Identification

For DNase-seq experiments, fresh pancreatic islets were disaggregated to

achieve single-cell suspension. Islets were washed with prewarmed 1X PBS

once and resuspended with dissociation solution (1 ml of 1X PBS, 50 ml of

0.05 U/ml Dispase I stock solution [Roche; Indianapolis, IN]). Islet suspension

was transferred to a 6-well culture dish, incubated at 37°C for 30 min, dissociated
with a 2 ml sterile pipette, and incubated for another 30 min. This incubation-agitation
cycle was repeated 4 or 5 times until >90% of islets were disaggregated
into single cells. Cells were washed with prewarmed 1X PBS once

and prepared for DNase-seq experiments as previously described (Song

and Crawford, 2010). Libraries from three primary human islet samples

(Table S10) were sequenced using the Illumina GAII platform. Peaks were

identified using MACS (Supplemental Experimental Procedures) (Zhang

et al., 2008).

ChIP and Illumina GAII Sequencing

ChIP assays were carried out as previously described (Scacheri et al.,

2006), with the following modifications. Intact nuclei were isolated and

chromatin was sheared on ice using a Branson 450 Sonifier (constant duty

cycle, output 4, 12–16 cycles of 20 s sonication with 1 min rest between cycles)

to a size of 200–1000 bp. Antibodies used for ChIP were anti-H3K4me3

(ab8580, Abcam; Cambridge, MA), anti-H3K4me1 (ab8895, Abcam), anti-

H3K79me2 (ab3594, Abcam), anti-CTCF (ab70303, Abcam; 07-729, Millipore;

Danvers, MA), and anti-GFP (sc-8334, Santa Cruz Biotechnology; Santa

Cruz, CA).

Islet ChIP-seq libraries were prepared and sequenced using the Illumina

GAII protocol and platform. The number of sequencing lanes, clusters, aligned

reads, repeat-filtered reads (no satellite reads), and unique starts is shown

for each islet and ChIP experiment in Table S12. MACS (Zhang et al., 2008)

was used to call H3K4me3 and CTCF peaks (Supplemental Experimental

Procedures).

Genome-wide Analysis of Chromatin Marks

Perl and R scripts were written to perform the genomic characterization and

comparative analysis of DHS, H3K4me3, and CTCF peaks. Unless otherwise

noted, functional annotation data sets (including RefSeq and UCSC known

genes, predicted TSSs and bidirectional promoters, phastCons elements,

CpG islands, and ChIP-seq data sets) were downloaded from the UCSC Table

Browser on November 1, 2009 (http://genome.ucsc.edu/cgi-bin/hgTables).

For ‘‘computationally predicted TSSs,’’ both the Eponine and the Switch-

gear data sets from the UCSC Table Browser were utilized. Human pancreatic

islet gene expression data were downloaded from T1DBase (http://T1Dbase.

org), and expression data for other tissues were downloaded from BioGPS

Human U133A/GNF1H Gene Atlas (http://biogps.gnf.org/downloads/). Islet-

selective gene expression was defined as at least 3-fold greater expression

in the islet relative to any other tissue represented. Genome-wide results of

the Chai algorithm were determined according to the parameters in Parker

et al. (2009), and islet-FAIRE data sets were obtained from Gaulton et al.

(2010). GO analyses were performed using the web-based tool NIH DAVID

6.7 (http://david.abcc.ncifcrf.gov/). For the DHS peak clustering analysis

(Figure 1F) and the histone modification enrichment/depletion analysis

(Figures 5 and S5), we stringently defined d-DHS peaks as those that are

not within H3K4me3 peaks and ≥5 kb away from RefSeq TSSs, UCSC Known
Gene TSSs, Eponine or Switchgear computationally predicted TSSs, and CpG
islands, yielding 34,039 d-DHS. To select regulatory elements to test for
enhancer activity (Figure 6), the definition of d-DHS was slightly loosened
(≥5 kb upstream and ≥1 kb downstream from known and predicted TSSs

and CpG islands). P values for statistical comparisons were computed using

either the two-tailed paired Student’s t test or the Fisher’s exact test. Details

of the remaining computational analyses are described in Supplemental

Experimental Procedures.
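The stringent distal definition (outside H3K4me3 peaks and at least 5 kb from any TSS or CpG island) amounts to an interval filter. The study's own scripts were written in Perl and R; the Python below is only an illustrative sketch for a single chromosome. It assumes half-open (start, end) coordinates, treats any overlap with an H3K4me3 peak as "within", and measures distance edge to edge. All names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True when two half-open (start, end) intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def gap(a, b):
    """Bases separating two (start, end) intervals; 0 when they touch or overlap."""
    return max(0, b[0] - a[1], a[0] - b[1])

def is_distal(peak, h3k4me3_peaks, promoter_features, min_distance=5000):
    """Apply the stringent d-DHS rule to one DHS peak."""
    if any(overlaps(peak, mark) for mark in h3k4me3_peaks):
        return False  # assumption: any overlap counts as lying within the mark
    return all(gap(peak, feature) >= min_distance for feature in promoter_features)

# HYPOTHETICAL single-chromosome annotations for illustration only.
h3k4me3 = [(100_000, 102_000)]
features = [(101_000, 101_001),   # a TSS, stored as a 1-bp interval
            (250_000, 251_200)]   # a CpG island
dhs = [(101_200, 101_500),        # inside the H3K4me3 peak
       (103_500, 103_800),        # about 2.5 kb from the TSS
       (120_000, 120_400),        # far from every feature
       (245_500, 245_900)]        # 4.1 kb from the CpG island

distal = [peak for peak in dhs if is_distal(peak, h3k4me3, features)]
print(distal)
```

The scan compares every peak with every feature, which keeps the rule readable; a genome-wide run would normally use sorted intervals or a dedicated interval tool.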

Molecular Cloning

Putative regulatory elements were amplified from human genomic DNA with

primers designed using PrimerTile (http://research.nhgri.nih.gov/tools/).

Element boundaries were determined by manual H3K4me1 profile inspection.

Coordinates of amplified elements and primer sequences for amplification are

found in Table S13. Putative regulatory elements were cloned using the

Gateway system (Invitrogen). Generation of Gateway-compatible vectors is

described in Supplemental Experimental Procedures. Variants of interest

were introduced using QuikChange Lightning (Stratagene; La Jolla, CA). Muta-

genesis primer sequences are available upon request. Mutagenesis was

confirmed by direct sequencing.

Transfection and Dual Luciferase Assays

Cells were seeded in 96-well plates (40,000 cells/well HeLa, 60,000 cells/well

MIN6) and cotransfected with 0.072 pmol Gateway-modified firefly (pGL 4.23,

Promega; Madison, WI) and 2 ng Renilla (pRL-TK, Promega) vectors using

Lipofectamine 2000 (Invitrogen). Two vector preparations per insert orientation

were tested. Transfections were performed in triplicate.

Cells were lysed in 1X passive lysis buffer (Promega) 36–48 hr posttransfection,
and dual luciferase assays were run on a Centro/Centro XS3 Microplate

Luminometer LB 960 (Berthold; Bad Wildbad, Germany). Firefly values were

normalized to Renilla to control for differences in cell number or transfection

efficiency. Luciferase assays were performed in triplicate. For each element

tested, at least two independent vector preparations were used. Activity was

Figure 6. Luciferase Reporter Activity Validates Putative Enhancer Elements

(A) Relative luciferase activity of constructs in three element classes tested in MIN6 cells. Genomic locations of elements are found in Table S13. Blue and orange

dashed lines indicate 2.33 standard deviations (p = 0.01) (Heintzman et al., 2009) above the median activity of tested CTCF-bound regions for elements cloned in

the forward or reverse orientations, respectively. Data represent the mean ±SD of three replicates each for two separate clones (six total measurements). C,

d-DHS+/CTCF+ element; N, d-DHS−/CTCF− element; P, d-DHS+/CTCF− element. # marks elements containing T2D-associated SNPs. Numbers above the bars indicate
the luciferase activity for elements beyond the scale of the y axis; a.u. denotes arbitrary units.

(B) Relative luciferase activity of constructs in three element classes tested in HeLa cells. Data are analyzed and annotated as in (A).

(C) H3K4me1 representation in the 12 elements exhibiting enhancer activity. Though the overall average enrichment of H3K4me1 is ~1.3-fold (green line), only

3/12 elements are above baseline (red line). Error bars represent SD among three islet samples.

(D) Relative luciferase activity of TCF7L2 (P12) and WFS1 (P17) elements in MIN6 (left panels) or HeLa (right panels) cells containing the risk or nonrisk alleles of

T2D-associated SNPs. For TCF7L2, (m) denotes a mutation generated by site-directed mutagenesis from the risk to nonrisk allele. Data represent the mean ±SD

of three replicates each from at least two independent clones. **, two-tailed unpaired Student’s t test, p < 0.01. Additional allelic analysis is shown in Figure S6.


defined as 2.33 standard deviations (SD) (p = 0.01) above the median activity of

negative controls (Heintzman et al., 2009), defined as CTCF-bound elements in

this study.
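The normalization and activity call described in this section can be sketched in a few lines of Python. The numbers are hypothetical placeholders rather than study data, the function names are invented for illustration, and the SD is assumed to be the sample SD of the negative-control activities, since the text does not specify how it was computed.

```python
from statistics import NormalDist, mean, median, stdev

def normalize(firefly, renilla):
    """Firefly/Renilla ratio per well, correcting for cell number and transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]

def activity_threshold(control_activities, z=2.33):
    """Median of the negative controls plus z sample standard deviations."""
    return median(control_activities) + z * stdev(control_activities)

# 2.33 is the one-sided standard normal quantile for p = 0.01, to two decimals.
assert round(NormalDist().inv_cdf(0.99), 2) == 2.33

# HYPOTHETICAL values for illustration only.
controls = [0.8, 1.0, 1.1, 0.9, 1.2, 1.0, 0.7, 1.3, 1.0, 0.9, 1.1]  # 11 CTCF-bound elements
firefly = [5200.0, 4800.0, 5500.0]   # triplicate wells for one test element
renilla = [1000.0, 1000.0, 1100.0]

element_activity = mean(normalize(firefly, renilla))
threshold = activity_threshold(controls)
print(f"threshold = {threshold:.2f}, element = {element_activity:.2f}, "
      f"active = {element_activity > threshold}")
```

Using the median rather than the mean of the controls keeps a single unusually active control from shifting the center of the threshold.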

ACCESSION NUMBERS

The NCBI Gene Expression Omnibus (GEO) umbrella accession number,

which links to the individual ChIP-seq and DNase-seq data sets, is GSE23784.

SUPPLEMENTAL INFORMATION

Supplemental Information includes Supplemental Experimental Procedures,

six figures, and 13 tables and can be found with this article online at doi:

10.1016/j.cmet.2010.09.012.

ACKNOWLEDGMENTS

Human pancreatic islets used in this study were obtained through the ICR

Basic Science Islet Distribution Program (University of Minnesota, University

of Alabama-Birmingham, University of Illinois, University of Miami, North-

western University) and the National Disease Research Interchange (NDRI).

We thank Fangfei Ye and Lisa Bukovnik at the Duke IGSP Sequencing Core

Facility for sequencing DNase libraries, the DIAGRAM Consortium for helpful

discussion regarding variants in the KCNQ1 locus, and members of the Collins

and Boehnke labs for insightful discussions during the study and critical

comments on the manuscript. Special thanks to Cristen Willer and Greg Keele

for help with statistical analyses of ChIP/GWAS data. This study was sup-

ported by the NIH Division of Intramural Research/NHGRI project number

Z01-HG000024 (F.S.C.), by NIH grant DK062370 (M.B.), and by an NIH/NHGRI

ENCODE Consortium grant (U54HG004563 to G.E.C. and T.S.F.).

Received: May 7, 2010

Revised: July 22, 2010

Accepted: August 26, 2010

Published: November 2, 2010

REFERENCES

Bailey, T.L., and Elkan, C. (1994). Fitting a mixture model by expectation maxi-

mization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol.

Biol. 2, 28–36.

Barrett, J.C., Clayton, D.G., Concannon, P., Akolkar, B., Cooper, J.D., Erlich,

H.A., Julier, C., Morahan, G., Nerup, J., Nierras, C., et al. (2009). Genome-

wide association study and meta-analysis find that over 40 loci affect risk of

type 1 diabetes. Nat. Genet. 41, 703–707.

Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G.,

Chepelev, I., and Zhao, K. (2007). High-resolution profiling of histone methyl-

ations in the human genome. Cell 129, 823–837.

Barski, A., Jothi, R., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., and

Zhao, K. (2009). Chromatin poises miRNA- and protein-coding genes for

expression. Genome Res. 19, 1742–1751.

Bartoov-Shifman, R., Ridner, G., Bahar, K., Rubins, N., and Walker, M.D.

(2007). Regulation of the gene encoding GPR40, a fatty acid receptor ex-

pressed selectively in pancreatic beta cells. J. Biol. Chem. 282, 23561–23571.

Bernstein, B.E., Meissner, A., and Lander, E.S. (2007). The mammalian epige-

nome. Cell 128, 669–681.

Bhandare, R., Schug, J., Le Lay, J., Fox, A., Smirnova, O., Liu, C., Naji, A., and

Kaestner, K.H. (2010). Genome-wide analysis of histone modifications in

human pancreatic islets. Genome Res. 20, 428–433.

Blanchette, M., Bataille, A.R., Chen, X., Poitras, C., Laganiere, J., Lefebvre, C.,

Deblois, G., Giguere, V., Ferretti, V., Bergeron, D., et al. (2006). Genome-wide

computational prediction of transcriptional regulatory modules reveals new

insights into human gene expression. Genome Res. 16, 656–668.

Boyle, A.P., Davis, S., Shulha, H.P., Meltzer, P., Margulies, E.H., Weng, Z.,

Furey, T.S., and Crawford, G.E. (2008). High-resolution mapping and charac-

terization of open chromatin across the genome. Cell 132, 311–322.

Brink, C. (2003). Promoter elements in endocrine pancreas development and

hormone regulation. Cell. Mol. Life Sci. 60, 1033–1048.

Butler, P.C., Meier, J.J., Butler, A.E., and Bhushan, A. (2007). The replication of

beta cells in normal physiology, in disease and for therapy. Nat. Clin. Pract.

Endocrinol. Metab. 3, 758–768.

Chimienti, F., Devergnas, S., Favier, A., and Seve, M. (2004). Identification and

cloning of a beta-cell-specific zinc transporter, ZnT-8, localized into insulin

secretory granules. Diabetes 53, 2330–2337.

Crawford, G.E., Holt, I.E., Mullikin, J.C., Tai, D., Blakesley, R., Bouffard, G.,

Young, A., Masiello, C., Green, E.D., Wolfsberg, T.G., et al. (2004). Identifying

gene regulatory elements by genome-wide recovery of DNase hypersensitive

sites. Proc. Natl. Acad. Sci. USA 101, 992–997.

Cuddapah, S., Jothi, R., Schones, D.E., Roh, T.Y., Cui, K., and Zhao, K. (2009).

Global analysis of the insulator binding protein CTCF in chromatin barrier

regions reveals demarcation of active and repressive domains. Genome

Res. 19, 24–32.

De Silva, N.M., and Frayling, T.M. (2010). Novel biological insights emerging

from genetic studies of type 2 diabetes and related metabolic traits. Curr.

Opin. Lipidol. 21, 44–50.

Dekker, J. (2003). A closer look at long-range chromosomal interactions.

Trends Biochem. Sci. 28, 277–280.

Dupuis, J., Langenberg, C., Prokopenko, I., Saxena, R., Soranzo, N., Jackson,

A.U., Wheeler, E., Glazer, N.L., Bouatia-Naji, N., Gloyn, A.L., et al. (2010). New

genetic loci implicated in fasting glucose homeostasis and their impact on type

2 diabetes risk. Nat. Genet. 42, 105–116.

ENCODE Project Consortium. (2007). Identification and analysis of functional

elements in 1% of the human genome by the ENCODE pilot project. Nature

447, 799–816.

Ferretti, V., Poitras, C., Bergeron, D., Coulombe, B., Robert, F., and

Blanchette, M. (2007). PReMod: a database of genome-wide mammalian

cis-regulatory module predictions. Nucleic Acids Res. 35 (Database issue),

D122–D126.

Gaulton, K.J., Nammo, T., Pasquali, L., Simon, J.M., Giresi, P.G., Fogarty,

M.P., Panhuis, T.M., Mieczkowski, P., Secchi, A., Bosco, D., et al. (2010).

A map of open chromatin in human pancreatic islets. Nat. Genet. 42, 255–259.

Girard, C., Duprat, F., Terrenoire, C., Tinel, N., Fosset, M., Romey, G., Lazdun-

ski, M., and Lesage, F. (2001). Genomic and functional characteristics of novel

human pancreatic 2P domain K(+) channels. Biochem. Biophys. Res.

Commun. 282, 249–256.

Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R., and Lieb, J.D. (2007). FAIRE

(Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active

regulatory elements from human chromatin. Genome Res. 17, 877–885.

Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A.

(2007). A chromatin landmark and transcription initiation at most promoters

in human cells. Cell 130, 77–88.

Hakonarson, H., Qu, H.Q., Bradfield, J.P., Marchand, L., Kim, C.E., Glessner,

J.T., Grabs, R., Casalunovo, T., Taback, S.P., Frackelton, E.C., et al. (2008).

A novel susceptibility locus for type 1 diabetes on Chr12q13 identified by

a genome-wide association study. Diabetes 57, 1143–1146.

Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D.,

Barrera, L.O., Van Calcar, S., Qu, C., Ching, K.A., et al. (2007). Distinct and

predictive chromatin signatures of transcriptional promoters and enhancers

in the human genome. Nat. Genet. 39, 311–318.

Heintzman, N.D., Hon, G.C., Hawkins, R.D., Kheradpour, P., Stark, A., Harp,

L.F., Ye, Z., Lee, L.K., Stuart, R.K., Ching, C.W., et al. (2009). Histone modifi-

cations at human enhancers reflect global cell-type-specific gene expression.

Nature 459, 108–112.

Hesselberth, J.R., Chen, X., Zhang, Z., Sabo, P.J., Sandstrom, R., Reynolds,

A.P., Thurman, R.E., Neph, S., Kuehn, M.S., Noble, W.S., et al. (2009). Global

mapping of protein-DNA interactions in vivo by digital genomic footprinting.

Nat. Methods 6, 283–289.

Ingelsson, E., Langenberg, C., Hivert, M.F., Prokopenko, I., Lyssenko, V.,

Dupuis, J., Magi, R., Sharp, S., Jackson, A.U., Assimes, T.L., et al. (2010).

Detailed physiologic characterization reveals diverse mechanisms for novel


genetic loci regulating glucose and insulin metabolism in humans. Diabetes

59, 1266–1275.

Itoh, Y., Kawamata, Y., Harada, M., Kobayashi, M., Fujii, R., Fukusumi, S., Ogi,

K., Hosoya, M., Tanaka, Y., Uejima, H., et al. (2003). Free fatty acids regulate

insulin secretion from pancreatic beta cells through GPR40. Nature 422,

173–176.

Joseph, B., Orlian, M., and Furneaux, H. (1998). p21(waf1) mRNA contains

a conserved element in its 3’-untranslated region that is bound by the Elav-

like mRNA-stabilizing proteins. J. Biol. Chem. 273, 20511–20516.

Joslin, E.P., and Kahn, C.R. (2005). Joslin’s diabetes mellitus, Fourteenth

Edition (Philadelphia, Pa.: Lippincott Williams & Wilkins).

Jothi, R., Cuddapah, S., Barski, A., Cui, K., and Zhao, K. (2008). Genome-wide

identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic

Acids Res. 36, 5221–5231.

Kim, T.H., Abdullaev, Z.K., Smith, A.D., Ching, K.A., Loukinov, D.I., Green,

R.D., Zhang, M.Q., Lobanenkov, V.V., and Ren, B. (2007). Analysis of the verte-

brate insulator protein CTCF-binding sites in the human genome. Cell 128,

1231–1245.

Kouzarides, T. (2007). Chromatin modifications and their function. Cell 128,

693–705.

Kutlu, B., Burdick, D., Baxter, D., Rasschaert, J., Flamez, D., Eizirik, D.L.,

Welsh, N., Goodman, N., and Hood, L. (2009). Detailed transcriptome atlas

of the pancreatic beta cell. BMC Med. Genomics 2, 3.

Li, C., Chen, P., Vaughan, J., Lee, K.F., and Vale, W. (2007). Urocortin 3 regu-

lates glucose-stimulated insulin secretion and energy homeostasis. Proc. Natl.

Acad. Sci. USA 104, 4206–4211.

Lynn, F.C. (2009). Meta-regulation: microRNA regulation of glucose and lipid

metabolism. Trends Endocrinol. Metab. 20, 452–459.

Lyssenko, V., Lupi, R., Marchetti, P., Del Guerra, S., Orho-Melander, M., Almg-

ren, P., Sjogren, M., Ling, C., Eriksson, K.F., Lethagen, A.L., et al. (2007).

Mechanisms by which common variants in the TCF7L2 gene increase risk of

type 2 diabetes. J. Clin. Invest. 117, 2155–2163.

Lyssenko, V., Nagorny, C.L., Erdos, M.R., Wierup, N., Jonsson, A., Spegel, P.,

Bugliani, M., Saxena, R., Fex, M., Pulizzi, N., et al. (2009). Common variant in

MTNR1B associated with increased risk of type 2 diabetes and impaired early

insulin secretion. Nat. Genet. 41, 82–88.

Magnuson, M.A. (1990). Glucokinase gene structure. Functional implications

of molecular genetic studies. Diabetes 39, 523–527.

Marson, A., Levine, S.S., Cole, M.F., Frampton, G.M., Brambrink, T., John-

stone, S., Guenther, M.G., Johnston, W.K., Wernig, M., Newman, J., et al.

(2008). Connecting microRNA genes to the core transcriptional regulatory

circuitry of embryonic stem cells. Cell 134, 521–533.

McDaniell, R., Lee, B.K., Song, L., Liu, Z., Boyle, A.P., Erdos, M.R., Scott, L.J.,

Morken, M.A., Kucera, K.S., Battenhouse, A., et al. (2010). Heritable individual-

specific and allele-specific chromatin signatures in humans. Science 328,

235–239.

Miele, A., and Dekker, J. (2008). Long-range chromosomal interactions and

gene regulation. Mol. Biosyst. 4, 1046–1057.

Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G.,

Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide

maps of chromatin state in pluripotent and lineage-committed cells. Nature

448, 553–560.

Muoio, D.M., and Newgard, C.B. (2008). Mechanisms of disease: molecular

and metabolic mechanisms of insulin resistance and beta-cell failure in type

2 diabetes. Nat. Rev. Mol. Cell Biol. 9, 193–205.

Mutskov, V., and Felsenfeld, G. (2009). The human insulin gene is part of a large

open chromatin domain specific for human islets. Proc. Natl. Acad. Sci. USA

106, 17419–17424.

Ohneda, K., Ee, H., and German, M. (2000). Regulation of insulin gene tran-

scription. Semin. Cell Dev. Biol. 11, 227–233.

Oler, A.J., Alla, R.K., Roberts, D.N., Wong, A., Hollenhorst, P.C., Chandler,

K.J., Cassiday, P.A., Nelson, C.A., Hagedorn, C.H., Graves, B.J., and Cairns,

B.R. (2010). Human RNA polymerase III transcriptomes and relationships to

Pol II promoter chromatin and enhancer-binding factors. Nat. Struct. Mol.

Biol. 17, 620–628.

Oliver-Krasinski, J.M., and Stoffers, D.A. (2008). On the origin of the beta cell.

Genes Dev. 22, 1998–2021.

Parker, S.C.J., Hansen, L., Abaan, H.O., Tullius, T.D., and Margulies, E.H.

(2009). Local DNA topography correlates with functional noncoding regions

of the human genome. Science 324, 389–392.

Phillips, J.E., and Corces, V.G. (2009). CTCF: master weaver of the genome.

Cell 137, 1194–1211.

Prokopenko, I., McCarthy, M.I., and Lindgren, C.M. (2008). Type 2 diabetes:

new genes, new understanding. Trends Genet. 24, 613–621.

Prokopenko, I., Langenberg, C., Florez, J.C., Saxena, R., Soranzo, N.,

Thorleifsson, G., Loos, R.J., Manning, A.K., Jackson, A.U., Aulchenko, Y.,

et al. (2009). Variants in MTNR1B influence fasting glucose levels. Nat. Genet.

41, 77–81.

Robertson, A.G., Bilenky, M., Tam, A., Zhao, Y., Zeng, T., Thiessen, N.,

Cezard, T., Fejes, A.P., Wederell, E.D., Cullum, R., et al. (2008). Genome-

wide relationship between histone H3 lysine 4 mono- and tri-methylation

and transcription factor binding. Genome Res. 18, 1906–1917.

Sabo, P.J., Hawrylycz, M., Wallace, J.C., Humbert, R., Yu, M., Shafer, A.,

Kawamoto, J., Hall, R., Mack, J., Dorschner, M.O., et al. (2004). Discovery of

functional noncoding elements by digital analysis of chromatin structure.

Proc. Natl. Acad. Sci. USA 101, 16837–16842.

Scacheri, P.C., Crawford, G.E., and Davis, S. (2006). Statistics for ChIP-chip

and DNase hypersensitivity experiments on NimbleGen arrays. Methods

Enzymol. 411, 270–282.

Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom,

K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolution-

arily conserved elements in vertebrate, insect, worm, and yeast genomes.

Genome Res. 15, 1034–1050.

Song, L., and Crawford, G.E. (2010). DNase-seq: a high-resolution technique

for mapping active gene regulatory elements across the genome from

mammalian cells. Cold Spring Harb. Protoc. 2010. 10.1101/pdb.prot5384.

Teich, N., Mossner, J., and Keim, V. (1998). Mutations of the cationic trypsin-

ogen in hereditary pancreatitis. Hum. Mutat. 12, 39–43.

Terazono, K., Yamamoto, H., Takasawa, S., Shiga, K., Yonemura, Y., Tochino,

Y., and Okamoto, H. (1988). A novel gene activated in regenerating islets.

J. Biol. Chem. 263, 2111–2114.

Tsuboi, T., and Rutter, G.A. (2003). Insulin secretion by ‘kiss-and-run’ exocy-

tosis in clonal pancreatic islet beta-cells. Biochem. Soc. Trans. 31, 833–836.

Wang, Z., Zang, C., Rosenfeld, J.A., Schones, D.E., Barski, A., Cuddapah, S.,

Cui, K., Roh, T.Y., Peng, W., Zhang, M.Q., and Zhao, K. (2008). Combinatorial

patterns of histone acetylations and methylations in the human genome. Nat.

Genet. 40, 897–903.

Xi, H., Shulha, H.P., Lin, J.M., Vales, T.R., Fu, Y., Bodine, D.M., McKay, R.D.,

Chenoweth, J.G., Tesar, P.J., Furey, T.S., et al. (2007). Identification and

characterization of cell type-specific and ubiquitous chromatin regulatory

structures in the human genome. PLoS Genet. 3, e136.

Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E.,

Nusbaum, C., Myers, R.M., Brown, M., Li, W., and Liu, X.S. (2008). Model-

based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.


Supplemental Information
Cell Metabolism, Volume 12

Global Epigenomic Analysis of Primary Human Pancreatic Islets Provides Insights into Type 2 Diabetes Susceptibility Loci

Michael L. Stitzel, Praveen Sethupathy, Daniel S. Pearson, Peter S. Chines, Lingyun Song, Michael R. Erdos, Ryan Welch, Stephen C.J. Parker, Alan P. Boyle, Laura J. Scott, NISC Comparative Sequencing Program, Elliott H. Margulies, Michael Boehnke, Terrence S. Furey, Gregory E. Crawford, and Francis S. Collins

Supplemental Experimental Procedures

DNase Mapping and Peak Calling

Twenty base pair reads were mapped to the reference genome, resulting in ~15, ~21 and ~37 million mappable (no more than two mismatches), unique (mapping only once in the genome), non-satellite-repeat reads for the three samples (Table S11).

MACS (Zhang et al., 2008) version 1.3.7.1 was used to identify genomic regions of enrichment for mapped DNase-seq reads in the following manner. (1) Command line options: MACS was run with the options --nomodel --shiftsize=1 --bw=70. (2) Handling duplicate reads: because DNase-seq is expected to produce sequence reads that begin at precisely the same base (“duplicates”), we modified MACS to count up to 6 “duplicate” reads per locus, capping duplicates at that number to eliminate PCR artifacts. (3) Minimizing false positives: since no input control was generated for DNase-Seq, MACS estimated the background noise (λlocal) using the DNase-Seq data from the islet samples themselves. Depending on the genomic context, DHS can occur either in isolation or in clusters. To account for this, MACS was run with two separate sets of parameter values for the local noise correction, tuned for isolated (λlocal = max(λ5000, λ10000)) and clustered (λlocal = max(λ30000, λ50000)) DHS. The union of the results from the two separate runs was designated as the final set of MACS calls. Calls present in at least two of the three islet samples were defined as “DHS peaks.”
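The two-of-three replicate rule described above can be illustrated with a minimal Python sketch. The source does not state how overlap between calls from different samples was defined, so the merging rule used here (any shared base pair, with chained overlaps merged into one region) and the names `consensus_peaks` and `min_support` are assumptions made only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open coordinates


def consensus_peaks(samples: List[List[Interval]], min_support: int = 2) -> List[Interval]:
    """Merge overlapping calls across samples and keep merged regions
    supported by at least `min_support` distinct samples.

    Assumption (not from the source): two calls overlap if they share
    at least one base pair.
    """
    # Tag every call with the index of the sample it came from, then sort.
    tagged = sorted(
        (chrom, start, end, idx)
        for idx, calls in enumerate(samples)
        for (chrom, start, end) in calls
    )
    merged: List[Interval] = []
    current = None  # [chrom, start, end, set of supporting sample indices]
    for chrom, start, end, idx in tagged:
        if current is not None and chrom == current[0] and start < current[2]:
            # Overlaps the open region: extend it and record the sample.
            current[2] = max(current[2], end)
            current[3].add(idx)
        else:
            if current is not None and len(current[3]) >= min_support:
                merged.append((current[0], current[1], current[2]))
            current = [chrom, start, end, {idx}]
    if current is not None and len(current[3]) >= min_support:
        merged.append((current[0], current[1], current[2]))
    return merged
```

With three hypothetical samples, a region called in two of them is kept while singleton calls are dropped. The same function can be applied to the union of the "isolated" and "clustered" MACS runs for each sample.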


H3K4me3 and CTCF Peak Identification

Genomic regions enriched for mapped sequence reads were determined using MACS. Input was sequenced and used as a control. For CTCF, all default parameters were used, and for H3K4me3, MACS was run with a modified set of parameter values (λlocal = max(λ1000, λ30000, λ50000)) to account for its broader signal. The number of peaks called is given in Table S12. For H3K4me3/CTCF peak identification in non-islet cell types, we processed the publicly available raw data in a similar manner, with the caveat that in instances where input control was not provided, MACS was run without a control.

Identification of Unannotated Islet-Active TSSs

DHS peaks tend to be punctate around the TSS, whereas H3K4me3 peaks usually extend well into the body of the transcribed unit. Therefore, we predicted the directionality of unannotated TSSs using the following approach: First, the strongest DHS peak within each H3K4me3 peak was determined according to the peak intensity value provided by MACS. Second, the sequence length covered by the H3K4me3 peak on either side of the DHS peak was computed. Third, the side with more sequence coverage was used to assign orientation to the underlying TSS (i.e., more sequence coverage on the left side denotes minus strand transcription; more on the right side denotes plus strand transcription). When tested on TSSs of known, islet-active genes with unidirectional TSSs, the algorithm performed at ~90% accuracy. Unannotated islet-active TSSs were identified by applying this algorithm to unannotated H3K4me3 peaks and selecting only those for which the predicted orientation is the same as the directionality of the H3K79me2 signal at that locus (Figure S2).
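The three-step orientation rule above can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' code: the function names are hypothetical, "within" is assumed to mean fully contained in the H3K4me3 peak, and ties are assumed to yield no call.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, int, float]  # (start, end, MACS intensity)


def strongest_dhs_within(k4_start: int, k4_end: int, dhs_peaks: List[Peak]) -> Optional[Peak]:
    """Step 1: pick the highest-intensity DHS peak contained in the H3K4me3 peak."""
    inside = [p for p in dhs_peaks if p[0] >= k4_start and p[1] <= k4_end]
    return max(inside, key=lambda p: p[2]) if inside else None


def predict_strand(k4_start: int, k4_end: int, dhs_peaks: List[Peak]) -> Optional[str]:
    """Steps 2 and 3: compare H3K4me3 coverage on each side of the DHS peak.

    More coverage to the left implies minus-strand transcription; more to
    the right implies plus-strand transcription. Returns None when there is
    no contained DHS peak or the two sides are equal (an assumption).
    """
    dhs = strongest_dhs_within(k4_start, k4_end, dhs_peaks)
    if dhs is None:
        return None
    left = dhs[0] - k4_start   # H3K4me3 sequence upstream (left) of the DHS
    right = k4_end - dhs[1]    # H3K4me3 sequence downstream (right) of the DHS
    if left == right:
        return None
    return "-" if left > right else "+"
```

A final filtering step, as described in the text, would keep only predictions that agree with the direction of the H3K79me2 signal; that comparison is omitted here because the source does not specify how H3K79me2 directionality was scored.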


CTCF Motif Identification

MACS-identified CTCF peaks from the assay using the AbCam antibody were intersected with those from the assay using the Millipore antibody. The top 10% (n=2130) of intersected peaks were determined according to the sum of the -log10(p-values) provided by MACS for each of the peaks from each assay. MEME (Bailey and Elkan, 1994) version 4.3.0 was used to identify motifs in these “top” CTCF peaks. MEME was run with '-mod zoops -dna -revcomp' options, and the highest scoring motif was reported.
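The ranking step above (sum of -log10 p-values across the two antibody assays, then the top 10%) can be sketched as below. The input layout, the function name `top_fraction`, and the rounding rule for the cutoff are assumptions; p-values are assumed to be plain probabilities greater than zero.

```python
import math
from typing import Iterable, List, Tuple


def top_fraction(peaks: Iterable[Tuple[str, float, float]], fraction: float = 0.10) -> List[str]:
    """Rank intersected peaks by summed -log10(p) from two assays.

    `peaks` holds (peak_id, p_value_assay_a, p_value_assay_b). Returns the
    ids of the top `fraction` of peaks, rounding the count down but keeping
    at least one peak (a hypothetical choice, not stated in the source).
    """
    scored = sorted(
        peaks,
        key=lambda p: -(math.log10(p[1]) + math.log10(p[2])),
        reverse=True,  # largest combined score first
    )
    n_keep = max(1, int(len(scored) * fraction))
    return [peak_id for peak_id, _, _ in scored[:n_keep]]
```

The selected peak sequences would then be passed to MEME with the options quoted in the text.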

Histone Modification Enrichment/Depletion Analysis

For each of three islet samples, we computed the ratio of the density of extended H3K4me1, H3K4me3 and H3K79me2 sequence reads in t-DHS, d-DHS, t-FAIRE and d-FAIRE peaks to the density in flanking control regions that do not overlap any DHS/FAIRE signal. The two flanking control regions selected for each DHS/FAIRE peak were 550-300 nucleotides upstream of the 5’ DHS/FAIRE peak boundary (left flank) and 300-550 nucleotides downstream of the 3’ DHS/FAIRE peak boundary (right flank). The density ratios were averaged across all three islet samples for the overall average enrichment/depletion level. P-values of enrichment/depletion were computed using the two-tailed paired Student’s t-test.
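As a worked illustration of the flank definition and density ratio above, the following Python sketch computes the two control windows for a peak and the peak-to-flank read density ratio. Function names are hypothetical, and the check that flanks do not overlap other DHS/FAIRE signal is deliberately left out.

```python
from typing import Tuple


def flank_regions(peak_start: int, peak_end: int,
                  inner: int = 300, outer: int = 550) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return (left_flank, right_flank) as (start, end) pairs.

    Left flank: 550 to 300 nt upstream of the 5' peak boundary.
    Right flank: 300 to 550 nt downstream of the 3' peak boundary.
    Coordinates are clipped at zero.
    """
    left = (max(0, peak_start - outer), max(0, peak_start - inner))
    right = (peak_end + inner, peak_end + outer)
    return left, right


def density_ratio(peak_reads: int, peak_len: int, flank_reads: int, flank_len: int) -> float:
    """Read density in the peak divided by read density in the combined flanks."""
    return (peak_reads / peak_len) / (flank_reads / flank_len)
```

For a peak at 1000-1200, the flanks are 450-700 and 1500-1750 (500 nt of control sequence in total). Ratios above 1 indicate enrichment and ratios below 1 indicate depletion; per-sample ratios would then be averaged and tested as described in the text.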

Identification of T2D Association Signal Boundaries (Spotter)

For each of the 18 T2D susceptibility loci, we identified the boundaries of the association signal using a sliding window algorithm that takes into consideration linkage disequilibrium (LD), recombination rates, and association p-values. Starting at the position of the most strongly associated "index" SNP, we examined windows of 75 kb in the 5’ and 3’ directions, scanning for SNPs in LD (r2 ≥ 0.5) with the index SNP or having an association p-value ≤ 10^-5. After identifying the most distant 5’ and 3’ window meeting either of these criteria, we selected the nearest recombination hotspots (≥ 10 cM/Mb) as the 5’ and 3’ boundaries. We examined the intervals selected for each locus using the LocusZoom software, which generates plots showing chromosomal position, association p-values, linkage disequilibrium patterns, and recombination rate information. Each locus was visually inspected to ensure that the entire association signal was contained within the selected interval. Our algorithm, implemented in the software package Spotter, and the plotting tool LocusZoom are available online (http://csg.sph.umich.edu/boehnke/spotter/ and http://csg.sph.umich.edu/locuszoom/).
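The boundary search described above can be approximated by the following Python sketch. It is not the Spotter implementation: the stopping rule (extend while consecutive 75 kb windows each contain a qualifying SNP), the outward snapping to a hotspot, the threshold defaults (r2 at least 0.5, p at most 1e-5), and all names are assumptions for illustration. Hotspots are passed in as a precomputed list of positions already filtered by recombination rate.

```python
from typing import List, Tuple

Snp = Tuple[int, float, float]  # (position, r2 with index SNP, association p-value)


def extend_edge(index_pos: int, snps: List[Snp], direction: int,
                window: int = 75_000, r2_min: float = 0.5, p_max: float = 1e-5) -> int:
    """Walk outward in fixed windows while each window holds a qualifying SNP.

    direction is -1 for 5' and +1 for 3'. Returns the outer edge of the
    last window that contained a SNP meeting either criterion.
    """
    edge = index_pos
    while True:
        if direction < 0:
            lo, hi = edge - window, edge
            in_window = lambda pos: lo <= pos < hi
        else:
            lo, hi = edge, edge + window
            in_window = lambda pos: lo < pos <= hi
        hit = any(in_window(pos) and (r2 >= r2_min or p <= p_max) for pos, r2, p in snps)
        if not hit:
            return edge
        edge = lo if direction < 0 else hi


def snap_to_hotspot(edge: int, hotspots: List[int], direction: int) -> int:
    """Move the edge outward to the nearest hotspot; keep the edge if none exists."""
    if direction < 0:
        candidates = [h for h in hotspots if h <= edge]
        return max(candidates) if candidates else edge
    candidates = [h for h in hotspots if h >= edge]
    return min(candidates) if candidates else edge


def search_space(index_pos: int, snps: List[Snp], hotspots: List[int]) -> Tuple[int, int]:
    """Return (start, end) of the association-signal interval for one locus."""
    start = snap_to_hotspot(extend_edge(index_pos, snps, -1), hotspots, -1)
    end = snap_to_hotspot(extend_edge(index_pos, snps, +1), hotspots, +1)
    return start, end
```

Intervals produced this way would still need the visual inspection described in the text, since a fixed-window walk can stop early at a gap in SNP coverage.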

Generating Gateway-Compatible Luciferase Vectors

To generate Gateway-compatible luciferase reporter vectors, Gateway cassette B (Invitrogen) was ligated into an EcoRV site in the multiple cloning site of the pGL4.23 luciferase reporter vector (Promega) in forward and reverse orientations. Gateway cassette orientation was confirmed by restriction digest. The integrity of each Gateway-cloned insert was confirmed with restriction enzyme digestion and/or direct sequencing.

RNA Preparation and TaqMan Expression Analysis in Human Islets

Two thousand islet equivalents (approximately 2 million cells) were harvested in Trizol (Invitrogen), and total RNA was isolated using the RNeasy mini kit (Qiagen). 250 ng of RNA from each sample was reverse transcribed using the high capacity RNA-to-cDNA kit (Applied Biosystems). 12.5 ng of cDNA was used per TaqMan gene expression assay (Applied Biosystems) per sample, and each assay was performed in triplicate. Expression was measured using inventoried TaqMan gene expression assays for INS (Hs00355773_m1), GCG (Hs00174967_m1), and SST (Hs00356144_m1). Relative transcript abundance was calculated using the delta Ct method (Applied Biosystems) with a TaqMan gene expression assay for GAPDH (Hs99999905_m1) serving as the normalization control. Serial 4-fold dilutions of total pancreas cDNA ranging from 200 ng to 0.78125 ng were used to generate a standard curve and assess TaqMan gene expression assay amplification efficiency; all assays were >99% efficient.
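The delta Ct normalization and the standard-curve efficiency check above can be illustrated with a short Python sketch. The sign convention (reporting -ΔCt as a log2 relative abundance) and the efficiency formula E = 10^(-1/slope) - 1 are the commonly used forms and are assumed here; the function names are hypothetical and no measured Ct values from the study are reproduced.

```python
import math
from typing import List


def relative_abundance_log2(ct_target: float, ct_reference: float) -> float:
    """log2 relative abundance by the delta Ct method: -(Ct_target - Ct_reference)."""
    return -(ct_target - ct_reference)


def dilution_series(start_ng: float = 200.0, fold: float = 4.0, points: int = 5) -> List[float]:
    """Serial dilution inputs, e.g. 200, 50, 12.5, 3.125, 0.78125 ng."""
    return [start_ng / fold ** i for i in range(points)]


def amplification_efficiency(inputs_ng: List[float], cts: List[float]) -> float:
    """Efficiency from the least-squares slope of Ct versus log10(input).

    A slope of about -3.32 corresponds to an efficiency of 1.0 (100%).
    """
    xs = [math.log10(v) for v in inputs_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
             / sum((x - mean_x) ** 2 for x in xs))
    return 10 ** (-1.0 / slope) - 1.0
```

Five 4-fold steps from 200 ng end at 200/4^4 = 0.78125 ng, which matches the range given in the text. For an idealized assay in which Ct rises by exactly one cycle per halving of input, the computed efficiency is 1.0.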

Cell Culture

HeLa cells were cultured in DMEM containing 10% FBS. MIN6 cells were cultured in DMEM containing 10% FBS, 100 mM sodium pyruvate, and 100 µM 2-mercaptoethanol. Cells were maintained at 37 °C and 5% CO2.


Figure S1. Comparison of Predicted Distal Regulatory Elements in the Human Islet among Three Different Experimental Procedures

16,785 (7,929+4,713+4,143) out of 34,039 distal DHS peaks (d-DHS) overlap with GLITR (Bhandare et al., 2010) and/or FAIRE (Gaulton et al., 2010) peaks.

Figure S2 (related to Figure 2)

[Figure: bar chart of INS, SST and GCG expression; y-axis ticks from -20 to 10; x-axis samples: A (50%), B (60%) Islet 1, C (60%) Islet 2, D (70%), E (75%), F (80%) Islet 3, G (80%), H (90%), I (90%) Islet 6, J (90%), Total Pancreas, HeLa, Fibroblast.]

Figure S2. Insulin (INS), Glucagon (GCG), and Somatostatin (SST) Genes Are Highly Expressed in Human Pancreatic Islets from Cadaveric Donors

TaqMan gene expression assays were used to measure abundance of INS (Hs00355773_m1), GCG (Hs00174967_m1), and SST (Hs00356144_m1) in 10 human islet samples (A-J). Numbered islets indicate islets analyzed from Table S12. Islet purity is in parentheses. Expression was determined by the delta Ct method using GAPDH (Hs99999905_m1) for normalization. Values are represented on the log(2) scale. Total human pancreas is shown for comparison, and HeLa and fibroblasts were used as negative controls.

Figure S3 (related to Figs 2 and 3)

[Figure: panels A, B and C. Panel A flowchart steps: peaks not overlapping RefSeq TSSs; remove other annotated TSSs such as from the UCSC Known Genes track; remove “noise subpeaks”; overlap with DHS peaks found in at least 2 out of 3 replicates; intragenic / intergenic; overlap H3K79me2 peaks; “strand predictor” algorithm assigns matching orientation for H3K4me3 and H3K79me2 peaks; overlap CpG islands or computationally predicted TSSs. Peak counts shown: 6190, 5599, 5292, 4060, 2174, 1886, 506, 263, 196. (Average length, intensity) annotations shown: (1098, 41), (1192, 50), (1179, 48), (1516, 90), (1706, 112), (1918, 138), (1207, 51).]

Figure S3. Identification of Unannotated, Intergenic, Islet-Active Transcription Start Sites

(A) Algorithm schematic. Red numbers in parentheses (x, y) next to each category indicate average length (x) and intensity (y) of H3K4me3 peaks. The increase in average length and intensity of H3K4me3 peaks as the algorithm proceeds provides increased confidence in the strength of the putative TSS. “Noise subpeaks” refer to H3K4me3 peak calls that are immediately adjacent to, overlap the same DHS as, and thus likely represent the same signal as, a larger H3K4me3 peak at a RefSeq TSS. The “strand predictor” algorithm independently predicts the directionality of an H3K4me3 peak using DHS and H3K79me2 and assesses whether the predictions match.

(B) Candidate islet-active transcription start site for the primary transcript of the islet-expressed miR-1179/miR-7-2 microRNA cluster. The putative transcription start site [TSS, red box] (DHS-enriched, H3K4me3-enriched, H3K4me1-depleted) is ~3.5 kb upstream of the 5’-most microRNA (hsa-miR-1179) in the cluster, and the full-length primary transcript (H3K79me2-enriched) is approximately 7.5 kb. Hepatocyte nuclear factor 1 (HNF1) has a predicted conserved binding site within 5 kb upstream of the putative TSS, and regulatory factor X1 (RFX1) has a predicted conserved binding site immediately adjacent to the TSS. The TSS is predicted to be bidirectional according to the NHGRI BiPro dataset (http://genome.ucsc.edu/hgTables), which is supported by substantial H3K79me2 signal on both sides of the TSS.

(C) Unannotated active promoter region harboring a type 1 diabetes associated variant. Variant rs10876864 is in strong linkage disequilibrium (r2 > 0.6) with a published type 1 diabetes (T1D) index SNP (rs1701704) (Hakonarson et al., 2008) and falls within a region that is a putative unannotated islet-active transcription start site [TSS, red box] (DHS+, H3K4me3+). This may represent a TSS of an unannotated transcript or an alternative TSS of the downstream gene IKZF4, a strong candidate gene for T1D (Hakonarson et al., 2008). The annotated promoter of IKZF4 lacks both a DHS and a strong H3K4me3 peak (black box). Both the annotated (black box) and the candidate (red box) TSS are highly sequence-conserved according to the “Mammal Cons” track.

Figure S4 (related to Fig 3)

[Figure: panels A and B.]

Figure S4. Examples of Incorrectly Predicted Directionality of H3K4me3 Peaks

(A) Example of an incorrect prediction at the HMGN4 transcription start site due to a DNase I hypersensitive site (DHS) that corresponds to an RNA polymerase III bound locus (blue box).

(B) Example of an incorrect prediction at the GAB2 transcription start site due to a DHS that corresponds to a CTCF bound locus (red box).

Figure S5 (related to Figure 5)

Figure S5. Representation of Histone Modifications at Distal DNase I Hypersensitive Sites

Distal DNase I hypersensitive sites (d-DHS) are split into four categories (intragenic, intergenic, intragenic CTCF+, intergenic CTCF+). X-axis represents average (across three samples) read density normalized to total number of reads. Y-axis represents average (across three samples) fold enrichment of reads relative to flanking non-DHS control regions. H3K79me2 is not expected to be enriched at DHS; thus it serves as a control test.

Figure S6 (related to Figure 6)

[Figure: panels A and B.]

Figure S6. Allele-Specific Luciferase Reporter Activity of Risk and Nonrisk Haplotypes

(A) IGF2BP2 (P9), KCNQ1 (P21), and FTO (P23) elements in MIN6 and HeLa. Risk and non-risk alleles are indicated in Table S8. Data are represented as the mean +/- standard deviation of 3 replicates each from at least 2 independent clones for each haplotype.

(B) Novel variant in TCF7L2 element does not alter enhancer luciferase activity in MIN6. Expanded data from Figure 6D, indicating relative luciferase activity of inserts containing different genotypes at both the undocumented variant (1st nucleotide in the 2 nucleotide pair of the legend) and rs7903146 (2nd nucleotide of the pair). Only variation at rs7903146 altered enhancer activity (compare xT with xC). Data are represented as mean +/- standard deviation of 3 replicates each from at least 2 independent clones for each genotype. CC* was generated by site-directed mutagenesis of the CT element.

Table S6. Spotter-Defined Search Space at 18 T2D-Associated Loci

| Index SNP | Chr | Position (hg18) | p value | Alleles | Genes/Loci | Search space start (hg18) | Search space end (hg18) |
|---|---|---|---|---|---|---|---|
| rs10923931 | chr1 | 120319482 | 0.000006862 | G/T | ADAM30;NOTCH2 | 120141164 | 120430267 |
| rs12779790 | chr10 | 12368016 | 0.00004739 | A/G | CAMK1D;CDC123;NUDT5 | 12135047 | 12377675 |
| rs1111875 | chr10 | 94452862 | 3.98E-07 | C/T | HHEX;KIF11 | 94190340 | 94489557 |
| rs7903146 | chr10 | 114748339 | 3.05E-23 | C/T | TCF7L2 | 114707461 | 114813416 |
| rs2237892 | chr11 | 2796327 | 0.01387 | C/T | CDKN1C;KCNQ1;KCNQ1DN;SLC22A18;SLC22A18AS | 2776329 | 2815122 |
| rs5215 | chr11 | 17365206 | 0.00000041 | C/T | ABCC8;DKFZp686O24166;KCNJ11;NUCB2 | 16957763 | 17381287 |
| rs7961581 | chr12 | 69949369 | 0.0000368 | C/T | TSPAN8 | 69666378 | 69953289 |
| rs8050136 | chr16 | 52373776 | 0.000006869 | C/A | FTO;RPGRIP1L | 52355409 | 52406062 |
| rs17705177 | chr17 | 33197639 | 0.002826 | T/A | HNF1B;LOC284100 | 33164509 | 33205550 |
| rs7578597 | chr2 | 43586327 | 0.0001087 | T/C | THADA | 43302162 | 43901034 |
| rs1801282 | chr3 | 12368125 | 0.0002032 | C/G | GSTM1L;PPARG | 12001672 | 12834500 |
| rs4607103 | chr3 | 64686944 | 0.0003129 | C/T | ADAMTS9 | 64617740 | 64804520 |
| rs4402960 | chr3 | 186994381 | 7.54E-08 | G/T | C3orf65;IGF2BP2 | 186754224 | 187031377 |
| rs10010131 | chr4 | 6343816 | 0.004028 | A/G | JAKMIP1;PPP2R2C;WFS1 | 6314897 | 6375987 |
| rs7754840 | chr6 | 20769229 | 8.73E-08 | G/C | CDKAL1 | 20589657 | 21120000 |
| rs864745 | chr7 | 28147081 | 0.0000462 | T/C | JAZF1 | 28006441 | 28225758 |
| rs13266634 | chr8 | 118253964 | 0.03264 | C/T | SLC30A8 | 118116758 | 118306152 |
| rs10811661 | chr9 | 22124094 | 1.95E-07 | T/C | CDKN2BAS | 21930588 | 22128105 |

Chr=chromosome

Table S8. Elements Containing Type 2 Diabetes-Associated SNPs

| Locus | Element | Chr | SNP | CTCF | Risk Allele (cloned gDNA) | Non-risk Allele (cloned gDNA) |
|---|---|---|---|---|---|---|
| CAMK1D/CDC123 | -- | chr10 | rs11257655 | Yes | -- [T] | -- [C] |
| FTO | P23 | chr16 | rs8050136 | No | A | C |
| | | chr16 | rs9935401 | No | A | G |
| | | chr16 | rs8051591 | No | G | A |
| IGF2BP2 | P9 | chr3 | rs7651090 | No | G | A |
| | | chr3 | rs6444081 | No | C | T |
| | | chr3 | rs7646518 | No | C | T |
| | | chr3 | rs7640539 | No | A | T |
| | | chr3 | rs7637773 | No | A | G |
| KCNQ1 | P21 | chr11 | rs163184 (a) | No | G | T |
| TCF7L2 | P12 | chr10 | rs7903146 | No | T | C |
| WFS1 | P17 | chr4 | rs3821943 | No | T | C |
| | | chr4 | rs4689397 | No | A | G |

-- indicates the element was not tested

(a) This SNP is in high LD (r2 = 0.98) with an index SNP (rs2283228; Unoki et al., 2008) in the East Asian population (HapMap JPT+CHB)

Table S9. GWAS Catalog SNPs or Linked SNPs (r2 > 0.6) Mapping within 500 bp of d-DHS

| Disease/trait | Chromosome | Index SNP | Mapping SNP | Dprime | r squared | Reported gene | CTCF? |
|---|---|---|---|---|---|---|---|
| Type 1 diabetes | chr10 | rs10509540 | rs11816865 | 0.91471 | 0.66791 | C10orf59 | |
| Type 1 diabetes | chr4 | rs10517086 | rs10517086 | 1 | 1 | Intergenic | |
| Type 1 diabetes | chr5 | rs1445898 | rs17376481 | 1 | 0.61241 | CAPSL | |
| Type 1 diabetes | chr14 | rs1465788 | rs194749 | 1 | 0.83351 | Intergenic | |
| Type 1 diabetes | chr16 | rs2903692 | rs7403919 | 0.91747 | 0.80356 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs9935174 | 0.91126 | 0.676 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs1003603 | 0.91258 | 0.67838 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs725613 | 1 | 0.95991 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs12925642 | 1 | 0.72209 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs9929994 | 1 | 0.95932 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs12924729 | 0.94791 | 0.68836 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs12917656 | 0.95061 | 0.73649 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs7203459 | 0.89148 | 0.61388 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs2903692 | 1 | 1 | KIAA0350 | |
| Type 1 diabetes | chr16 | rs2903692 | rs17673553 | 1 | 0.70379 | KIAA0350 | |
| Type 1 diabetes | chr19 | rs425105 | rs425105 | 1 | 1 | Intergenic | |
| Type 1 diabetes | chr19 | rs425105 | rs160544 | 1 | 0.70209 | Intergenic | |
| Type 1 diabetes | chr16 | rs4788084 | rs2650492 | 0.91156 | 0.61534 | IL27 | |
| Type 1 diabetes | chr22 | rs5753037 | rs41176 | 0.96619 | 0.93292 | Intergenic | |
| Type 1 diabetes | chr5 | rs6897932 | rs931555 | 0.84352 | 0.65776 | IL7R | |
| Type 1 diabetes | chr16 | rs7202877 | rs13331385 | 1 | 1 | Intergenic | |
| Type 1 diabetes | chr16 | rs7202877 | rs11149812 | 0.92395 | 0.64991 | Intergenic | |
| Type 1 diabetes | chr16 | rs7202877 | rs4993971 | 0.92395 | 0.64991 | Intergenic | |
| Type 1 diabetes | chr6 | rs9268645 | rs454748 | 0.80972 | 0.63434 | MHC | Yes |
| Type 1 diabetes | chr6 | rs9268645 | rs9268528 | 0.9636 | 0.92837 | MHC | Yes |
| Type 1 diabetes | chr6 | rs9268645 | rs9268605 | 1 | 1 | MHC | |
| Type 1 diabetes | chr6 | rs9268645 | rs9268606 | 1 | 1 | MHC | Yes |
| Type 1 diabetes | chr6 | rs9268645 | rs9268607 | 1 | 1 | MHC | |
| Type 1 diabetes | chr21 | rs9976767 | rs7276630 | 1 | 0.7981 | UBASH3A | |
| Type 1 diabetes | chr21 | rs9976767 | rs7278547 | 1 | 0.96476 | UBASH3A | |
| Type 2 diabetes | chr12 | rs12304921 | rs17125346 | 0.92847 | 0.67075 | NR | |
| Type 2 diabetes | chr10 | rs12779790 | rs11257655 | 1 | 0.83393 | CDC123,CAMK1D | Yes |
| Type 2 diabetes | chr11 | rs2237897 | rs8181588 | 1 | 0.70186 | KCNQ1 | |
| Type 2 diabetes | chr11 | rs2237897 | rs2237896 | 1 | 1 | KCNQ1 | |
| Type 2 diabetes | chr11 | rs2237897 | rs2237897 | 1 | 1 | KCNQ1 | |
| Type 2 diabetes | chr3 | rs4402960 | rs6444081 | 1 | 1 | IGF2BP2 | |
| Type 2 diabetes | chr3 | rs4402960 | rs7646518 | 1 | 1 | IGF2BP2 | |
| Type 2 diabetes | chr4 | rs4689388 | rs4689397 | 0.95718 | 0.84159 | WFS1,PPP2R2C | |
| Type 2 diabetes | chr4 | rs4689388 | rs3821943 | 0.95722 | 0.84405 | WFS1,PPP2R2C | |
| Type 2 diabetes | chr2 | rs7578597 | rs17031079 | 1 | 0.80952 | THADA | |
| Type 2 diabetes | chr2 | rs7578597 | rs7559723 | 1 | 0.81818 | THADA | |
| Type 2 diabetes | chr2 | rs7578597 | rs10186307 | 1 | 0.81818 | THADA | Yes |
| Type 2 diabetes | chr2 | rs7578597 | rs10186441 | 1 | 0.81818 | THADA | Yes |
| Type 2 diabetes | chr2 | rs7578597 | rs17031133 | 1 | 0.81818 | THADA | |
| Type 2 diabetes | chr2 | rs7578597 | rs6749617 | 1 | 0.74825 | THADA | |
| Type 2 diabetes | chr10 | rs7903146 | rs7903146 | 1 | 1 | TCF7L2 | |
| Type 2 diabetes | chr16 | rs8050136 | rs17817288 | 1 | 0.64722 | FTO | |
| Type 2 diabetes | chr16 | rs8050136 | rs11075987 | 1 | 0.60474 | FTO | |
| Type 2 diabetes | chr6 | rs9472138 | rs9462935 | 0.8898 | 0.62484 | VEGFA | |
| Fasting plasma glucose | chr11 | rs2166706 | rs10830956 | 1 | 0.65589 | MTNR1B | |

Table S10. Islet Donor Characteristics

| Sample ID | Sex | Purity (a) (%) | Viability (a) (%) | BMI | Age | Cause of Death (b) | Race (c) | Isolation Site (d) | Approximate amount of crosslinked material (Islet equivalents) (e) |
|---|---|---|---|---|---|---|---|---|---|
| Islet 1 | F | 60 | 91 | 25.2 | 58 | CVH | AA | UMN | 16000 |
| Islet 2 | F | 60 | 99 | 30.1 | 41 | CVH | H | UMN | 16000 |
| Islet 3 | M | 70 | 93 | 26.5 | 16 | BHT | C | UAB | 18000 |
| Islet 4 | F | 80 | 97 | 24 | 37 | U | C | U.Ill. | 18000 |
| Islet 5 | M | 85 | 95 | 24.7 | 36 | SIGSWH | C | U.Ill. | 16000 |
| Islet 6 | M | 90 | 95 | 27.9 | 60 | CVH | C | U.Miami | 16000 |
| Islet 7 | M | 90 | 95 | 29.2 | 27 | ICH | H | Northwestern U | 14000 |
| Islet 8 | F | 80 | 90 | 22.5 | 56 | CVA | C | NDRI | 14000 |
| Islet 9 | M | 90 | 80 | 23.5 | 28 | Head Trauma | C | NDRI | 14000 |

(a) Purity (assessed by dithizone staining) and viability were determined by islet distribution centers

(b) CVH=Cerebrovascular hemorrhage; BHT=blunt head trauma; SIGSWH=Self-inflicted gunshot wound to the head; ICH=Intracerebral hemorrhage; CVA=cerebrovascular accident (stroke); U=undocumented/unknown

(c) AA=African American; H=Hispanic; C=Caucasian

(d) UMN=University of Minnesota; UAB=University of Alabama-Birmingham; U.Ill.=University of Illinois; NDRI=National Disease Research Interchange

(e) 1 islet equivalent = ~1,000 cells

Table S11. DNase-Sequencing Depth

| Sample | Raw reads | Aligned reads | After Remove Blacklisted | Unique start (6 pileup*) | Unique start |
|---|---|---|---|---|---|
| Islet 7 | 21117592 | 14768603 | 14699522 | 14662161 | 9823781 |
| Islet 8 | 29497984 | 20668595 | 20596294 | 20424646 | 19190693 |
| Islet 9 | 52732917 | 37828388 | 37598662 | 36599434 | 14990045 |

Blacklisted regions include repeat regions and others in the UCSC Genome Browser "Duke Excluded Regions" Track

* Up to 6 reads with the same 5' end are included in the analysis based upon the mechanism of DNase action and simulations

Table S12. ChIP-Seq Epitopes Used and Sequencing Depth

| Sample | Epitope/modification | Lanes | Total clusters | Aligned reads | No Satellite reads | Unique starts | MACS peaks |
|---|---|---|---|---|---|---|---|
| Islet 1 | K4me3 | 3 | 41,722,019 | 25,032,605 | 24,137,180 | 23,127,664 | 33,260 |
| Islet 2 | Input | 3 | 49,110,219 | 31,575,093 | 31,157,610 | 30,772,095 | |
| Islet 2 | GFP | 4 | 67,200,031 | 35,928,743 | 35,314,145 | 17,329,133 | 1,804 |
| Islet 2 | K4me3, 15 cycles sonication | 3 | 41,666,368 | 25,277,007 | 24,370,811 | 23,020,686 | 37,585 |
| Islet 2 | K4me3, 20 cycles sonication | 3 | 25,621,475 | 15,530,384 | 14,849,023 | 14,022,698 | 31,552 |
| Islet 3 | Input | 2 | 5,981,172 | 4,183,013 | 4,121,253 | 4,077,901 | |
| Islet 3 | GFP | 1 | 8,969,443 | 4,240,824 | 4,173,037 | 859,037 | 1,029 |
| Islet 3 | K4me1 | 2 | 26,323,790 | 20,853,761 | 20,706,146 | 20,464,348 | 21,635 |
| Islet 3 | K79me2 | 2 | 39,927,595 | 29,539,722 | 29,500,214 | 20,043,934 | |
| Islet 4 | Input | 3 | 42,697,192 | 28,006,212 | 27,476,408 | 24,770,076 | |
| Islet 4 | K4me1 | 3 | 37,340,229 | 24,802,474 | 24,505,805 | 21,085,480 | 14,305 |
| Islet 4 | K4me3 | 3 | 45,874,719 | 27,461,399 | 27,002,156 | 5,254,282 | 20,066 |
| Islet 4 | K79me2 | 3 | 41,408,208 | 28,346,402 | 28,107,409 | 21,842,999 | |
| Islet 5 | Input | 3 | 44,619,630 | 29,114,360 | 28,768,478 | 27,135,940 | |
| Islet 5 | GFP | 1 | 7,206,897 | 3,999,188 | 3,956,623 | 2,320,079 | 2,743 |
| Islet 5 | K4me1 | 2 | 24,941,609 | 17,090,860 | 16,961,981 | 16,821,916 | 13,926 |
| Islet 5 | K4me3 | 1 | 9,251,862 | 7,437,994 | 7,410,848 | 6,436,074 | 21,850 |
| Islet 5 | K79me2 | 2 | 24,278,895 | 17,337,970 | 4,543,546 | 4,483,407 | |
| Islet 6 | Input | 2 | 33,391,762 | 21,077,573 | 20,776,285 | 19,917,514 | |
| Islet 6 | GFP | 1 | 23,542,687 | 14,417,652 | 14,165,743 | 3,020,965 | 8,643 |
| Islet 6 | CTCF (AbCam) | 1 | 10,599,376 | 7,684,573 | 7,686,285 | 6,327,934 | 37,873 |
| Islet 6 | CTCF (Millipore) | 2 | 31,411,207 | 19,472,273 | 19,477,043 | 4,186,986 | 25,778 |

Discussion

Future Directions

Conclusion


DISCUSSION

Common polygenic diseases like T2D have a complex genetic architecture, with a very large number of risk variants that each make only a very modest contribution. Given that reality, family linkage studies lacked the power to discover any but the strongest factors associated with T2D, such as TCF7L2 (Grant et al., 2006). GWAS and expansive sampling of populations in meta-analyses have provided much greater power, and have been overwhelmingly successful at identifying increasing numbers of loci associated with T2D and T2D-related traits. Despite these intense efforts, however, our functional understanding remains quite limited. First, it is challenging to identify the variant responsible for the functional consequence leading to T2D. Indeed, because of the structure of linkage disequilibrium at this level of resolution, it is difficult even to determine which gene at the associated locus is responsible for T2D. Moreover, the possibility that a risk allele actually affects expression of a more distant gene that falls outside the region of linkage disequilibrium has to be seriously considered. Before tackling the functional challenge, however, it is important to outline additional approaches that are being taken to fill out the catalog of risk alleles for T2D and related traits.

High resolution genetic mapping with increasing power

Given the remarkable success of GWAS approaches in cataloging the wide array of genomic variants that contribute to T2D risk, it was desirable to increase the sample size even further. To achieve this, members of several consortia combined results to design the custom Metabochip, containing almost 200,000 SNPs, to fine map 257 loci of association defined by GWAS meta-analyses of 23 traits (Voight et al., 2012).


Table 1: Metabochip SNP selection

This custom genotyping array was made affordable so that many studies could genotype hundreds of thousands of samples. The DIAGRAM consortium combined the meta-analysis of their 12,171 T2D cases and 56,862 controls (imputed to 2.5 million autosomal SNPs) with an additional 22,669 T2D cases and 58,119 controls genotyped on the Metabochip. The result was the identification of 10 additional T2D-associated loci (Dimas et al., 2014). The Metabochip was also employed by the MAGIC Consortium to examine fasting glucose, fasting insulin and two-hour glucose in combined meta-analyses of 133,010, 108,557 and 42,854 subjects, respectively. These analyses identified 53 loci associated with glycemic traits, 33 of which were also associated with T2D, supporting the greater contribution of fasting glucose to T2D-associated genes (R. A. Scott et al., 2012).

Rare variant detection

GWAS strategies are generally limited to detecting loci with a minor allele frequency of at least 1–2%. However, there has been some suspicion that significant parts of the “missing heritability” for T2D and other common complex diseases might be due to rare alleles of large effect. Two large consortia have been formed to explore various sequencing strategies to identify such variants for T2D. The Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) consortium is performing whole exome sequencing in five populations as well as whole genome sequencing in large multigenerational Mexican-American families. The Genetics of T2D (GoT2D) consortium is combining low coverage whole genome sequencing with deep coverage exome sequencing, high density genotyping and genotype imputation data on ~3000 European subjects to identify additional loci associated with T2D. These studies contributed preliminary sequencing data to the design of an economical exome-based genotyping array to test rare variant associations in larger populations (Grove et al., 2013). The goal of the Exome Chip design is to investigate rare and potentially more deleterious coding variants that affect protein structure, including splice and nonsense variants. Exome sequencing data for ~12,000 subjects assembled from 16 studies (Table 2) were compiled for discovery of SNPs down to the level of single observations.

Table 2: Exome Sequencing study contribution to the Exome Chip content

From this data it was determined that the average genome can be expected to contain 8,000-10,000 nonsynonymous variants, 200-300 splice variants and 80-100 nonsense variants (‘Exome Chip Design - Genome Analysis Wiki’, http://genome.sph.umich.edu/wiki/ Exome_Chip_Design). Deep sequencing confirms common variants and identifies rare variants relevant to T2D Whole genome sequencing on a large cohort is not yet economically feasible. The information gained from this effort, while comprehensive, would present considerable challenges for interpretation, since rare non-coding variants will be found in every individual, but determining their functional significance can be extremely challenging. Targeted exonic and whole exome sequencing is significantly more affordable, and likely to discover coding variants (missense, nonsense, frameshift) that are much easier to interpret. The GCKR gene encoding the glucokinase regulatory protein (GKRP) harbors the common P446L variant (MAF=0.34) associated with increased triglyceride levels, C-reactive peptide and lower fasting glucose (Orho-Melander et al., 2008). GCKR was a candidate gene subjected to targeted exonic Sanger sequencing in the ClinSeq project where 19 rare variants were identified with a MAF < 0.02, most of which were novel (Biesecker et al., 2009). In vitro

153

examination demonstrated the spectrum of effects of these variants on GKRP ranged from loss of function, wild-type, to gain of function (Rees et al., 2012). These results emphasize the value of functional assays for variants effect on gene function. Functional analyses of rare variants detected in the candidate gene PPARG that inhibit adipocyte differentiation are associated with increased risk for T2D. In a large scale sequence analysis of PPARG in T2D cases and controls 49 rare nonsynonymous variants (MAF < 0.5%) were discovered. Only the common P12A variant was found at frequency > 1%. When these rare variants were evaluated in in vitro adipose differentiation assays 9 of these variants that demonstrated inhibition of the differentiation pathway were significantly associated with T2D risk (Majithia et al., 2014). Similarly, SLC30A8, which contains the common W325R variant associated with T2D risk, was subjected to targeted exon sequencing in a study of 115 genes near T2D signals, where 12 loss-of-function variants were detected that were shown to protect against diabetes (Flannick et al., 2014). It is important to note that although these rare loss-of-function events do not account for significant contribution to population prevalence of T2D the fact that suppression of SLC30A8 activity protects against diabetes indicates that this would be an excellent drug target. The genomics approach to functional annotation of disease variants As described above, it is notable that the vast majority of the T2D associated variants are not located in the coding regions of genes, but rather in the intergenic and intronic regions. This suggests that variants in regulatory elements, such as promoters or enhancers, affect the regulation of T2D associated genes and lead to disease susceptibility. To seek to determine the functional basis of non-coding T2D variants, we embarked on a study of chromatin structural analysis as a surrogate model of gene regulation. 
Using primary human pancreatic islets isolated from transplant candidates as a platform for understanding the regulation of gene expression in targets of T2D pathogenesis, we performed DNase hypersensitive site analysis (DNaseHS), as well as CTCF binding analysis (to identify insulators) and histone H3 modification analysis by chromatin immunoprecipitation (ChIP) (Schmid & Bucher, 2007). Genome-wide, we identified ~18,000 putative promoters marked by histone H3 lysine 4 trimethylation (H3K4me3), some of which were previously unannotated and active only in pancreatic islet cells. In addition, we identified 34,039 non-promoter regulatory elements, of which 22% are bound by CTCF as putative insulators and 47% are unique to pancreatic islets in comparison with other published studies (Ernst et al., 2011). For 18 T2D-associated loci identified in the meta-analysis of the combined GWAS, we identified 118 putative regulatory elements in the neighborhood of those loci and confirmed enhancer activity in 12 of 33 elements by in vitro luciferase assay and transgenic reporter mice (Stitzel et al., 2010). These putative regulatory elements are now being examined for correlation with gene expression in pancreatic islets. The goal is to connect the risk alleles to differential expression of nearby genes, that is, to identify them as expression quantitative trait loci (eQTLs) (Battle & Montgomery, 2014).


FUTURE DIRECTIONS: The importance of functional analyses in disease relevant tissues

Most variants associated with T2D are found in non-coding regulatory regions of the genome. While the GWAS variants identify these loci, they are rarely the actual functional variants; rather, they tag the haplotype on which the true functional variant resides. To identify how T2D risk variants functionally contribute to disease, it is critical to integrate all genetic and functionally relevant genomic data for the associated loci. To achieve this, it is important to define the genome-wide epigenomic landscape in relevant tissues. Active parts of the genome are identified by open chromatin, which has traditionally been detected by DNase hypersensitivity (DNase HS) assays. Regulatory regions are further refined by the combination of histone modifications present, which can indicate the location of promoters, enhancers and transcriptionally active genes as well as transcriptionally repressed genes. High resolution genotyping of the subject, or of tissue derived from the subject, sufficient to allow accurate imputation, enables the determination of which T2D variants are found within the functional elements defined by the epigenomic landscape. RNA sequencing to sufficient depth to explore gene expression, transcript isoform deconvolution and allele-specific expression allows evaluation of whether a T2D risk allele correlates with an eQTL in the region. All variants on the risk haplotype can be examined, both computationally and experimentally, for interference with transcription factor binding. Examination of the effect of the risk allele on gene expression can indicate the potential of the gene as a therapeutic target.
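The step of determining which variants fall within functional elements can be illustrated with a minimal sketch. It assumes that regulatory elements are available as non-overlapping, half-open (start, end) intervals per chromosome; the coordinates, variant names and function names below are hypothetical and are not taken from the studies described here.

```python
from bisect import bisect_right


def build_element_index(elements):
    """Group (chrom, start, end) intervals by chromosome, sorted by start.

    Intervals are assumed to be half-open [start, end) and non-overlapping
    within a chromosome (an assumption of this sketch).
    """
    by_chrom = {}
    for chrom, start, end in elements:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        ends = [e for _, e in intervals]
        index[chrom] = (starts, ends)
    return index


def variants_in_elements(variants, index):
    """Return the names of variants whose position lies inside an element."""
    hits = []
    for name, chrom, pos in variants:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Index of the last element that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ends[i]:
            hits.append(name)
    return hits


# Hypothetical coordinates and variant names, for illustration only.
elements = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 80)]
variants = [("varA", "chr1", 150), ("varB", "chr1", 200),
            ("varC", "chr2", 60), ("varD", "chr3", 10)]
index = build_element_index(elements)
print(variants_in_elements(variants, index))  # ['varA', 'varC']
```

In practice the variant list would include imputed variants in linkage disequilibrium with the GWAS signal, and overlapping element annotations would call for an interval tree or a dedicated genomics toolkit rather than this simple binary search.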
Assessing variants in the context of the chromatin structure of tissues specifically implicated in T2D (pancreatic islets, liver, skeletal muscle and adipose), where tissue-specific enhancers have been discovered by chromatin immunoprecipitation experiments, in conjunction with expression QTL analysis, leads to the identification of target genes in associated regions. To complement the epigenomic analysis of pancreatic islets, we continue to collect islet tissues and genotype them on Illumina arrays containing 2.5 million SNPs to ascertain their load of T2D and T2D-related alleles. Total RNA is extracted and strand-specific RNA sequencing (RNA-Seq) is performed to a depth of 100 million paired-end reads, sufficient to attempt transcript isoform deconvolution and to investigate differences in tissue-specific transcript representation in islets. The combination of genotype and gene expression data enables expression quantitative trait locus (eQTL) analysis of alleles associated with T2D and T2D-related traits. We will also examine whether T2D-associated variants act as alternative splicing quantitative trait loci (sQTLs). Recently we have also been performing the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to assess open chromatin structure, similar to DNaseHS analysis but with sufficient resolution to identify transcription factor footprints (Buenrostro, Giresi, Zaba, Chang, & Greenleaf, 2013). This resolution should be sufficient to employ phylogenetic module complexity analysis (PMCA; Claussnitzer et al., 2014), in which conserved co-occurring transcription factor binding sites (TFBS) identified across several species are examined to systematically identify cis-regulatory variants at GWAS loci (Figure 1).


Figure 1. Allelic and cross-species chromatin state signatures at the PDX1 locus. (A) Tn5 transposase (green) inserts sequence adaptors (red and blue) in regions of open chromatin (between nucleosomes in gray) to generate ATAC-seq libraries. Schematic taken from Buenrostro et al. (B) UCSC genome browser view of the human PDX1 locus showing chromatin state maps for 31 cell types and transcription maps for 3 cell types (other cell types lack gene expression and appear similar to HepG2 and GM12878). EndoC and islet chromatin state maps are similar to each other but remarkably different from other cell-types, indicating the cell-type specificity of this locus. Note the fasting glucose-related trait GWAS SNP in the stretch enhancer specific to EndoC and proximate to the stretch enhancer in islets. Nearby the GWAS SNP, there are two SNPs (circled) with significant allelic bias in EndoC H3K27ac ChIP-seq data. (C) H3K27Ac allelic bias in EndoC. The circled points represent two SNPs in the PDX1 locus. P-values are based on a Binomial test with an expectation of 0.5. The highly symmetric pattern around the vertical line at 0.5 indicates that our allelic bias pipeline accounts for reference bias. (D) Chromatin state and transcription maps of PDX1 in mouse insulinoma cells (MIN6). The similarity of chromatin and expression maps between MIN6 and EndoC/islets suggests that cross-species ATAC-seq maps could identify important TF binding modules.
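The binomial test mentioned in panel (C) can be sketched as follows. This is a minimal illustration, assuming that reference and alternate allele read counts at a heterozygous SNP are already available; it is not the allelic bias pipeline used for the figure, and the function name and read counts are hypothetical. The exact enumeration shown here is only suitable for modest read depths.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, expected=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-allele count under the expected proportion.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    pmf = [comb(n, k) * expected ** k * (1 - expected) ** (n - k)
           for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical read counts at two heterozygous SNPs.
print(binomial_two_sided_p(5, 5))    # balanced counts
print(binomial_two_sided_p(10, 0))   # all reads carry one allele
```

With an expectation of 0.5 the test is symmetric in the two alleles, so swapping the reference and alternate counts gives the same p-value. Residual reference mapping bias would appear as an asymmetry in the distribution of observed allele fractions rather than in the test itself.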

Allele-specific expression quantitative trait loci (aseQTLs) will be evaluated where transcribed SNPs (tSNPs) are present in the RNA transcript in a region where a putative regulatory SNP (rSNP) is located. This can be evaluated where phase is not known (Battle et al., 2014) or where phase can be assessed (Lappalainen et al., 2013). Pancreatic islet tissue is not accessible in living individuals, but other tissues relevant to diabetes can be studied in vivo. In order to address the translation of the genetic association with T2D to the functional cause of the disease, we have begun an integrated analysis of genotype, gene expression and phenotype on the genetic background of subjects from the FUSION and Metabolic Syndrome in Man (METSIM) studies in Finland (Stancakova et al., 2009). We have sampled skeletal muscle and subcutaneous adipose tissue from 324 Finnish individuals, including 125 normal glucose tolerant (NGT), 41 impaired fasting glucose (IFG), 72 impaired glucose tolerant (IGT) and 86 newly diagnosed T2D subjects. All had their disease status confirmed by oral glucose tolerance test prior to the biopsy. RNA sequencing (RNA-Seq) and microRNA sequencing (miRNA-Seq) are being performed on total RNA extracted from these tissues to document gene expression. All subjects are being genotyped on high density arrays including all SNPs previously associated with T2D, in order to evaluate eQTLs, sQTLs and aseQTLs as previously described (Figure 2). In addition, DNA methylation analysis is planned to look for association with disease or quantitative traits. All of these subjects already have extensive phenotype information from study records.
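The basic idea behind an eQTL test can be sketched as a simple regression of expression on genotype. The example below is a minimal illustration under assumptions that are not stated in the text: genotypes are coded as allele dosages (0, 1 or 2 copies of one allele), expression is a single normalized value per subject, and no covariates are included. The function name and data are hypothetical; a real analysis would also adjust for covariates and correct for multiple testing.

```python
def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on allele dosage for one SNP-gene pair.

    Returns (slope, intercept, r_squared). The slope is the estimated
    change in expression per additional copy of the counted allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype and expression values for >= 3 subjects")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    syy = sum((e - mean_e) ** 2 for e in expression)
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical dosages and expression values for six subjects.
dosages = [0, 1, 2, 0, 1, 2]
expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
slope, intercept, r_squared = eqtl_regression(dosages, expression)
print(slope, intercept, r_squared)
```

A nonzero slope that remains significant after multiple-testing correction would mark the SNP as a candidate eQTL for that gene. The same framework extends to sQTLs by replacing expression with an isoform or splice-junction ratio.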

Figure 2. T2D associated eQTL in muscle RNA-Seq expression analysis. SNP association with gene expression identifies the probable gene associated with T2D. The POU5F1 and TCF19 genes are identified as candidates for the association by proximity to the T2D association signal. Association with gene expression indicates an eQTL with CCHCR1, suggesting this may be a more plausible candidate.

In the longer term, we hope to study other tissues derived in vitro from these same individuals, taking advantage of recent scientific developments. Primary patient fibroblast cell lines are also being established for the generation of induced pluripotent stem (iPS) cell lines, to investigate the effects of different genetic backgrounds on the development of relevant tissues by differentiation of the pluripotent lines toward adipocyte, muscle cell, hepatocyte and pancreatic beta cell lineages (Takahashi et al., 2007).


These studies may represent important strides forward in investigating the interaction of the genetic background with its functional consequences, and should assist in identifying the most promising therapeutic targets.

CONCLUSION

The increase in the incidence of T2D throughout the world compels us to understand the disease etiology better, to develop strategies that might slow the trend of increasing incidence, and to identify new therapeutic approaches. Progress in the last seven years has been breathtaking, as GWAS of common variants have contributed significantly to identifying a host of candidate susceptibility loci for T2D. Increasing sample sizes and sensitivity for less common alleles have allowed the identification of additional variants that contribute to the heritability of T2D. But the functional understanding of these variants, and the translation of those insights into therapeutic opportunities, presents the most significant current challenge. With the tools now being developed and applied, there is no question that this challenge will be met.



ACKNOWLEDGEMENTS

Foremost, I would like to thank my mentors, Cisca Wijmenga, Marten Hofker and Francis Collins, for this opportunity and for their support and encouragement along the way. I would like to extend my heartfelt appreciation to Francis for his unending support and guidance throughout my career at NHGRI. I would also like to acknowledge the many members of Francis’s lab, past and present, who have shared their creativity and continue to do so. In addition, I wish to acknowledge the FUSION group in its entirety, from Helsinki, Finland to NHGRI, UNC, USC, Cedars-Sinai and finally the University of Michigan, for their camaraderie and interaction contributing to this work and for engaging in the international collaborations that led to the DIAGRAM and MAGIC consortia and the worldwide effort to understand and challenge T2D.


REFERENCES

Battle, A., & Montgomery, S. B. (2014). Determining causality and consequence of expression quantitative trait loci.

Human Genetics, 133(6), 727–735. doi:10.1007/s00439-014-1446-0

Battle, A., Mostafavi, S., Zhu, X., Potash, J. B., Weissman, M. M., McCormick, C., … Koller, D. (2014).

Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals.

Genome Research, 24(1), 14–24. doi:10.1101/gr.155192.113

Biesecker, L. G., Mullikin, J. C., Facio, F. M., Turner, C., Cherukuri, P. F., Blakesley, R. W., … Green, E. D. (2009).

The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine. Genome

Research, 19(9), 1665–1674. doi:10.1101/gr.092841.109

Bonnycastle, L. L. (2006). Common Variants in Maturity-Onset Diabetes of the Young Genes Contribute to Risk of

Type 2 Diabetes in Finns. Diabetes, 55(9), 2534–2540. doi:10.2337/db06-0178

Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y., & Greenleaf, W. J. (2013). Transposition of native

chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome

position. Nature Methods, 10(12), 1213–1218. doi:10.1038/nmeth.2688

Burton, P. R., Clayton, D. G., Cardon, L. R., Craddock, N., Deloukas, P., Duncanson, A., … Worthington, J. (2007b).

Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature,

447(7145), 661–678. doi:10.1038/nature05911

CDC - National Diabetes Statistics Report, 2014 - Publications - Diabetes DDT. (n.d.). Retrieved 6 September 2014,

from http://www.cdc.gov//diabetes/pubs/statsreport14.htm

Claussnitzer, M., Dankel, S. N., Klocke, B., Grallert, H., Glunk, V., Berulava, T., … Laumen, H. (2014). Leveraging

Cross-Species Transcription Factor Binding Site Patterns: From Diabetes Risk Loci to Disease Mechanisms.

Cell, 156(1-2), 343–358. doi:10.1016/j.cell.2013.10.058

DeFronzo, R. A. (2009). From the Triumvirate to the Ominous Octet: A New Paradigm for the Treatment of Type 2

Diabetes Mellitus. Diabetes, 58(4), 773–795. doi:10.2337/db09-9028

Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of

BioMedical Research, Saxena, R., Voight, B. F., Lyssenko, V., Burtt, N. P., de Bakker, P. I. W., … Purcell, S.

(2007a). Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels.

Science, 316(5829), 1331–1336. doi:10.1126/science.1142358


Dimas, A. S., Lagou, V., Barker, A., Knowles, J. W., Magi, R., Hivert, M.-F., … on behalf of the MAGIC

Investigators. (2014). Impact of Type 2 Diabetes Susceptibility Variants on Quantitative Glycemic Traits

Reveals Mechanistic Heterogeneity. Diabetes, 63(6), 2158–2171. doi:10.2337/db13-0949

Ernst, J., Kheradpour, P., Mikkelsen, T. S., Shoresh, N., Ward, L. D., Epstein, C. B., … Bernstein, B. E. (2011).

Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473(7345), 43–49.

doi:10.1038/nature09906

Exome Chip Design - Genome Analysis Wiki. (n.d.). Retrieved 7 September 2014, from

http://genome.sph.umich.edu/wiki/Exome_Chip_Design

Flannick, J., Thorleifsson, G., Beer, N. L., Jacobs, S. B. R., Grarup, N., Burtt, N. P., … Altshuler, D. (2014). Loss-of-

function mutations in SLC30A8 protect against type 2 diabetes. Nature Genetics, 46(4), 357–363.

doi:10.1038/ng.2915

Frazer, K. A., Ballinger, D. G., Cox, D. R., Hinds, D. A., Stuve, L. L., Gibbs, R. A., … Stewart, J. (2007). A second

generation human haplotype map of over 3.1 million SNPs. Nature, 449(7164), 851–861.

doi:10.1038/nature06258

Ghosh, S., Watanabe, R. M., Hauser, E. R., Valle, T., Magnuson, V. L., Erdos, M. R., … Kohtamaki, K. (1999). Type

2 diabetes: evidence for linkage on chromosome 20 in 716 Finnish affected sib pairs. Proceedings of the

National Academy of Sciences, 96(5), 2198–2203. Retrieved from http://www.pnas.org/content/96/5/2198.short

Gibbs, R. A., Belmont, J. W., Hardenbol, P., Willis, T. D., Yu, F., Yang, H., … others. (2003). The international

HapMap project. Nature, 426(6968), 789–796. Retrieved from

http://www.nature.com/nature/journal/v426/n6968/abs/nature02168.html

Grant, S. F. A., Thorleifsson, G., Reynisdottir, I., Benediktsson, R., Manolescu, A., Sainz, J., … Stefansson, K.

(2006). Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nature Genetics,

38(3), 320–323. doi:10.1038/ng1732

Grarup, N., Sandholt, C. H., Hansen, T., & Pedersen, O. (2014). Genetic susceptibility to type 2 diabetes and obesity:

from genome-wide association studies to rare variants and beyond. Diabetologia. doi:10.1007/s00125-014-3270-4

Grove, M. L., Yu, B., Cochran, B. J., Haritunians, T., Bis, J. C., Taylor, K. D., … Boerwinkle, E. (2013). Best

Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium. PLoS ONE, 8(7),

e68095. doi:10.1371/journal.pone.0068095


Gusella, J. F., Wexler, N. S., Conneally, P. M., Naylor, S. L., Anderson, M. A., Tanzi, R. E., … Sakaguchi, A. Y.

(1983). A polymorphic DNA marker genetically linked to Huntington’s disease. Nature, 306(5940), 234–238.

Kahn, S. E., Cooper, M. E., & Del Prato, S. (2014). Pathophysiology and treatment of type 2 diabetes: perspectives on

the past, present, and future. The Lancet, 383(9922), 1068–1083. Retrieved from

http://www.sciencedirect.com/science/article/pii/S0140673613621546

Kaprio, J., Tuomilehto, J., Koskenvuo, M., Romanov, K., Reunanen, A., Eriksson, J., … Kesäniemi, Y. A. (1992).

Concordance for type 1 (insulin-dependent) and type 2 (non-insulin-dependent) diabetes mellitus in a

population-based cohort of twins in Finland. Diabetologia, 35(11), 1060–1067. Retrieved from

http://link.springer.com/article/10.1007/BF02221682

Lappalainen, T., Sammeth, M., Friedländer, M. R., ‘t Hoen, P. A. C., Monlong, J., Rivas, M. A., … Dermitzakis, E. T.

(2013). Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501(7468),

506–511. doi:10.1038/nature12531

Majithia, A. R., Flannick, J., Shahinian, P., Guo, M., Bray, M.-A., Fontanillas, P., … Zollner, S. (2014). Rare variants

in PPARG with decreased activity in adipocyte differentiation are associated with increased risk of type 2

diabetes. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1410428111

Orho-Melander, M., Melander, O., Guiducci, C., Perez-Martinez, P., Corella, D., Roos, C., … Kathiresan, S. (2008).

Common Missense Variant in the Glucokinase Regulatory Protein Gene Is Associated With Increased Plasma

Triglyceride and C-Reactive Protein but Lower Fasting Glucose Concentrations. Diabetes, 57(11), 3112–3121.

doi:10.2337/db08-0516

Rees, M. G., Ng, D., Ruppert, S., Turner, C., Beer, N. L., Swift, A. J., … Collins, F. S. (2012). Correlation of rare

coding variants in the gene encoding human glucokinase regulatory protein with phenotypic, cellular, and

kinetic outcomes. Journal of Clinical Investigation, 122(1), 205–217. doi:10.1172/JCI46425

Sachidanandam, R., Weissman, D., Schmidt, S. C., Kakol, J. M., Stein, L. D., Marth, G., … International SNP Map

Working Group. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide

polymorphisms. Nature, 409(6822), 928–933. doi:10.1038/35057149

Schaid, D. J., & Sommer, S. S. (1993). Genotype relative risks: methods for design and analysis of candidate-gene

association studies. American Journal of Human Genetics, 53(5), 1114. Retrieved from

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1682319/

Schmid, C. D., & Bucher, P. (2007). ChIP-Seq data reveal nucleosome architecture of human promoters. Cell, 131(5),

831–832.


Scott, L. J., Mohlke, K. L., Bonnycastle, L. L., Willer, C. J., Li, Y., Duren, W. L., … Boehnke, M. (2007). A Genome-

Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants. Science,

316(5829), 1341–1345. doi:10.1126/science.1142382

Scott, R. A., Lagou, V., Welch, R. P., Wheeler, E., Montasser, M. E., Luan, J., … Barroso, I. (2012). Large-scale

association analyses identify new loci influencing glycemic traits and provide insight into the underlying

biological pathways. Nature Genetics, 44(9), 991–1005. doi:10.1038/ng.2385

Silander, K., Scott, L. J., Valle, T. T., Mohlke, K. L., Stringham, H. M., Wiles, K. R., … Boehnke, M. (2004). A large

set of Finnish affected sibling pair families with type 2 diabetes suggests susceptibility loci on chromosomes 6,

11, and 14. Diabetes, 53(3), 821–829.

Spencer, C., Hechter, E., Vukcevic, D., & Donnelly, P. (2011). Quantifying the Underestimation of Relative Risks

from Genome-Wide Association Studies. PLoS Genetics, 7(3), e1001337. doi:10.1371/journal.pgen.1001337

Stancakova, A., Javorsky, M., Kuulasmaa, T., Haffner, S. M., Kuusisto, J., & Laakso, M. (2009). Changes in Insulin

Sensitivity and Insulin Release in Relation to Glycemia and Glucose Tolerance in 6,414 Finnish Men. Diabetes,

58(5), 1212–1221. doi:10.2337/db08-1607

Stitzel, M. L., Sethupathy, P., Pearson, D. S., Chines, P. S., Song, L., Erdos, M. R., … Collins, F. S. (2010). Global

Epigenomic Analysis of Primary Human Pancreatic Islets Provides Insights into Type 2 Diabetes Susceptibility

Loci. Cell Metabolism, 12(5), 443–455. doi:10.1016/j.cmet.2010.09.012

Stumvoll, M., Goldstein, B. J., & van Haeften, T. W. (2005). Type 2 diabetes: principles of pathogenesis and therapy.

The Lancet, 365(9467), 1333–1346. doi:10.1016/S0140-6736(05)61032-X

Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., & Yamanaka, S. (2007). Induction of

Pluripotent Stem Cells from Adult Human Fibroblasts by Defined Factors. Cell, 131(5), 861–872.

doi:10.1016/j.cell.2007.11.019

The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature, 437(7063), 1299–1320. doi:10.1038/nature04226

Thorisson, G. A., & Stein, L. D. (2003). The SNP Consortium website: past, present and future. Nucleic Acids

Research, 31(1), 124–127. doi:10.1093/nar/gkg052

Tsui, L.-C., Buchwald, M., Barker, D., Braman, J. C., Knowlton, R., Schumm, J. W., … others. (1985). Cystic fibrosis

locus defined by a genetically linked polymorphic DNA marker. Science, 230(4729), 1054–1057. Retrieved

from http://www.sciencemag.org/content/230/4729/1054.short


Tuomi, T., Santoro, N., Caprio, S., Cai, M., Weng, J., & Groop, L. (2014). The many faces of diabetes: a disease with

increasing heterogeneity. The Lancet, 383(9922), 1084–1094. Retrieved from

http://www.sciencedirect.com/science/article/pii/S0140673613622199

Valle, T., Tuomilehto, J., Bergman, R. N., Ghosh, S., Hauser, E. R., Eriksson, J., … others. (1998). Mapping Genes

for NIDDM: Design of the Finland—United States Investigation of NIDDM Genetics (FUSION) Study.

Diabetes Care, 21(6), 949–958. Retrieved from http://care.diabetesjournals.org/content/21/6/949.short

Voight, B. F., Kang, H. M., Ding, J., Palmer, C. D., Sidore, C., Chines, P. S., … Boehnke, M. (2012). The

Metabochip, a Custom Genotyping Array for Genetic Studies of Metabolic, Cardiovascular, and Anthropometric

Traits. PLoS Genetics, 8(8), e1002793. doi:10.1371/journal.pgen.1002793

WHO | Diabetes programme. (n.d.). Retrieved 6 September 2014, from http://www.who.int/diabetes/en/

Yen, C.-J., Beamer, B. A., Negri, C., Silver, K., Brown, K. A., Yarnall, D. P., … Shuldiner, A. R. (1997). Molecular

Biochemical and Biophysical Research

Communications, 241(2), 270–274. doi:10.1006/bbrc.1997.7798

Zeggini, E., Scott, L. J., Saxena, R., Voight, B. F., Marchini, J. L., Hu, T., … Altshuler, D. (2008). Meta-analysis of

genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2

diabetes. Nature Genetics, 40(5), 638–645. doi:10.1038/ng.120

Zeggini, E., Weedon, M. N., Lindgren, C. M., Frayling, T. M., Elliott, K. S., Lango, H., … Hattersley, A. T. (2007).

Replication of Genome-Wide Association Signals in UK Samples Reveals Risk Loci for Type 2 Diabetes.

Science, 316(5829), 1336–1341. doi:10.1126/science.1142364


Summaries


SUMMARY

Type 2 diabetes (T2D) affects over 340 million people worldwide. T2D predominantly affects low- and middle-income countries, which account for more than 80% of the deaths due to diabetes. The worldwide prevalence of T2D is over 8%, and the disease costs over US$612 billion annually. Nearly 50% of people living with diabetes are undiagnosed. Identifying the causes contributing to risk for T2D has been a formidable challenge for decades. Evidence for genetic factors in T2D risk includes the observation of a 3.5-fold increased incidence in first degree relatives of T2D subjects compared to the general middle-aged population. In the Finnish population, where our studies have primarily been focused, the T2D concordance in monozygotic twins is ~34% compared to ~16% in dizygotic twins, supporting a significant hereditary contribution. Complicating the identification of T2D-associated genetic variants are lifestyle and environmental factors that play a major role in disease onset and progression. Poor diet and lack of exercise can contribute significantly to susceptibility to T2D. Thus T2D is an excellent example of a common complex polygenic disease. The FUSION (Finnish US Investigation of NIDDM (Non-Insulin Dependent Diabetes Mellitus)) genetics study is an international collaboration with the goal of identifying genetic variants contributing to T2D susceptibility. Families were originally selected based on index cases with age of onset 35-60 years and with at least one affected sibling. Unaffected spouses and offspring were also ascertained for frequently sampled intravenous glucose tolerance tests (FSIGTs) to allow estimates of glucose- and insulin-related physiological traits. In addition, a cohort of elderly individuals over 65 years of age with normal glucose tolerance was collected as control subjects.
This thesis focuses on the identification of the genetic basis of T2D by scanning individual genes as well as complete genomes for variations, termed single nucleotide polymorphisms (SNPs), that are associated with genetic loci that may predispose to T2D. The first approach we applied was candidate gene association analysis. In these analyses, plausible genes are selected by specific criteria that imply these genes may play a role in the T2D disease process, such as involvement in glucose regulation, pancreatic islet function, regulation of insulin action or interaction with a particular therapeutic agent. Candidate gene sequencing of the peroxisome proliferator-activated receptor-γ2 (PPARG2) gene, which regulates adipocyte differentiation and is a well-known target of the T2D therapeutic thiazolidinediones, identified a SNP coding for the amino acid change proline to alanine at position 12 of the protein (P12A). Several candidate gene studies produced ambiguous results because varying sample sizes and population characteristics affected the statistical power to detect association. In chapter 1 we performed candidate SNP association in the FUSION cohort, which revealed a protective effect of the P12A variant of the gene, with a significantly lower allele frequency in diabetics. To accelerate the ability to analyze candidate SNPs, in chapter 2 we devised a method of performing SNP association studies by comparing quantitative allele frequency differences between T2D case and control DNA pools. We successfully applied this to the


identification of a T2D-associated SNP located near the pancreatic beta-cell promoter of the hepatocyte nuclear factor-4 alpha (HNF4A) gene, a gene known to cause a rare monogenic form of diabetes, maturity-onset diabetes of the young (MODY). The second approach we employed became possible with the advent of more sophisticated technologies allowing higher multiplex SNP analysis, enabling genome-wide association studies (GWAS) to identify novel disease-associated genes. In chapter 3 we performed a GWAS by evaluating 315,635 SNPs in 1161 Finnish T2D and 1174 Finnish normal glucose tolerant (NGT) control individuals, which identified multiple interesting signals that unfortunately did not achieve the requisite statistical significance. However, we confirmed the known T2D association with the transcription factor 7-like 2 (TCF7L2) gene. Recognizing the need for more statistical power, we compared our results with those of two other GWAS, the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC), and followed with a stage 2 analysis of 82 SNPs that showed promising evidence of association. The combined FUSION, DGI and WTCCC results led to the identification of T2D-associated variants at four novel loci and confirmed previously associated variants near the genes TCF7L2, SLC30A8, HHEX, PPARG and KCNJ11. In a subsequent collaboration of unprecedented size, the combined meta-analysis of these three initial GWAS (comprising 8,130 T2D cases and 32,987 controls) with five additional GWAS, contributing a further 34,412 T2D cases and 59,925 controls, identified 12 novel T2D-associated loci. In chapter 4 we also applied GWAS analysis to T2D-related physiological data collected from Finnish and Sardinian non-diabetic individuals, analyzed as quantitative traits, and found significant association within the G6PC2/ABCB11 locus for fasting glucose.
Meta-analysis of all known fasting glucose GWAS showed that this association corresponds to an increase in fasting glucose of 0.01-0.16 mM with each copy of the major allele, which accounts for approximately 1% of the total variation in fasting glucose. Discriminating the causal effect at this locus is difficult due to the high linkage disequilibrium between G6PC2 and ABCB11. Arguments can be made for both genes: G6PC2, a glucose-6-phosphatase, is almost exclusively expressed in pancreatic islets, while ABCB11, an ATP binding cassette family member, is expressed in the liver, where it may also contribute to variation in glucose regulation. Similarly, in chapter 5, we combined fasting glucose data from ten GWAS of individuals of European descent to discover the first T2D trait association in the melatonin receptor 1B (MTNR1B) gene, which was also found to be significantly associated with T2D. In this case we were able to demonstrate functional relevance: MTNR1B was found to be expressed in pancreatic islets, with increased expression in islets that were homozygous for the risk allele. These differences in expression were more pronounced in islets of people >45 years of age. In addition, melatonin suppressed insulin secretion in in vitro cell culture studies. Our third strategy to determine the specific genes with biologically relevant function in T2D-associated loci applied the analysis of chromatin structure as a surrogate model of gene regulation. In chapter 6, using primary human pancreatic islets isolated from transplant candidates, we performed DNase hypersensitive site analysis (DNaseHS), as well as CTCF binding and histone H3 modification analysis by chromatin immunoprecipitation (ChIP) to


identify ~18,000 putative promoters, some of which are unannotated and active only in pancreatic islet cells, and 34,039 nonpromoter regulatory elements, of which 47% are unique to pancreatic islets. We examined the chromatin structural characteristics of the 18 T2D associated loci identified in the meta-analysis of the combined GWAS and identified 118 putative regulatory elements and confirmed in vitro gene enhancer activity in a subset of these elements. In the last chapter we discuss efforts to increase power to detect T2D association, application of sequencing to identify rare variant association, and the importance of functional assays to validate genes at these T2D associated loci. We introduce our integrative analyses of T2D and related quantitative trait association, genome structure annotation and gene expression in muscle and adipose tissue biopsies from 324 additional Finnish T2D and non-diabetic subjects collected in our studies to identify new potential genes and pathways for targeted therapeutic development leading to specific treatments for this heterogeneous complex disease.


SAMENVATTING (DUTCH SUMMARY)

Type 2 diabetes (T2D) affects 340 million people worldwide. Low- and middle-income countries account for more than 80% of the deaths caused by T2D. Worldwide, 8% of the population has T2D, which costs society 612 billion US dollars per year. Despite more than 25 years of research on T2D, it remains an enormous challenge to determine the causes that lead to an increased risk of developing the disease. It is now clear that genetic factors play a role, because T2D is 3.5 times more common in people with a first-degree relative with T2D. Our genetic studies have focused primarily on the Finnish population. There we see that 34% of monozygotic twins are concordant for T2D, whereas this is the case for only 16% of dizygotic twins. This points to a clear hereditary predisposition to developing T2D. However, in addition to genetic predisposition, lifestyle and environmental factors play a major role in the onset and progression of T2D. A poor diet and too little exercise contribute substantially to the chance of developing T2D. T2D is therefore an excellent example of a common complex disorder in which multiple genes are involved. The FUSION (Finnish US Investigation of NIDDM (Non-Insulin Dependent Diabetes Mellitus)) study is an international collaboration that aims to elucidate the genetic variants that contribute to increased susceptibility to T2D. Families were initially selected on the presence of a patient diagnosed with T2D between 35 and 60 years of age who has at least one affected first-degree relative. Unaffected relatives and children were assessed with a glucose tolerance test to determine whether a glucose- or insulin-related disorder was present.
A cohort of healthy individuals aged 65 and older was used as the control population. This thesis focuses on unraveling the genetic factors of T2D by studying genetic variation both in individual genes and across complete genomes. The aim is to find single nucleotide polymorphisms (SNPs) that are associated with the genetic loci underlying T2D. The first approach we applied was the candidate gene association study. We chose candidate genes based on our current mechanistic understanding of the disease process, including genes involved in glucose homeostasis, pancreatic beta cell function, and regulation of insulin action, as well as genes that interact with certain drugs. One of the candidate genes we sequenced was the gene encoding peroxisome proliferator-activated receptor gamma 2 (PPARG2). PPARG2 regulates the differentiation of fat cells and is activated by thiazolidinediones as a therapy for T2D. DNA sequence analysis of this gene identified a SNP that leads to an amino acid change from proline to alanine at amino acid position 12 (P12A). Several other candidate genes showed inconsistent results. This was caused by differences in population size and characteristics, as a result of which the statistical findings could not be reproduced. Chapter 1 describes the results for the P12A variant. The P12A variant protects against T2D, and carriers of the P12A variant are less common in the T2D population. Chapter 2 describes an approach for studying large numbers of candidate SNPs cost-effectively, in which DNA from T2D patients and from healthy controls was pooled. In this way a T2D-associated SNP was found in the promoter of the gene for hepatocyte nuclear factor 4 alpha (HNF4A). Mutations in the HNF4A gene had previously been described as causing a rare monogenic form of diabetes, maturity-onset diabetes of the young (MODY). The second approach became possible with the arrival of more advanced technologies for analyzing many more SNPs simultaneously. In Chapter 3 a genome-wide association study (GWAS) is performed in which 315,635 SNPs are analyzed simultaneously in 1,161 Finnish T2D patients and 1,174 Finnish healthy controls with normal glucose metabolism. This study found several interesting associations of SNPs with T2D, but these associations were not statistically significant. The association between T-cell factor 7-like 2 (TCF7L2) and T2D was, however, confirmed. Because more statistical power was needed, we then compared our results with two other GWAS, the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC) study. This was followed by a second analysis of 82 SNPs that were possibly associated with T2D. The combined FUSION, DGI, and WTCCC results led to the identification of SNPs at four additional loci and confirmed the association of TCF7L2, SLC30A8, HEX, PPARG, and KCNJ11.
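To illustrate why nominally interesting SNPs in a scan of 315,635 markers can still fall short of statistical significance, the following is a minimal sketch, not the analysis used in the thesis. It assumes a Bonferroni correction at a family-wise alpha of 0.05 and a simple allelic chi-square test (1 degree of freedom); the allele counts are hypothetical and chosen only so that the totals match 1,161 cases and 1,174 controls (two alleles per person).

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Number of SNPs tested, as stated in the summary.
N_SNPS = 315_635
threshold = bonferroni_threshold(N_SNPS)

# Hypothetical allele counts (assumption, not data from the thesis):
# 1,161 cases -> 2,322 alleles; 1,174 controls -> 2,348 alleles.
chi2, p_value = allelic_chi_square(
    case_alt=766, case_ref=1556, ctrl_alt=681, ctrl_ref=1667
)

print(f"Bonferroni threshold: {threshold:.3g}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.3g}")
print("nominally significant (p < 0.05):", p_value < 0.05)
print("genome-wide significant:", p_value < threshold)
```

With these hypothetical counts the SNP passes the conventional 0.05 level but not the corrected threshold, which is the situation that motivates combining studies to gain power.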
Next, a collaboration of unprecedented scale was established in which the combined meta-analysis of the three GWAS was carried out. This involved 8,130 T2D patients and 32,987 controls. Five additional studies were eventually added, so that the complete study included 34,412 T2D patients and 59,925 controls. This large-scale study yielded no fewer than 12 new T2D-associated loci (Chapter 3). In Chapter 4 we likewise applied GWAS analysis to T2D-related traits in healthy Finnish and Sardinian (Italy) individuals. We found a significant association between the G6PC2/ABCB11 locus and fasting blood glucose levels. A meta-analysis of all GWAS available at the time for this locus showed that the allele raises the glucose level by 0.01 to 0.16 mM. One copy of the major allele accounts for approximately 1% of the total variation in fasting glucose. In this case it is not possible to predict whether the responsible gene is G6PC2 or ABCB11, because the two genes show strong linkage disequilibrium. Both genes could functionally play a role in this phenotype. G6PC2 is a glucose-6-phosphatase and is expressed almost exclusively in the pancreatic beta cell. ABCB11 belongs to the family of ATP-binding cassette genes and is expressed in the liver, where it may play a role in variation in glucose regulation. In Chapter 5 we also discovered that the melatonin receptor 1B (MTNR1B) gene is associated with T2D. This work used 10 European GWAS and examined fasting glucose levels. The MTNR1B gene is also functionally relevant. The gene is expressed in pancreatic islet cells, where MTNR1B expression was increased in homozygotes for the risk allele. These differences in gene expression were more pronounced in people aged 45 and older. In addition, we found that melatonin suppresses insulin secretion in in vitro studies. Our third approach to finding specific T2D genes in the associated loci focused on analyzing chromatin structure as a proxy for gene regulation. In Chapter 6 we applied DNase hypersensitive site analysis (DNaseHS) and chromatin immunoprecipitation (ChIP) analysis and were able to identify 18,000 promoters in human islet cells. Some of these promoters are described for the first time and are active only in islet cells. We also found 34,039 regulatory elements that are not part of a promoter. Approximately 47% of these elements are unique to islet cells. With this information, 18 T2D-associated loci were examined. At least 118 of the regulatory elements lie in the T2D loci, and a subset of them show enhancer activity in in vitro studies. The final chapter discusses how genetic association studies for T2D can be performed better. I also discuss the use of sequencing to track down rare variants associated with T2D. In addition, I address the importance of functional research to validate the genes implicated by the association studies. Finally, I introduce a proposal for a more integrative approach. This has already begun with the collection of muscle and adipose tissue from 324 Finnish individuals with T2D and healthy individuals. This material makes an integrative approach possible by studying genetics, genome structure, and gene expression together.
This approach can be expected to lead to the identification of new candidate genes, new mechanistic insight, and ultimately better treatment for people with this complex and heterogeneous disease.


Short biography

Publications


SHORT BIOGRAPHY

Michael Erdos was born in New Brunswick, New Jersey on February 10, 1956. He attended Purdue University from 1974 to 1978, majoring in chemistry, and completed his Bachelor of Science degree in Biochemistry at The George Washington University in 1981. Michael continued research at The George Washington University under Dr. Allan Goldstein, studying immune regulation by the circulating thymic peptides thymosin α1 and thymosin β4.

In 1990, he joined the laboratory of Dr. Warren Leonard in the National Institute of Child Health and Human Development at the National Institutes of Health, where he studied the molecular biology of interleukin 2 receptor signaling. In 1993 he joined Dr. Francis Collins in creating the laboratory infrastructure for the newly established National Center for Human Genome Research, after which he continued research in Dr. Collins's lab on the BRCA1 positional cloning effort. In 1997 Michael transitioned to complex disease genetics, joining the Finnish US Investigation of NIDDM (FUSION) genetics study, where he initiated single nucleotide polymorphism association studies in the newly designated National Human Genome Research Institute.

Michael is currently a Senior Staff Scientist in the Collins laboratory, focusing on translational research of type 2 diabetes from association studies to functional analysis and therapeutic target identification. He also contributes to the study of Hutchinson-Gilford progeria syndrome (HGPS) and the design and implementation of preclinical trials to support the identification of potential therapeutics for HGPS patients.

PUBLICATIONS

Naylor PH, Erdos MR, and Goldstein AL. (1984) Increased thymosin levels associated with acquired immune deficiency syndrome (AIDS). "Thymic hormones and lymphokines: Basic chemistry and clinical applications", (A.L. Goldstein, ed.) Plenum Press, N.Y., p. 69-76.

Otani H, Erdos M, and Leonard WJ. Tyrosine kinase(s) regulate apoptosis and bcl-2 expression in a growth factor-dependent cell line. J. Biol. Chem. 1993; 268(30):22733-6

Castilla LC, Couch FJ, Erdos MR, Hoskins KF, Calzone K, Garber JE, Boyd J, Lubin MB, Deshano ML, Brody LC, Collins FS, and Weber BL. Mutations in the BRCA1 gene in early-onset breast and ovarian cancer. Nat. Genet. 1994; 8: 387-91.

Eriksson M, Brown WT, Gordon LB, Glynn MW, Singer J, Scott L, Erdos MR, Robbins CM, Moses TY, Berglund P, Dutra A, Pak E, Durkin S, Csoka AB, Boehnke M, Glover TW, Collins FS. Recurrent de novo point mutations in lamin A cause Hutchinson-Gilford progeria syndrome. Nature. 2003 May 15; 423(6937):293-8.

Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007 Jun 1;316(5829):1341-5.

Parker SC, Stitzel ML, Taylor DL, Orozco JM, Erdos MR, Akiyama JA, van Bueren KL, Chines PS, Narisu N; NISC Comparative Sequencing Program, Black BL, Visel A, Pennacchio LA, Collins FS; National Institutes of Health Intramural Sequencing Center Comparative Sequencing Program Authors; NISC Comparative Sequencing Program Authors. Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proc Natl Acad Sci U S A. 2013 Oct 29;110(44):17921-6.
