University of Groningen
Genetic etiology of type 2 diabetes
Erdos, Mike
IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.
Document Version
Publisher's PDF, also known as Version of record
Publication date: 2015
Link to publication in University of Groningen/UMCG research database
Citation for published version (APA):
Erdos, M. (2015). Genetic etiology of type 2 diabetes: from gene identification to functional genomics. [S.l.]: [s.n.].
Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).
Take-down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.
Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.
Download date: 24-05-2020
Genetic Etiology of Type 2 Diabetes: From Gene Identification to Functional Genomics
Michael Reynolds Erdos
ISBN
978-90-367-7595-3 (e-book) 978-90-367-7596-0
Cover figure
Graphical representation from gene identification to functional genomics. The background Manhattan plot of a genome-wide association study depicts the identification of the CDKAL1 association with type 2 diabetes, transitioning to functional confirmation that demonstrates intrachromosomal contacts of physically associated chromatin domains between the CDKAL1 and SOX4 genes by chromatin interaction analysis by paired-end tag sequencing (ChIA-PET). The foreground illustrates the predicted model of the locus of transcription involving pancreatic islet specific stretch enhancers and the CDKAL1 and SOX4 genes. Credits: Ernesto del Aguila and Darryl Leja, Intramural Publications Support Office, National Human Genome Research Institute.
Genetic Etiology of Type 2 Diabetes: From Gene Identification to Functional Genomics
PhD thesis
to obtain the degree of PhD at the University of Groningen on the authority of the
Rector Magnificus Prof. E. Sterken and in accordance with
the decision by the College of Deans.
This thesis will be defended in public on
Wednesday 18 March 2015 at 14.30 hours
by
Michael Reynolds Erdos
born on 10 February 1956 in New Jersey, United States of America
Supervisors
Prof. C. Wijmenga
Prof. M.H. Hofker
Prof. F.S. Collins
Assessment committee
Prof. M.G. Netea
Prof. H. Snieder
Prof. B.H.R. Wolffenbuttel
Table of contents
Preface
Introduction
Chapter 1 The PPAR-γ2 Pro12Ala variant: association with type 2 diabetes and trait differences
Chapter 2 High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools
Chapter 3 A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants
Chapter 4 Variations in the G6PC2/ABCB11 genomic region are associated with fasting glucose levels
Chapter 5 Common variant in MTNR1B associated with increased risk of type 2 diabetes and impaired early insulin secretion
Chapter 6 Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci
Discussion, Future Directions and Conclusion
Acknowledgements
References
Summaries
Short Biography and Publications
Preface
The aim of this thesis on the etiology of type 2 diabetes (T2D) is to describe efforts to define the genetic basis of the disease as a model for understanding complex disease, in which many genes contribute to the disease state at varying levels of penetrance and environmental factors have a strong impact. This journey began with gene discovery efforts in both familial and population-based studies, and continues with functional approaches to elucidate the biological mechanisms by which identified genetic loci influence disease risk. Future directions aimed toward realizing the goal of effective preventive, diagnostic and treatment modalities are described. The thesis consists of seven chapters. Chapter 1 describes the use of single nucleotide polymorphism (SNP) analysis in candidate gene studies to validate a non-synonymous coding polymorphism (P12A) in the peroxisome proliferator-activated receptor gamma (PPARG2) gene. Chapter 2 presents a unique, high-throughput and cost-effective method of fine-mapping analysis developed to rapidly screen candidate genomic regions of association by genotyping pooled DNA samples of case versus control subjects. With the advent of massively parallel, high-throughput genotyping arrays, Chapter 3 introduces genome-wide association study (GWAS) analysis on the Finnish-US Investigation of NIDDM Genetics (FUSION) cohort to identify loci associated with T2D. In Chapters 4 and 5 we present genetic associations with quantitative trait loci (QTL) that influence fasting glucose and insulin secretion to investigate mechanisms underlying the genes associated with T2D, adding a functional approach to support the association.
To elucidate the potential causal genes in regions associated with T2D and T2D related quantitative traits, Chapter 6 introduces the novel approach of assessing histone methylation states in pancreatic islets, the primary defective tissue in T2D, to identify chromatin structural domains that correlate with functional elements of noncoding regions of the genome including promoters, enhancers, transcribed genes and repressed genes. Finally, I discuss the current status of genetic association studies for T2D and efforts to refine regions of association by high-density custom genotyping and whole exome/genome sequencing. Although these studies have significantly advanced knowledge of the genetics of T2D, the identification of these largely non-coding risk variants does not readily lead to clear conclusions about which genes are actually affected, and whether the risk alleles lead to overexpression, underexpression, or misexpression in timing or tissue localization. Through the integration of T2D SNP association studies with epigenomic approaches that can delineate functional elements in noncoding genomic regions, and with ongoing gene expression analyses in T2D relevant tissues from patient samples, I describe an approach to determine the functional consequences of non-coding T2D risk alleles, and thereby ultimately to identify plausible therapeutic targets in genes and molecular pathways.
DEDICATION
There are many influences that contribute to achievement. The most significant are opportunity and support. I greatly appreciate the opportunity and support of many special mentors, colleagues, and friends throughout the years. Most significant is the encouragement and patience of Csilla Szabo without whose support I would never have had this opportunity of achievement.
Introduction
Type 2 diabetes (T2D) affects over 347 million people worldwide, predominantly affecting low- and middle-income countries, and accounts for more than 80% of the total deaths due to diabetes. Nearly 1% of T2D affected people die each year. In 2005, the World Health Organization projected diabetes-related mortality would double by 2030 (‘WHO | Diabetes programme’, 2014).
In the United States over 8% of the population above 20 years of age has been diagnosed with T2D, with associated medical care costing over $174 billion annually. Reports of type 2 diabetes in children were previously rare, but have increased worldwide as the prevalence of childhood obesity has climbed. In some countries, it accounts for almost half of newly diagnosed cases in children and adolescents. If these trends continue, over 30% of adults in the United States will be diagnosed with T2D by the year 2050 (Figure 1) (‘CDC - National Diabetes Statistics Report, 2014 - Publications - Diabetes DDT’, 2014).
Figure 1. Prevalence of obesity and diabetes in the United States in 1994 and 2010. The US Centers for Disease Control and Prevention (CDC) presents statistical data for the prevalence of obesity and T2D by US state. The Finnish US Investigation of NIDDM (FUSION) study was initiated in 1994.
Type 2 diabetes results from the inability to effectively regulate glucose levels in the blood. T2D primarily affects metabolic tissues in the body and manifests as resistance to insulin action in muscle, liver and adipose tissues. Under normal physiological conditions the pancreatic islets secrete insulin to induce glucose uptake, predominately in the muscle, and influence glucose disposal by conversion to storage in other peripheral tissues such as liver and adipose. As glucose levels rise in the body the pancreatic islets compensate in response by increasing the amount of insulin secreted at a rate constant described as the disposition index. Repeated exposure to high levels of glucose results in overburdening the secretory response in the pancreatic islets. This leads to progressive stress on pancreatic beta-cells, a failure to compensate for the high glucose and insulin resistance in peripheral tissues, and ultimately results in beta-cell failure (Figure 2). By the time a person is diagnosed with T2D they have lost ~80% of their beta-cell function. More recent evidence indicates roles for other tissues in T2D including incretin deficiency in the gastrointestinal tract, hyperglucagonemia of the pancreatic islet alpha cells, increased glucose resorption in the kidney, and insulin resistance in the brain, indicating that the physiological changes in the development of T2D are much more complicated than previously perceived (DeFronzo, 2009).
Figure 2. Pathophysiology of type 2 diabetes. Increased circulating glucose and free fatty acids usually result in increased secretion of insulin, which regulates glucose production in the liver, increases glucose uptake by skeletal muscle and reduces free fatty acid release in adipose. Genetic predisposition and environmental factors lead to hyperglycemia and increased circulating free fatty acids, which resist insulin action and lead to beta cell toxicity, decreased insulin production and increased insulin resistance. Reviewed in (Stumvoll, Goldstein, & van Haeften, 2005).
Multiple lines of evidence support a significant hereditary contribution to T2D risk. There is a 3.5-fold increased incidence for first degree relatives of T2D subjects compared to the general middle–aged population. In the Finnish population, where our studies have primarily been focused, the T2D concordance in monozygotic twins is ~34% compared to ~16% in dizygotic twins (Kaprio et al., 1992). Nevertheless, identifying genetic variants affecting risk for type 2 diabetes (T2D) has been a formidable challenge for decades, complicated by lifestyle and environmental factors that play a major role in disease onset and progression (Tuomi et al., 2014). Thus, T2D is a prominent example of a common complex polygenic disease.
Gene Discovery – Linkage Analysis: Initially, complex disease studies were modeled after extremely successful familial genetic linkage studies such as those that identified the genes for Huntington’s disease (Gusella et al., 1983), Cystic Fibrosis (Tsui et al., 1985), and others. The FUSION (Finnish US Investigation of NIDDM) genetics study is an international collaboration with the goal to identify genetic variants contributing to T2D susceptibility. Families were originally selected in 1994 (Valle et al., 1998) based on index cases with age of onset 35-60 years, and with at least one affected sibling. Unaffected spouses and offspring were also ascertained for frequently sampled intravenous glucose tolerance tests (FSIGTs) to allow estimates of glucose- and insulin-related physiological traits. In addition, a control cohort of elderly individuals greater than 65 years of age with normal glucose tolerance was collected (Table 1).
Table 1: FUSION Study population characteristics:
Genome-wide linkage analysis results were reported in 2000 using simple tandem repeat (STR; tri- and tetranucleotide) polymorphic microsatellite markers to examine shared genetic regions between affected sibling pairs (ASPs). These identity by descent (IBD) analyses suggested regions linked to T2D on chromosomes 20, 14, 11, and 6 (Ghosh et al., 1999; Silander et al., 2004). While these analyses were instrumental in locating regions potentially linked with T2D, the linked regions were far too large to implicate specific disease genes.
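The logic of an affected-sibling-pair analysis can be illustrated with the simplest such statistic, the mean IBD-sharing test. This is a minimal sketch, not the method used in the FUSION linkage scans: it assumes a fully informative marker, and the function name and the pair counts are hypothetical. Under no linkage, sibling pairs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4, so excess sharing among affected pairs suggests a linked locus.

```python
import math


def asp_mean_test(n_ibd0, n_ibd1, n_ibd2):
    """Mean IBD-sharing test for affected sibling pairs at one marker.

    Under the null of no linkage the expected sharing proportion per pair
    is 0.5 with variance 1/8 (assuming a fully informative marker).
    """
    n_pairs = n_ibd0 + n_ibd1 + n_ibd2
    # Proportion of alleles shared IBD: 0, 0.5 or 1 per pair.
    mean_sharing = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n_pairs
    z = (mean_sharing - 0.5) / math.sqrt(0.125 / n_pairs)
    # One-sided P value: only excess sharing is evidence for linkage.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_sharing, z, p_one_sided


# Hypothetical counts of pairs sharing 0, 1 and 2 alleles IBD.
mean_sharing, z, p = asp_mean_test(n_ibd0=90, n_ibd1=240, n_ibd2=150)
print(f"mean sharing = {mean_sharing:.4f}, z = {z:.2f}, one-sided P = {p:.2g}")
```

Because recombination events within a few hundred sibling pairs are sparse, excess sharing of this kind extends over many megabases, which is why linkage peaks could not be narrowed to single genes.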
The aim of the studies described herein is to refine the investigation of T2D genetics to the resolution of the gene, and to identify gene networks and molecular pathways responsible for T2D that might lead to the potential development of therapeutics for better disease management and prevention of associated complications.
Gene Discovery – Candidate Gene Association Studies: Large scale single nucleotide polymorphism (SNP) discovery in the 1990s enabled higher resolution case-control association studies capable of increasing resolution potentially to the level of the gene (Sachidanandam et al., 2001). By genotyping, in T2D and normal subjects, individual SNPs in genes selected by specific criteria as suspected of predisposing to disease, differences in allele frequency between T2D and normal populations could be statistically tested for association with disease. Association at the level of the single nucleotide may suggest that the gene being queried is implicated in the disease process (Schaid & Sommer, 1993). Typical candidate genes include PPARG2, a known target of the T2D therapeutic thiazolidinediones (Yen et al., 1997), and genes that were known to cause rare monogenic forms of T2D, as in the Maturity Onset Diabetes of the Young (MODY) genes (Bonnycastle, 2006). My own work in the candidate gene phase contributed to discovery of T2D associations with PPARG2 and HNF4A.
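The case-control comparison described above amounts to a test on a 2 x 2 table of allele counts. The following is a minimal sketch using a Pearson chi-square test with one degree of freedom; the function name and the allele counts are hypothetical, and real analyses often use genotype-based or covariate-adjusted models instead.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 df) on a 2 x 2 table of allele counts."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical study: 500 cases and 500 controls, i.e. 1,000 alleles each,
# with minor allele frequency 0.15 in cases and 0.20 in controls.
chi2, p_value = allelic_chi_square(150, 850, 200, 800)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

Counting alleles rather than genotypes treats the two alleles of each subject as independent, which is reasonable only when genotypes are close to Hardy-Weinberg proportions.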
Gene Discovery – Genome-Wide Association Studies: Large collaborative efforts such as the SNP Consortium, formed in 1999, accelerated SNP discovery to the point of enabling an increasing number of biologically plausible candidate gene analyses, comparing allele frequency differences between case and control groups (Thorisson & Stein, 2003). The International HapMap Project, initiated in 2002, successfully catalogued most of the 10 million common SNPs in the human genome shared within and between African, Asian, and European populations (Gibbs et al., 2003). But scanning the whole genome for sites of variation associated with disease risk did not require testing all of those SNPs. Comparing genetic variation between large numbers of different people has identified regions of chromosomes where the variants are shared in a non-random way, referred to as “linkage disequilibrium”. Within these regions, which vary in size from a few kb to hundreds of kb, common SNP alleles tend to travel in lockstep with their neighbors, forming a haplotype (The International HapMap Consortium, 2005; Frazer et al., 2007). Genotyping any of the common variants in the shared segment imparts the same genetic information. This enabled the genotyping of far fewer than 10 million SNPs to assess the genetic association of these haplotypes with common disease. Large scale SNP discovery along with the advent of the HapMap project and the ability to perform massively parallel SNP genotyping expanded the ability to perform association studies to cover the whole genome. Genome wide association studies (GWAS) were undertaken by FUSION (L. J. Scott et al., 2007) and other large T2D collaborations (Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research et al., 2007; Burton et al., 2007; E. Zeggini et al., 2007).
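The statement that neighboring SNP alleles "travel in lockstep" is usually quantified with the squared correlation r² between two loci. As a minimal sketch, r² can be computed from phased two-SNP haplotype counts; the function name, the allele labels and the counts below are hypothetical.

```python
def ld_r_squared(haplotype_counts, allele1, allele2):
    """Squared correlation (r^2) between two biallelic SNPs.

    haplotype_counts maps two-character haplotypes (allele at SNP 1
    followed by allele at SNP 2) to their observed counts.
    """
    total = sum(haplotype_counts.values())
    p_a = sum(n for h, n in haplotype_counts.items() if h[0] == allele1) / total
    p_b = sum(n for h, n in haplotype_counts.items() if h[1] == allele2) / total
    p_ab = haplotype_counts.get(allele1 + allele2, 0) / total
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical phased haplotypes at two neighboring SNPs (A/C and G/T).
strong_ld = {"AG": 60, "AT": 5, "CG": 5, "CT": 30}
perfect_ld = {"AG": 70, "CT": 30}
r2_strong = ld_r_squared(strong_ld, "A", "G")
r2_perfect = ld_r_squared(perfect_ld, "A", "G")
print(f"r2 (strong LD) = {r2_strong:.3f}, r2 (perfect LD) = {r2_perfect:.3f}")
```

When r² between two SNPs is close to 1, genotyping one of them captures nearly all the association information carried by the other, which is the property that allows a subset of tag SNPs to stand in for the full set of common variants.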
The statistical correction required for the increasing number of tests in GWAS made it clear that larger numbers of subjects would be required to undertake association studies at the genome-wide scale.
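The size of this correction can be illustrated with a Bonferroni adjustment. The sketch below assumes roughly one million independent common-variant tests, the figure conventionally used to motivate the genome-wide significance threshold of 5 × 10⁻⁸; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if p_value survives Bonferroni correction for n_tests tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Assumption: about 1,000,000 independent tests across the genome.
genome_wide = bonferroni_threshold(0.05, 1_000_000)
print(f"genome-wide threshold = {genome_wide:.1e}")

# A P value of 0.001 is convincing for one candidate SNP but not in a GWAS.
candidate_ok = is_significant(0.001, 0.05, 1)
gwas_ok = is_significant(0.001, 0.05, 1_000_000)
print(candidate_ok, gwas_ok)
```

A result that would have been convincing in a single candidate gene test therefore falls far short of significance at the genome-wide scale unless the sample size, and with it the statistical power, is increased.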
In order to overcome this penalty of multiple testing, increasing numbers of cases and controls were necessary. To achieve this aim, FUSION, the Broad Institute Diabetes Genetics Initiative (DGI), and the Wellcome Trust Case Control Consortium (WTCCC) agreed to combine results in a meta-analysis of each individual GWAS resulting in the identification of 11 loci associated with T2D at genome wide significance. These efforts subsequently led to the formation of the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium encompassing most of the largest case control studies with GWAS worldwide (Zeggini et al., 2008). The increasing numbers of studies joining the consortium contributed to significantly increased power allowing for the investigation of more rare alleles (Table 2).
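One common way to combine per-study GWAS results is a fixed-effect, inverse-variance-weighted meta-analysis. The sketch below illustrates that approach and is not necessarily the procedure the consortium used; the function name, the effect sizes (log odds ratios) and the standard errors are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis of one SNP."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return beta, se, z, p_value


# Hypothetical log odds ratios and standard errors from three studies.
betas = [0.12, 0.10, 0.15]
standard_errors = [0.05, 0.04, 0.06]
beta, se, z, p_value = fixed_effect_meta(betas, standard_errors)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p_value:.1e}")
```

In this illustration each study alone gives only a nominal P value of about 0.01 to 0.02, while the combined estimate has a smaller standard error than any single study and a much smaller P value.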
Table 2: Summary of sample sets and SNPs assessed in the meta-analysis and replication of the DIAGRAM Consortium
In addition to diabetes affected status, many of these studies have collected considerable quantitative trait data, enabling the formation of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) (Dupuis et al., 2010; Table 3) to investigate putative T2D-associated genetic loci by examining glucose and insulin related traits in unaffected subjects. This meta-analysis quickly identified 12 loci associated with fasting glucose related traits, but surprisingly few for insulin related traits, suggesting that T2D is much more heavily influenced by beta cell function than by insulin resistance.
Table 3. Summary information for studies compiled for the Meta-Analysis of Glucose and Insulin-related traits Consortium.
Although these unprecedented collaborative efforts resulted in increasing the discovery of T2D associated loci, they have thus far failed to capture more than a small portion of the heritability of the disease (Spencer, Hechter, Vukcevic, & Donnelly, 2011).
GWAS of type 2 diabetes and related quantitative traits have identified over 90 loci with genome wide significance in association with T2D and an even larger number of loci associated with obesity measures and with glucose and insulin related quantitative traits (Figure 3) (Grarup, Sandholt, Hansen, & Pedersen, 2014). The vast majority of these risk loci are, however, located in non-coding regions, suggesting that their effects are mediated through altered timing or level of gene expression. Given that effects of such non-coding functional elements (enhancers, insulators) can occur over long distances, the actual predisposing gene has been identified in only a few instances. Although risk loci are often named by the nearest gene, or the most plausible candidate, most of the associated loci contain many genes, and the gene chosen for locus labeling may have little or no supporting evidence for being functionally relevant.
Figure 3: Venn diagram of GWAS loci associated with T2D and T2D related quantitative traits. Intersection of genome-wide significant associations between T2D and five commonly measured T2D-related quantitative traits. Gene symbols represent the closest genes to the associated loci and may not be the actual causal gene.
Beyond GWAS studies: Given this circumstance, there is a pressing need to develop and assess strategies to identify the culprit genes and demonstrate their downstream effects. One approach is described here and methods of dissecting these associated loci for true cause and effect are more elaborately discussed.
In an effort to identify the specific genes responsible for T2D risk, we needed to understand the epigenomic landscape of non-coding DNA. Thus we chose to construct reference maps of chromatin structure based on a set of histone modifications that are well understood to correlate with function (Ernst et al., 2011) -- predicting promoters, enhancers, and repressed chromatin in T2D relevant tissues by chromatin immunoprecipitation (ChIP) experiments performed in pancreatic islets. Integrating T2D associated loci with these regulatory reference maps, as well as gene expression by whole transcriptome sequencing, we aim to identify the causal genes applying this functional strategy.
Chapter 1
The PPAR-γ2 Pro12Ala variant: Association with type 2 diabetes and trait differences
Diabetes 2001; 50(4): 886-890
The Peroxisome Proliferator–Activated Receptor-γ2 Pro12Ala Variant: Association With Type 2 Diabetes and Trait Differences
Michael R. Erdos,2 Julie A. Douglas,1 Richard M. Watanabe,3 Andi Braun,4 Cristy L. Johnston,4 Paul Oeth,4 Karen L. Mohlke,2 Timo T. Valle,5 Christian Ehnholm,5 Thomas A. Buchanan,6 Richard N. Bergman,7 Francis S. Collins,2 Michael Boehnke,1 and Jaakko Tuomilehto5,8
Recent studies have identified a common proline-to-alanine substitution (Pro12Ala) in the peroxisome proliferator–activated receptor-γ2 (PPAR-γ2), a nuclear receptor that regulates adipocyte differentiation and possibly insulin sensitivity. The Pro12Ala variant has been associated in some studies with diabetes-related traits and/or protection against type 2 diabetes. We examined this variant in 935 Finnish subjects, including 522 subjects with type 2 diabetes, 193 nondiabetic spouses, and 220 elderly nondiabetic control subjects. The frequency of the Pro12Ala variant was significantly lower in diabetic subjects than in nondiabetic subjects (0.15 vs. 0.21; P = 0.001). We also compared diabetes-related traits between subjects with and without the Pro12Ala variant within subgroups. Among diabetic subjects, the variant was associated with greater weight gain after age 20 years (P = 0.023) and lower triglyceride levels (P = 0.033). Diastolic blood pressure was higher in grossly obese (BMI >40 kg/m²) diabetic subjects with the variant. In nondiabetic spouses, the variant was associated with higher fasting insulin (P = 0.033), systolic blood pressure (P = 0.021), and diastolic blood pressure (P = 0.045). These findings support a role for the PPAR-γ2 Pro12Ala variant in the etiology of type 2 diabetes and the insulin resistance syndrome. Diabetes 50:886–890, 2001
Peroxisome proliferator–activated receptors (PPARs) are members of the nuclear hormone receptor family of transcription factors and are involved in adipocyte differentiation and gene expression. They are also believed to play an important role in type 2 diabetes and diabetes-related traits, including insulin sensitivity and lipid and energy metabolism (1). In fact, studies have shown that ligands for PPAR-γ, including both endogenous ones and those that are synthetic (e.g., thiazolidinedione drugs), stimulate adipogenesis and increase insulin action (2). A common proline-to-alanine substitution at codon 12 (Pro12Ala) of exon B has been inconsistently associated with protection against type 2 diabetes and diabetes-related traits (3–14). These findings encouraged us to investigate the role of PPAR-γ2 in our sample from the Finnish population. The objective of our study was to examine whether the Pro12Ala variant was associated with type 2 diabetes and to examine the relationship between the Pro12Ala variant and diabetes-related traits among subgroups of diabetic and nondiabetic subjects.
From the 1Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; the 2Genetics and Molecular Biology Branch, National Human Genome Research Institute, Bethesda, Maryland; the 3Division of Biostatistics, Department of Preventative Medicine, Keck School of Medicine, University of Southern California, Los Angeles; 4Sequenom Inc., San Diego, California; the 5Department of Epidemiology and Health Promotion, Diabetes and Genetic Epidemiology Unit, and the Department of Biochemistry, National Public Health Institute, Helsinki, Finland; the 6Department of Medicine and the 7Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, California; and the 8Department of Public Health, University of Helsinki, Helsinki, Finland.
Address correspondence and reprint requests to Michael Boehnke, University of Michigan, Department of Biostatistics, 1420 Washington Heights, Ann Arbor, Michigan 48109-2029. E-mail: [email protected].
Received for publication 12 June 2000 and accepted in revised form 29 December 2000.
J.A.D. and M.R.E. contributed equally to this work. Additional information can be found in an online appendix at www.diabetes.org/diabetes/appendix.asp.
AIRG, acute insulin response to glucose; dBP, diastolic blood pressure; DI, disposition index; FUSION, Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics; MALDI-TOF, matrix-assisted laser desorption/ionization time-of-flight; OGTT, oral glucose tolerance test; PPAR, peroxisome proliferator–activated receptor; PROBE, primer oligo base extension reaction; sBP, systolic blood pressure; SI, insulin sensitivity.
The mean and standard deviation of selected trait values are given in Table 1 by subgroup. The entire sample of 935 subjects consisted of 636 Pro/Pro subjects, 271 Pro/Ala subjects, and 28 Ala/Ala subjects. The allele frequency of the Pro12Ala variant in the PPAR-γ2 gene was 0.15 among diabetic subjects, 0.19 among spousal control subjects, and 0.22 among elderly control subjects (Table 2), which mirrors the continuum of diabetes susceptibility across these subgroups. The frequency of the variant was significantly lower in diabetic subjects than in elderly control subjects (χ² = 11.72, df = 1, P < 0.0007) and marginally lower than in spousal control subjects (P = 0.069). Comparison with combined spousal and elderly control subjects gave a significant association result (χ² = 10.60, df = 1, P = 0.001). A second independent sample of 263 Finnish diabetic subjects in our study subsequently confirmed the variant frequency of 0.15 in the original 522 diabetic subjects (data not shown). The observed genotype data were consistent with Hardy-Weinberg equilibrium.

TABLE 1
Characteristics of the subjects by clinical subgroup
                                  Diabetic        Spousal control   Elderly control
n                                 522             193               220
Sex (M:F)                         288:234         62:131            106:114
Age at enrollment (years)         63.5 ± 7.5      61.4 ± 7.7        70.0 ± 0.3
Age at diagnosis (years)          50.0 ± 7.9      —                 —
Diabetes duration (years)         13.6 ± 7.0      —                 —
BMI (kg/m²)                       30.0 ± 4.8      28.4 ± 4.5        27.0 ± 4.0
Waist-to-hip ratio                0.94 ± 0.08     0.88 ± 0.08       0.88 ± 0.08
Fasting plasma glucose (mmol/l)   10.7 ± 3.4      5.2 ± 0.7         5.0 ± 0.5
Fasting serum insulin (pmol/l)    114.1 ± 71.4    75.6 ± 48.8       66.2 ± 34.8
Data are means ± SD.

TABLE 2
Frequency of the PPAR-γ2 Pro12Ala variant by clinical subgroup
                            n      Frequency   χ²*      P*
Diabetic subjects           522    0.15        —        —
Spousal control subjects    193    0.19        3.30     0.069
Elderly control subjects    220    0.22        11.72    <0.0007
*Compared with diabetic subjects.

Results for the quantitative traits were less compelling. Genotype-specific means for all traits for diabetic subjects, elderly control subjects, and spousal control subjects, respectively, are available in an online appendix (Tables A1–3) at www.diabetes.org/diabetes/appendix.asp. Table 3 shows the significant trait differences between subjects with and without the Pro12Ala variant by subgroup. In diabetic subjects, the presence of the variant was associated with greater weight change after 20 years of age (22.2 ± 14.0 vs. 19.5 ± 13.0 kg) and lower serum triglyceride levels (2.29 ± 1.65 vs. 2.68 ± 2.21 mmol/l). Both results were significant after adjustment for sex, age, and (for triglyceride levels) BMI (P = 0.023 and 0.033, respectively). There was a significant interaction (P = 0.038) between the variant and BMI for diastolic blood pressure (dBP); the variant was associated with higher dBP only among grossly obese diabetic subjects (Table 4). A similar trend was also observed for systolic blood pressure (sBP), although the interaction was not statistically significant (P = 0.299).

TABLE 3
Significant results by clinical subgroup and presence/absence of the PPAR-γ2 Pro12Ala variant
                                   Pro/Pro              Pro/Ala or Ala/Ala   Unadjusted analysis   Adjusted analysis
Diabetic subjects
  Weight change* (kg)              19.5 ± 13.0 (362)    22.2 ± 14.0 (143)    0.052                 0.023
  Triglycerides (mmol/l)           2.68 ± 2.21 (373)    2.29 ± 1.65 (143)    0.047                 0.033
Elderly control subjects
  Maximum lifetime weight          (134)                (84)                 0.049                 0.191
Spousal control subjects
  Fasting serum insulin (pmol/l)   (123)                (68)                 0.083                 0.033
  sBP† (mmHg)                      142.5 ± 18.9 (122)   151.4 ± 25.5 (65)    0.014                 0.021
  dBP† (mmHg)                      (122)                (65)                 0.090                 0.045
Data are means ± SD (n) unless otherwise indicated. Adjusted analysis P value includes adjustment for sex, age, and (except for weight-related traits) BMI; P values are not adjusted for multiple comparisons. *Current weight minus weight at age 20 years; †mean of two measurements.

TABLE 4
dBP and sBP in diabetic subjects by presence/absence of the Pro12Ala variant and BMI
                    Pro/Pro               Pro/Ala or Ala/Ala
dBP* (mmHg)
  BMI < 20          78.0 ± 7.5 (3)        82.5 ± 4.9 (2)
  20 < BMI < 25     82.9 ± 9.8 (39)       79.5 ± 10.7 (21)
  25 < BMI < 30     83.5 ± 10.5 (167)     83.8 ± 11.0 (47)
  30 < BMI < 35     85.5 ± 9.6 (109)      84.5 ± 10.8 (47)
  35 < BMI < 40     87.2 ± 10.7 (34)      90.7 ± 11.6 (16)
  BMI > 40          86.6 ± 9.9 (14)       94.4 ± 7.1 (7)
sBP* (mmHg)
  BMI < 20          157.3 ± 44.2 (3)      148.0 ± 18.4 (2)
  20 < BMI < 25     148.5 ± 23.5 (39)     145.4 ± 22.8 (21)
  25 < BMI < 30     150.9 ± 21.9 (167)    155.1 ± 21.7 (47)
  30 < BMI < 35     151.3 ± 20.6 (109)    155.8 ± 22.5 (47)
  35 < BMI < 40     151.0 ± 21.3 (34)     156.3 ± 24.5 (16)
  BMI > 40          149.0 ± 19.8 (14)     166.9 ± 25.8 (7)
Data are means ± SD (n). *Mean of two measurements.

Among elderly control subjects, only maximum lifetime weight was significantly associated with the Pro12Ala variant (P = 0.049), and it was no longer significant after adjustment for sex (P = 0.191) (Table 3). Among nondiabetic spousal control subjects, the variant was significantly associated with higher sBP (151.4 ± 25.5 vs. 142.5 ± 18.9 mm Hg) (P = 0.014). When both sBP and dBP and fasting serum insulin were adjusted for sex, age, and BMI, differences between spousal control subjects with and without the Pro12Ala variant remained and/or became significant (P = 0.021, 0.045, and 0.033, respectively) (Table 3). All three traits were significantly higher among subjects with the variant.
In our analysis, we found a significantly lower frequency of the Pro12Ala variant of the PPAR-γ2 gene in diabetic subjects than in nondiabetic subjects. The directionality of these highly significant (P = 0.001) findings is consistent with results from the studies of Deeb et al. (3), Mancini et al. (4), and Altshuler et al. (14), although the difference in allele frequencies in the second study failed to reach statistical significance. Coupled with these studies and the biological importance of PPAR-γ, these findings suggest a link between the Pro12Ala variant of the PPAR-γ2 gene and the pathogenesis of type 2 diabetes. The increased frequency of the variant in nondiabetic subjects would seem to suggest that the Pro12Ala variant confers some protective effect against diabetes.
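The Hardy-Weinberg statement in the results can be checked with a short calculation from the genotype counts reported for the whole sample (636 Pro/Pro, 271 Pro/Ala, 28 Ala/Ala). This is a minimal sketch: pooling the three subgroups is an assumption made here for illustration, since the text does not say how the test was run, and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_pro_pro, n_pro_ala, n_ala_ala):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions."""
    n = n_pro_pro + n_pro_ala + n_ala_ala
    p = (2 * n_pro_pro + n_pro_ala) / (2 * n)  # Pro allele frequency
    q = 1.0 - p                                # Ala allele frequency
    observed = (n_pro_pro, n_pro_ala, n_ala_ala)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail, 1 df
    return q, chi2, p_value


# Genotype counts for the entire sample of 935 subjects, as reported in the text.
ala_frequency, chi2, p_value = hwe_chi_square(636, 271, 28)
print(f"Ala frequency = {ala_frequency:.3f}, chi2 = {chi2:.3f}, P = {p_value:.2f}")
```

The pooled Ala allele frequency of about 0.17 lies between the subgroup frequencies of 0.15 and 0.22, and the observed genotype counts are very close to those expected under Hardy-Weinberg proportions.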
Despite the increased frequency of the Pro12Ala variant among elderly control subjects, we failed to find any sig- nificant trait associations within this subgroup. Instead, we observed weak but significant associations between the Pro12Ala variant and traits characteristic of the insulin resistance syndrome in both diabetic and nondiabetic sub- jects. For example, greater weight gain was associated with the Pro12Ala variant in diabetic subjects, whereas higher fasting insulin, sBP, and dBP were associated with the variant in nondiabetic spouses. It should be emphasized that, in contrast to the spousal control subjects, the elderly control subjects represent a quite distinct subgroup of nondiabetic subjects who are unlikely to ever develop type 2 diabetes. As such, they are unlikely to carry the cluster of susceptibility genes that may interact with variants in PPAR-)'2 to result in the insulin resistance syndrome phenotype. The spousal control subjects are somewhat younger and remain at risk for developing type 2 diabetes during their lifetime.
Alterations in functional characteristics of the PPAR-γ2 gene induced by the Ala isoform may be partly responsible for the manifestation of some characteristics of the insulin resistance syndrome. Deeb et al. (3) identified lowered transactivation capacity and reduced stimulation of PPAR-γ target genes as a potential molecular mechanism underlying the association of the Pro12Ala variant with lower BMI and increased insulin sensitivity, a hypothesis consistent with their observations in Finnish subjects. Although this hypothesis may appear to be at odds with (or at least not supported by) our trait findings, several points should be clarified. First, the middle-aged subjects in the study by Deeb et al. (3) were much younger and leaner than our nondiabetic spouses and elderly control subjects. Second, although the elderly subjects from both studies were better matched, we could not parallel their genotype-based analysis because of insufficient numbers of Pro12Ala homozygotes. If the Pro/Ala and Ala/Ala subjects from their study had been pooled, it is unlikely that they would have observed significant trait differences because trends within their elderly subjects were inconsistent (e.g., fasting insulin was highest for Pro/Ala heterozygotes).
Consistent with at least one report of a differential effect of the PPAR-γ2 Pro12Ala variant in the lean and obese states (12), we also found an interaction between BMI and the variant for dBP in the diabetic subjects. Among severely obese subjects, those with the Pro12Ala variant had substantially higher blood pressure. Higher values for sBP and dBP were also associated with the variant among nondiabetic spousal control subjects, though there was no evidence for an interaction between BMI and the Pro12Ala variant. These associations are of interest, given the recent report by Barroso et al. (13) of three type 2 diabetic subjects with early-onset hypertension and polymorphisms in the PPAR-γ2 gene, suggesting that this receptor is important in both blood pressure and glucose homeostasis.
In summary, we found that the Pro12Ala variant of the PPAR-γ2 gene was associated with protection against type 2 diabetes in Finnish subjects, a finding consistent with several reports in the literature (3,4,14). Because we only screened for this particular variant, we cannot exclude the role of other PPAR-γ2 variants or variants in nearby genes, possibly in linkage disequilibrium with the Pro12Ala variant. Further studies, including functional analyses, will be required to fully understand the role of this gene in type 2 diabetes. Our data suggest that the PPAR-γ2 Pro12Ala variant has variable effects among subgroups of individuals with different levels of diabetes risk.
RESEARCH DESIGN AND METHODS
The Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) Study is an international collaborative effort to map and clone genes predisposing to type 2 diabetes and related traits in Finnish subjects. The FUSION study design and family material have been described previously (15). For the present investigation, our sample included 522 unrelated subjects with type 2 diabetes, 193 nondiabetic spouses of a diabetic subject or his/her affected sibling, and 220 unrelated elderly nondiabetic control subjects. Diabetes was diagnosed by World Health Organization (16) criteria. Spouses had a single normal oral glucose tolerance test (OGTT). Elderly control subjects had normal glucose tolerance at ages 65 and 70 years.
A total of 14 traits were analyzed on all subjects: BMI, waist circumference, waist-to-hip ratio, current weight, maximum lifetime weight, fasting plasma glucose, fasting serum insulin, total cholesterol, HDL cholesterol, HDL ratio (HDL cholesterol/total cholesterol), LDL cholesterol, triglycerides, sBP, and dBP. Values for sBP and dBP were each determined as the mean of two measurements. Seven additional traits were ascertained on diabetic subjects: weight at 20 years of age, change in weight after 20 years of age, maximum lifetime weight change after 20 years of age, age at diagnosis of diabetes, diabetes duration, age at which insulin treatment started (if applicable), and fasting plasma C-peptide concentrations. In addition, glucose and insulin concentrations 2 h after OGTT were analyzed in nondiabetic subjects, whereas the insulin sensitivity index (SI), the glucose effectiveness index, the acute insulin response to glucose (AIRG), and the disposition index (DI = SI × AIRG) were analyzed only in the nondiabetic spouses; the latter analyses used tolbutamide-modified frequently sampled intravenous glucose tolerance tests and minimal model analysis (17). Glucose, insulin, C-peptide, and lipid concentrations were assayed using standard methods (15).
Genotyping by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. The PPAR-γ2 Pro12Ala variant was analyzed by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. A 69-bp fragment containing the Pro12Ala variant site was amplified by polymerase chain reaction from 20 ng genomic DNA using 25 pmol forward primer 5'-GCTGTTATGGGTGAAACTCTG, 2 pmol of a universal sequence-tailed reverse primer 5'-AGCGGATAACAATTTCACACAGGCAGTGTATCAGTGAAGGAATCG, and 10 pmol of a biotinylated universal primer 5'-biotin-AGCGGATAACAATTTCACACAGG under standard reaction conditions (Fig. 1A).
After 15 min of denaturation at 95°C, 55 cycles (5 s at 95°C, 20 s at 53°C, and 30 s at 72°C) were performed. To recover the single-stranded DNA template, the product was immobilized on streptavidin-coated magnetic beads (Dynal, Great Neck, NY), washed with 10 mmol/l Tris-HCl at pH 8.0, denatured in 50 µl 0.1 mol/l NaOH, and washed again with 10 mmol/l Tris-HCl.
The primer oligo base extension reaction (PROBE) was performed by the addition of 20 pmol extension primer 5'-TCTGGGAGATTCTCCTATTGAC under conditions similar to those previously described (18). The extension reaction products were applied to a SpectroChip (Sequenom, San Diego, CA) prespotted with a matrix of 3-hydroxypicolinic acid using a Spectrojet piezoelectric nanoliter dispensing system (19). A modified Bruker Biflex III MALDI-TOF mass spectrometer (DNA MassArray; Sequenom) was used to determine genotypes by the appearance of peaks corresponding to the expected extension product masses (Fig. 1B).
Statistical analyses. Associations between the Pro12Ala variant of the PPAR-γ2 gene and diabetes status, comparing diabetic subjects with both nondiabetic spouses and elderly control subjects, were examined by χ² tests of independence. Trait differences within diabetic, elderly control, or spousal control subgroups were examined by analysis of variance. Initially, we tested whether trait means differed significantly among subjects with the Pro/Pro, Pro/Ala, and Ala/Ala genotypes. Due to the small number of individuals with the Ala/Ala genotype, we subsequently tested whether the trait means differed between subjects with and without the Pro12Ala variant (Pro/Ala and Ala/Ala versus Pro/Pro). All analyses were performed with and without adjustment for covariates, including sex, age, and BMI. Preselected interactions between the variant and sex or BMI were also tested. Standard regression diagnostics were computed to examine the adequacy of model assumptions, and traits were transformed to approximate normality when necessary. P values <0.05 were considered statistically significant. No adjustments for multiple comparisons were made. We excluded from the analyses any subject who, on the day of their examinations, took medications that could influence the trait of interest. We also excluded subjects whose diabetic status was uncertain and those with a first-degree relative with type 1 diabetes.
PPAR-γ2 Pro12Ala AND TYPE 2 DIABETES
FIG. 1. Genotype analysis by MALDI-TOF spectrometry. A: PROBE reaction. The region containing the PPARγ2 Pro12Ala (CCA→GCA) variant is amplified with a biotinylated primer to enable purification of the single-stranded template. Next, the PROBE primer anneals to the template and is extended. When the single nucleotide polymorphism (SNP) is C, the probe is extended by one nucleotide, dideoxy-CTP. When the SNP is G, the probe is extended by two nucleotides, deoxy-GTP and dideoxy-CTP. B: Mass spectrometry profiles of primer extension products. Peaks at 6,989.6 and 7,318.8 Da correspond to the mass of the probe primer extended by one or two nucleotides, respectively. Genotypes of the spectra are 1) CC, 2) CG, and 3) GG. The mass of the unextended PROBE primer is indicated at 6,716.4 Da, but in these examples, none is detected.
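The genotype call described in Fig. 1B can be sketched in a few lines of Python. This is only an illustration of the idea, not the instrument software: the three masses come from the figure caption, while the function name `call_genotype` and the 2 Da matching window are assumptions introduced here.

```python
# Expected masses (Da) as given in the Fig. 1B caption.
UNEXTENDED = 6716.4   # PROBE primer with no extension
C_PRODUCT = 6989.6    # primer + ddC (C allele, Pro codon CCA)
G_PRODUCT = 7318.8    # primer + dG + ddC (G allele, Ala codon GCA)


def call_genotype(peak_masses, tolerance=2.0):
    """Return 'CC', 'CG', 'GG', or None from a list of observed peak masses.

    `tolerance` (Da) is a hypothetical matching window, not a value
    taken from the text.
    """
    has_c = any(abs(m - C_PRODUCT) <= tolerance for m in peak_masses)
    has_g = any(abs(m - G_PRODUCT) <= tolerance for m in peak_masses)
    if has_c and has_g:
        return "CG"
    if has_c:
        return "CC"
    if has_g:
        return "GG"
    return None  # only unextended primer or noise was seen


if __name__ == "__main__":
    print(call_genotype([6989.5]))          # one extension product
    print(call_genotype([6989.7, 7318.6]))  # both extension products
```

The two mass increments implied by the caption (one dideoxy-C, then one deoxy-G) are what separate the alleles, so a call only needs to test for the presence of each extension product.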
ACKNOWLEDGMENTS
The FUSION study is made possible by intramural funds from the National Human Genome Research Institute (Project number OH95-C-N030), by grants from the Finnish Academy (38387 and 46558), and by National Institutes of Health grants HG00040 (J.A.D.), HG00376 (M.B.), DK09525 (R.M.W.), DK27619, and DK29867 (R.N.B.). Currently, J.A.D. is supported by a University of Michigan Rackham Predoctoral Fellowship, and R.M.W. is supported by a Career Development Award from the American Diabetes Association.
We wish to thank all of the subjects for their invaluable contribution to the FUSION study. We also gratefully acknowledge Peter Chines for his exceptional work in preparing the data. Family studies were approved by institutional review boards at the National Institutes of Health (assurance number SPA S-5737-05) and at the National Public Health Institute in Helsinki, Finland.
REFERENCES
1. Auwerx J: PPARγ, the ultimate thrifty gene. Diabetologia 42:1033–1049, 1999
2. Spiegelman BM: PPAR-γ: adipogenic regulator and thiazolidinedione receptor (Review). Diabetes 47:507–514, 1998
3. Deeb SS, Fajas L, Nemoto M, Pihlajamaki J, Mykkanen L, Kuusisto J, Laakso M, Fujimoto W, Auwerx J: A Pro12Ala substitution in PPARγ2 associated with decreased receptor activity, lower body mass index and improved insulin sensitivity. Nat Genet 20:284–287, 1998
4. Mancini FP, Vaccaro O, Sabatino L, Tufano A, Rivellese AA, Riccardi G, Colantuoni V: Pro12Ala substitution in the peroxisome proliferator-activated receptor-γ2 is not associated with type 2 diabetes. Diabetes 48:1466–1468, 1999
5. Ringel J, Engeli S, Distler A, Sharma AM: Pro12Ala missense mutation of the peroxisome proliferator activated receptor γ and diabetes mellitus. Biochem Biophys Res Commun 254:450–453, 1999
6. Clement K, Hercberg S, Passinge B, Galan P, Varroud-Vial M, Shuldiner AR, Beamer BA, Charpentier G, Guy-Grand B, Froguel P, Vaisse C: The Pro115Gln and Pro12Ala PPAR γ gene mutations in obesity and type 2 diabetes. Int J Obes 24:391–393, 2000
7. Meirhaeghe A, Fajas L, Helbecque N, Cottel D, Auwerx J, Deeb SS, Amouyel P: Impact of the peroxisome proliferator activated receptor γ2 Pro12Ala polymorphism on adiposity, lipids, and non-insulin-dependent diabetes mellitus. Int J Obes 24:195–199, 2000
8. Ek J, Urhammer SA, Sorensen TI, Andersen T, Auwerx J, Pedersen O: Homozygosity of the Pro12Ala variant of the peroxisome proliferation-activated receptor-γ2 (PPAR-γ2): divergent modulating effects on body mass index in obese and lean Caucasian men. Diabetologia 42:892–895, 1999
9. Beamer BA, Yen CJ, Andersen RE, Muller D, Elahi D, Cheskin LJ, Andres R, Roth J, Shuldiner AR: Association of the Pro12Ala variant in the peroxisome proliferator–activated receptor-γ2 gene with obesity in two Caucasian populations. Diabetes 47:1806–1808, 1998
10. Koch M, Rett K, Maerker E, Volk A, Haist K, Deninger M, Renn W, Haring HU: The PPARγ2 amino acid polymorphism Pro 12 Ala is prevalent in offspring of type II diabetic patients and is associated to increased insulin sensitivity in a subgroup of obese subjects. Diabetologia 42:758–762, 1999
11. Cole SA, Mitchell BD, Hseuh W, Pineda P, Beamer BA, Shuldiner AR, Comuzzie AG, Blangero J, Hixson JE: The Pro12Ala variant in peroxisome proliferator-activated receptor-γ2 (PPAR-γ2) is associated with measures of obesity in Mexican Americans. Int J Obes 24:522–524, 2000
12. Ristow M, Muller-Wieland D, Pfeiffer A, Krone W, Kahn CR: Obesity associated with a mutation in a genetic regulator of adipocyte differentiation. N Engl J Med 339:953–959, 1998
13. Barroso I, Gurnell M, Crowley VE, Agostini M, Schwabe JW, Soos MA, Maslen GL, Williams TD, Lewis H, Schafer AJ, Chatterjee VK, O’Rahilly S: Dominant negative mutations in human PPARγ associated with severe insulin resistance, diabetes mellitus and hypertension. Nature 402:880–883, 1999
14. Altshuler D, Hirschhorn JN, Klannemark M, Lindgren CM, Vohl M, Nemesh J, Lane CR, Schaffner SF, Bolk S, Brewer C, Tuomi T, Gaudet D, Hudson TJ, Daly M, Groop L, Lander ES: The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 26:76–80, 2000
15. Valle T, Tuomilehto J, Bergman RN, Ghosh S, Hauser ER, Eriksson J, Nylund SJ, Kohtamaki K, Toivanen L, Vidgren G, Tuomilehto-Wolf E, Ehnholm C, Blaschak J, Langefeld CD, Watanabe RM, Magnuson V, Ally DS, Hagopian WA, Ross E, Buchanan TA, Collins F, Boehnke M: Mapping genes for NIDDM: design of the Finland-United States Investigation of NIDDM (FUSION) Genetics Study. Diabetes Care 21:949–958, 1998
16. World Health Organization: Diabetes Mellitus: Report of a WHO Study Group. Geneva, World Health Org., 1985 (Tech. Rep. Ser. no. 727)
17. Bergman RN: Lilly Lecture 1989: Toward physiological understanding of glucose tolerance: minimal-model approach. Diabetes 38:1512–1527, 1989
18. Braun A, Little DP, Köster H: Detecting CFTR gene mutations by using primer oligo base extension and mass spectrometry. Clin Chem 43:1151–1158, 1997
19. Little DP, Cornish TJ, O’Donnell MJ, Braun A, Cotter RJ, Köster H: MALDI on a chip: analysis of arrays of low-femtomole to subfemtomole quantities of synthetic oligonucleotides and DNA diagnostic products dispensed by a piezoelectric pipette. Anal Chem 69:4540–4546, 1997
Chapter 2
High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools
Proceedings of the National Academy of Sciences, USA. 2002;99(26):16928-33
High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools
Michael R. Erdos*†, Karen L. Mohlke*†, Laura J. Scott†‡, Tasha E. Fingerlin§, Anne U. Jackson‡, Kaisa Silander*, Pablo Hollstein*, Michael Boehnke‡¶, and Francis S. Collins*¶
*Genome Technology Branch, National Human Genome Research Institute, Bethesda, MD 20892; and Departments of ‡Biostatistics and §Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109
Contributed by Francis S. Collins, October 31, 2002
To facilitate positional cloning of complex trait susceptibility loci, we are investigating methods to reduce the effort required to identify trait-associated alleles. We examined primer extension analysis by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry to screen single-nucleotide polymorphisms (SNPs) for association by using DNA pools. We tested whether this method can accurately estimate allele frequency differences between pools while maintaining the high-throughput nature of assay design, sample handling, and scoring. We follow up interesting allele frequency differences in pools by genotyping individuals. We tested DNA pools of 182, 228, and 499 individuals using 16 SNPs with minor allele frequencies 0.026–0.486 and allele frequency differences 0.001–0.108 that we had genotyped previously on individuals and 381 SNPs that we had not. Precision, as measured by the average standard deviation among 16 semi-dependent replicates, was 0.021 ± 0.011 for the 16 SNPs and 0.018 ± 0.008 for the 291/381 SNPs used in further analysis. For the 16 SNPs, the average absolute error in predicting allele frequency differences between pools was 0.009; the largest errors were 0.031, 0.028, and 0.027. We determined that compensating for unequal peak heights in heterozygotes improved precision of allele frequency estimates but had only a very minor effect on accuracy of allele frequency differences between pools. Based on these data and assuming pools of 500 individuals, we conclude that at significance level 0.05 we would have 95% (82%) power to detect population allele frequency differences of 0.07 for control allele frequencies of 0.10 (0.50).
Association studies provide a powerful approach to identify the DNA variants underlying complex traits (1). Currently, association studies can be especially useful for narrowing a complex trait candidate interval identified by linkage analysis (2, 3), although improved genotyping technology and a map of single-nucleotide polymorphisms (SNPs) identifying the common haplotypes in the human genome may enable association studies of loci spanning the entire genome. A rate-limiting step for association studies is to obtain the large number of genotypes needed. Currently, a linkage region expected to contain a complex trait locus typically spans 10–20 Mb, and even with a priori knowledge of the linkage disequilibrium between DNA variants, thousands of densely spaced SNPs with a range of allele frequencies may need to be screened (4). In addition, sample sizes of hundreds or even thousands of individuals may be required to have sufficient power to detect loci with modest effect.
A reliable screening method to identify SNPs associated with disease without genotyping all individuals would be efficient and economical. Screening SNPs by typing a limited number of DNA pools representing cases and controls in principle requires vastly fewer genotypes for each SNP, reducing labor and reagent costs. Genotyping cost becomes essentially independent of sample size, allowing larger, more powerful samples to be studied. In addition, the amount of DNA used from each person for each genotype can be dramatically reduced, an important consideration when DNA samples are limited.
An optimal technique to screen SNPs for association would accurately and precisely identify SNPs that show a difference between cases and controls. Because the major experimental question is not the absolute allele frequencies, but whether there are allele frequency differences between cases and controls, a consistent under- or overestimate of pooled allele frequencies, if modest or correctable, would not preclude a method from use. Several methods for typing SNPs in pooled DNA, including mass spectrometry, have been described (5–21). These methods currently have varying suitability to a high-throughput setting. For many of these methods, the precision and accuracy in estimating allele frequency differences between pools remain to be established, as does the variability associated with pool formation and each stage of the genotyping process.
Primer extension analysis by mass spectrometry is a potentially attractive method for allele frequency estimation based on pools because it can be easily automated. Design of assays based only on local sequence allows automated assay design with uniform assay conditions. This similarity of assay conditions permits extensive use of robotics, which limits human error. Mass spectrometry data collection is fast and automated, based on the size of extended products.
The precision of mass spectrometry has been evaluated in a limited number of studies (19–21). Ross and coworkers (19) tested the quantitative range and detection limits of the technique and were able to quantitate allele frequencies as low as 0.05. Buetow et al. (20) used 81 assays to evaluate precision; when each primer extension reaction was dispensed four times or when each PCR was repeated four times, they observed a median standard deviation (SD) of 0.016 or 0.017, respectively. Werner et al. (21) observed a median SD of 0.017 in artificial pools and 0.016–0.024 for estimates from pools of 94–280 individuals.
We have extended the work of previous studies by assessing the ability of mass spectrometry to reliably estimate allele frequencies in pools and allele frequency differences between pools and by estimating the sources of variability in these estimates. We performed primer extension assays and used SPECTROTYPER software (Sequenom, San Diego) to quantitate allele frequency estimates from relative peak areas. We compared estimated allele frequencies and allele frequency differences to those obtained from typing individual DNA samples for 16 SNPs in three DNA pools of laboratory interest. We also assessed precision in allele frequency estimates for 381 additional SNPs assayed only in pools. We used the estimates of the variability from PCR and primer extension, and product dispensing and mass spectrometry to estimate the power of pooled genotyping and to compare its power with that for genotyping individual samples. The data demonstrate that this method has the necessary characteristics to be used successfully for pooled genotype analysis.
Abbreviation: SNP, single-nucleotide polymorphism.
†K.L.M., M.R.E., and L.J.S. contributed equally to this work.
¶To whom correspondence may be addressed. E-mail: [email protected] or [email protected].
Methods
Study Samples. The DNA samples used are from participants in the Finland-United States Investigation of Non-Insulin Dependent Diabetes Mellitus Genetics (FUSION) Study, in which we seek to identify genetic variants that predispose to type 2 diabetes or are responsible for variability in diabetes-related quantitative traits. Families were enrolled based on sibling pairs affected with type 2 diabetes (22); controls included 194 nondiabetic spouses of affected family members and 231 unrelated elderly controls. Informed consent was obtained from all participants.
Construction of Case and Control DNA Pools. We selected samples to create one DNA pool representing cases with type 2 diabetes and two pools representing controls. We selected one affected individual from each of 525 families for a pool designated F1, 194 unaffected spouses for a pool designated SP, and 231 unrelated elderly nondiabetic controls for a pool designated EC. Based on an initial quantitation by spectrophotometer (Beckman DU-640), each sample was diluted to an expected concentration of ≈50 ng/µl and requantitated by using a PicoGreen assay (Molecular Probes) on a fluorometer (Molecular Devices SpectraMAX Gemini XS). Four independent measurements were performed by using the low range standard protocol and the concentrations were averaged. If the independent measurements varied from the mean by >10%, the measurement was repeated. Samples that were determined to have less than the required amount of DNA for each pool were omitted. Based on the concentrations of individual samples, we calculated the volumes needed to obtain equimolar amounts of each sample. We combined samples to create subpools of ≈100 individuals and adjusted the concentration of each subpool to 50 ng/µl by using the same criteria for quantitation as the individual samples. The appropriate subpools were combined and diluted to 10 ng/µl before use. The final pool sizes were 499 individuals for F1, 182 for SP, and 228 for EC.
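The volume calculation behind pool construction can be sketched as follows. This is a minimal illustration, assuming that equal masses of genomic DNA contribute equal numbers of genome copies; the function names and the example DNA amount are hypothetical, while the 10% repeat criterion follows the text.

```python
def needs_requantitation(measurements, max_rel_dev=0.10):
    """True if any replicate deviates from the replicate mean by more than 10%."""
    mean = sum(measurements) / len(measurements)
    return any(abs(m - mean) > max_rel_dev * mean for m in measurements)


def pooling_volumes(concentrations_ng_per_ul, ng_per_sample):
    """Volume (µl) of each sample that delivers the same DNA mass to the pool.

    `ng_per_sample` is a hypothetical target amount chosen by the user;
    the text does not state the amount contributed per individual.
    """
    if any(c <= 0 for c in concentrations_ng_per_ul):
        raise ValueError("concentrations must be positive")
    return [ng_per_sample / c for c in concentrations_ng_per_ul]


if __name__ == "__main__":
    replicates = [50.0, 51.0, 49.0, 50.0]          # four PicoGreen readings, ng/µl
    print(needs_requantitation(replicates))        # within 10% of the mean
    print(pooling_volumes([50.0, 40.0, 62.5], ng_per_sample=500.0))
```

Samples with lower measured concentration simply contribute a proportionally larger volume, so every individual is represented equally in the subpool.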
PCR, Primer Extension Reactions, and Mass Spectrometry. Most PCR primers and primer extension assays were designed by using SPECTRODESIGN software (Sequenom) specifying an optimal PCR product of 100 nucleotides with a range of 60–400. SNP assays were designed to generate extension products of different masses, usually by incorporating one dideoxynucleotide or one deoxynucleotide and one dideoxynucleotide, depending on the SNP allele. Primer sequences for the 16 SNPs typed on pools and individuals are available in Table 3, which is published as supporting information on the PNAS web site, www.pnas.org. A set of 381 additional SNPs were typed in pools and on 7–11 individuals as part of a large-scale SNP screening project. Assay designs were uploaded with SPECTROIMPORTER software (Sequenom). We used 20 ng of genomic DNA as template in 20-µl PCR, all of which was used for a magnetic-bead based isolation of template before performing primer extension reactions using standard conditions as described (23). We used a Spectrojet piezoelectric nanoliter dispensing system (Sequenom) to apply the extension products onto chips prespotted with a matrix of 3-hydroxypicolinic acid (24) and a modified Bruker Biflex III matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer (Sequenom) to determine genotypes by the appearance of peaks corresponding to the expected extension product masses. To minimize variability caused by depurination of extension product peaks, we scanned chips within 24 h after dispensing extension products, although we do not know whether depurination would be unequal between pools and introduce variability.
To genotype individuals for the 16 SNPs, we dispensed primer extension products one time each and set mass spectrometry SPECTROACQUIRE software (Sequenom) to collect sets of 20 spectra until a genotype could be called unambiguously or five sets of 20 spectra were collected, whichever came first. The 16 SNPs were on average 97% successful (range 94–98%) on the 909 individuals comprising the pools. We routinely performed a limited manual review of spectra to detect and remove questionable individual genotype calls, usually calls with low signal intensity. We genotyped 4 of every 90 samples in duplicate. We have observed an error rate among duplicates of 0.03%.
To genotype pools, we performed four replicate PCRs for each SNP on each pool and dispensed primer extension products onto four spots of a 384-spot chip, yielding a total of 16 observations (four PCRs × four spots per PCR) for each pool for each SNP. We set the mass spectrometry SPECTROACQUIRE software to collect five sets of 20 spectra and raster to all positions. We obtained peak areas from SPECTROTYPER software by integration of the area under the spectral peak at the expected mass of the extension product.
Review of SNP Assays Tested for Association by Using DNA Pools. When we tested the 381 novel SNPs on DNA pools, we applied the following criteria to remove poor quality data. We removed spectra with signal-to-noise ratios below 3.5 or with a peak height below 1.0 intensity unit. We removed SNPs for which fewer than 8 of 16 possible observations remained for any pool or for which the SD of any pool was greater than 0.05. At the same time that we determined SNP genotypes in DNA pools, we genotyped one negative control sample and 7–11 individual samples to help detect assay artifacts. For each individual, we performed a single PCR and dispensed the extension product onto four spots on a chip, yielding a total of four observations per individual. To mimic a high-throughput procedure, we did not select individuals by prior knowledge of genotypes. We discarded SNPs from further analysis if all individuals were heterozygotes, although we recognized that as many as one reliable SNP assay in $2^n$, where $n$ is the number of individuals successfully tested, may show all heterozygotes by chance. We also discarded SNPs in which heterozygotes showed widely skewed peak ratios (peak area of one allele at least four times greater than peak area of the other allele), because our experience, as well as that of others (25, 26), suggests that these SNPs are difficult to score correctly. Finally, we discarded SNPs for which observed allele frequencies in heterozygotes differed dramatically from one another (SD > 0.10), because we have found such assays often fail tests of Hardy–Weinberg equilibrium.
Of the original 381 SNPs, 90 (23.6%) failed to meet one or more of the above criteria. A total of 58 (15.2%) had a pool with <8 successful observations, 11 (2.9%) had a pool with allele frequency SD >0.05, 17 (4.5%) had all heterozygous individuals, 11 (2.9%) had severely skewed average heterozygous peak ratios, and 22 (5.8%) had heterozygotes with dramatically different peak ratios.
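The review criteria above can be expressed as a small filter. The sketch below is only one possible reading of those rules: the thresholds are the ones stated in the text, but the function names, argument layout, and reason labels are assumptions made for illustration.

```python
from statistics import mean, stdev


def keep_spectrum(signal_to_noise, peak_height):
    """Spectrum-level filter: drop low signal-to-noise or low-intensity peaks."""
    return signal_to_noise >= 3.5 and peak_height >= 1.0


def snp_failures(pool_freqs, het_peak_ratios, het_freqs, n_individuals):
    """Return the list of criteria a SNP assay fails (empty list = keep).

    pool_freqs:      dict mapping pool name -> allele frequency estimates
                     remaining after the spectrum-level filter
    het_peak_ratios: a/b peak-area ratios observed in heterozygous individuals
    het_freqs:       allele frequency observed in each heterozygous individual
    n_individuals:   number of individuals successfully genotyped
    """
    reasons = []
    if any(len(f) < 8 for f in pool_freqs.values()):
        reasons.append("too_few_observations")
    if any(len(f) >= 2 and stdev(f) > 0.05 for f in pool_freqs.values()):
        reasons.append("pool_sd_too_high")
    if n_individuals > 0 and len(het_freqs) == n_individuals:
        reasons.append("all_heterozygous")
    if het_peak_ratios:
        k = mean(het_peak_ratios)
        if k >= 4.0 or k <= 0.25:  # one allele at least four times the other
            reasons.append("skewed_peak_ratio")
    if len(het_freqs) >= 2 and stdev(het_freqs) > 0.10:
        reasons.append("inconsistent_heterozygotes")
    return reasons


if __name__ == "__main__":
    pools = {"F1": [0.30] * 16, "SP": [0.31] * 16, "EC": [0.29] * 7}
    print(snp_failures(pools, [1.2, 1.3], [0.55, 0.57], n_individuals=9))
```

On the all-heterozygote rule: with 7 individuals tested, the text's bound means that up to 1 in $2^7 = 128$ reliable assays could be discarded by chance, a cost the authors accept in exchange for catching artifacts.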
Statistical Analysis. Given four PCRs and four spots per PCR, up to 16 observations were available to estimate the allele frequency for each SNP in each pool. For each of these observations, we initially used the pool peak areas $A$ and $B$ of the lower- and higher-mass alleles, respectively, to obtain the pool-based allele frequency estimate $p = A/(A + B)$. As an alternative, we adjusted the estimate to take into account the unequal peak area of the two alleles in heterozygotes. To do so, we calculated the sample mean $k$ of the ratios $a/b$, where $a$ and $b$ represent peak areas of the lower- and higher-mass alleles for an individual; we calculated $k$ over all measurements on the individuals heterozygous for the SNP. The resulting adjusted allele frequency estimate for each of the up to 16 pool-based observations was $p = A/(A + kB)$ (7). For either of these estimation methods, we then calculated the overall allele frequency estimate as the average of the up to 16 observation-specific estimates. For the 25 (8.6%) of 291 SNPs without data on individual heterozygotes, we used the unadjusted estimate $p = A/(A + B)$.
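A short sketch may make the heterozygote correction concrete. The function names are illustrative rather than taken from any Sequenom software, and the worked numbers are the single-observation values quoted in the Fig. 1 legend of this chapter (heterozygote C-allele fraction 0.570; unadjusted pool estimates 0.424 and 0.355), treating the C allele as allele $A$ purely for illustration.

```python
def heterozygote_ratio(het_peak_areas):
    """Mean k of a/b over all heterozygote measurements (a, b) for one SNP."""
    ratios = [a / b for a, b in het_peak_areas]
    return sum(ratios) / len(ratios)


def pool_allele_frequency(area_a, area_b, k=1.0):
    """Allele frequency from pool peak areas: A / (A + k*B).

    k = 1 gives the unadjusted estimate A / (A + B).
    """
    return area_a / (area_a + k * area_b)


def mean_pool_estimate(observations, k=1.0):
    """Average the observation-specific estimates (up to 16 per pool)."""
    estimates = [pool_allele_frequency(a, b, k) for a, b in observations]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Relative peak areas are expressed as fractions that sum to 1.
    k = heterozygote_ratio([(0.570, 0.430)])
    control = pool_allele_frequency(0.424, 0.576, k)
    case = pool_allele_frequency(0.355, 0.645, k)
    print(round(control, 3), round(case, 3), round(control - case, 3))
```

Because the same $k$ rescales both pools, the correction shifts each frequency estimate noticeably but changes the case-control difference only slightly, which matches the authors' observation that adjustment matters more for precision than for accuracy of differences.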
To test for allele frequency differences between cases and controls based on our pooled results, we estimated the difference in allele frequencies between case and control pools, and compared this difference to its standard error by using the statistic $T = (p_1 - p_2)/[\mathrm{Var}(p_1 - p_2)]^{1/2}$. Here, $p_i$ is the mean estimated allele frequency in group $i$ (1 = case, 2 = control) and Var represents variance.
To estimate $\mathrm{Var}(p_1 - p_2)$, we note that this variance reflects the combined effects of population sampling and measurement error caused by carrying out allele frequency estimation on pools, or $\mathrm{Var}(p_1 - p_2) = \sigma^2_{\mathrm{sampling}} + \sigma^2_{\mathrm{measurement}}$. We estimated the sampling variance by $s^2_{\mathrm{sampling}} = p_{12}(1 - p_{12})[1/(2n_1) + 1/(2n_2)]$, where $p_{12} = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$ is the weighted average of the case and control allele frequency estimates and $n_i$ is the number of individuals in pool $i$.
We modeled the measurement error caused by allele frequency estimation based on pools as $\sigma^2_{\mathrm{measurement}} = \sigma^2_{\mathrm{pcr}} + \sigma^2_{\mathrm{spot}}$. Here, $\sigma^2_{\mathrm{pcr}}$ and $\sigma^2_{\mathrm{spot}}$ are variances caused by PCR and primer extension, and sample dispensing and mass spectrometry analysis, respectively. We estimated $\sigma^2_{\mathrm{pcr}}$ and $\sigma^2_{\mathrm{spot}}$ for each SNP with a mixed effects analysis of variance by using the MIXED procedure in SAS (SAS Institute, Cary, NC). In this analysis, allele frequency estimate was the response variable, indicators for each pool were included as fixed effects, and PCR was included as a random effect nested within pool. By specifying this model, we implicitly assume the absence of variability caused by pool construction. Because we did not construct multiple pools for each sample, we could not estimate this variability directly. Subsequent data analysis suggests this variability is modest and that assuming its absence has not significantly adversely affected our test (see Results).
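Given $n_{\mathrm{pcr},i}$ PCRs and $n_{\mathrm{spot},i}$ spots for group $i = 1, 2$ (in the absence of missing data, $n_{\mathrm{pcr},i} = 4$ and $n_{\mathrm{spot},i} = 16$), replicate measurements result in an overall variance estimate of $\mathrm{Var}(p_1 - p_2) = s^2_{\mathrm{sampling}} + s^2_{\mathrm{pcr}}(1/n_{\mathrm{pcr},1} + 1/n_{\mathrm{pcr},2}) + s^2_{\mathrm{spot}}(1/n_{\mathrm{spot},1} + 1/n_{\mathrm{spot},2})$.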
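The variance decomposition and the statistic $T$ can be put together in a few lines. This is a sketch under stated assumptions: the function names are illustrative, the variance components in the example are the average values the text reports for its own data (1.18 × 10⁻⁴ for PCR and 3.82 × 10⁻⁴ for spot), and referring $T$ to a standard normal distribution for the P value is an assumption made here, not something this section states.

```python
from math import erfc, sqrt


def sampling_variance(p1, p2, n1, n2):
    """p12 (1 - p12) [1/(2 n1) + 1/(2 n2)], with p12 the weighted mean frequency."""
    p12 = (n1 * p1 + n2 * p2) / (n1 + n2)
    return p12 * (1.0 - p12) * (1.0 / (2 * n1) + 1.0 / (2 * n2))


def total_variance(p1, p2, n1, n2, s2_pcr, s2_spot,
                   n_pcr=(4, 4), n_spot=(16, 16)):
    """Sampling variance plus PCR and spot measurement variance for p1 - p2."""
    return (sampling_variance(p1, p2, n1, n2)
            + s2_pcr * (1.0 / n_pcr[0] + 1.0 / n_pcr[1])
            + s2_spot * (1.0 / n_spot[0] + 1.0 / n_spot[1]))


def t_statistic(p1, p2, n1, n2, s2_pcr, s2_spot):
    """T = (p1 - p2) / sqrt(Var(p1 - p2))."""
    return (p1 - p2) / sqrt(total_variance(p1, p2, n1, n2, s2_pcr, s2_spot))


def two_sided_p(t):
    """Two-sided P value, assuming T is approximately standard normal."""
    return erfc(abs(t) / sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical pools of 500 cases and 500 controls, frequencies 0.17 vs 0.10.
    t = t_statistic(0.17, 0.10, 500, 500, s2_pcr=1.18e-4, s2_spot=3.82e-4)
    print(t, two_sided_p(t))
```

In this example the measurement terms contribute roughly a third of the total variance, which illustrates why the four-by-four replication scheme matters: each additional PCR or spot shrinks its own term without touching the sampling term.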
We estimated the false positive rate and power to detect significant allele frequency differences between pools by computer simulation. Each simulated pool contained 200 or 500 individuals, had control allele frequencies of 0.10, 0.50, or 0.80, and had case-control allele frequency differences of 0.00, 0.05, 0.07, or 0.10. For each replicate, we simulated observations for case and control pools with four PCRs per pool and four spots per PCR and for a single heterozygote with one PCR and four spots per PCR. For each set of simulation replicates, the heterozygous individuals were assigned a mean k value of 1.00, 1.29, 1.50, 2.40, or 4.00, and a SD for k of 0.11, as we observed in our data. We assumed PCR and spot variability were absent (corresponding to individual genotyping) or were equal to their estimated values of 1.18 × 10⁻⁴ and 3.82 × 10⁻⁴, respectively, as observed in our data.
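A much-simplified version of such a simulation is sketched below. It is not the authors' program: it omits the heterozygote ratio k entirely (equivalent to k = 1), assumes normally distributed PCR and spot errors, uses a fixed critical value of 1.96 for a two-sided 0.05 test, and all function names, the replicate count, and the seed are choices made here for illustration.

```python
import random
from math import sqrt

S2_PCR, S2_SPOT = 1.18e-4, 3.82e-4  # variance components reported in the text


def observed_pool_mean(true_freq, rng, n_pcr=4, n_spot=4,
                       s2_pcr=S2_PCR, s2_spot=S2_SPOT):
    """Mean of n_pcr * n_spot noisy observations of one pool's allele frequency."""
    obs = []
    for _ in range(n_pcr):
        pcr_effect = rng.gauss(0.0, sqrt(s2_pcr))      # shared by spots of one PCR
        for _ in range(n_spot):
            obs.append(true_freq + pcr_effect + rng.gauss(0.0, sqrt(s2_spot)))
    return sum(obs) / len(obs)


def simulate_rejection_rate(p_control, diff, n=500, reps=400, seed=1, crit=1.96):
    """Fraction of replicates with |T| > crit (power, or false positive rate if diff = 0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        estimates = []
        for p in (p_control + diff, p_control):         # case pool, then control pool
            # Sample 2n alleles from the population to form the pool.
            count = sum(rng.random() < p for _ in range(2 * n))
            estimates.append(observed_pool_mean(count / (2 * n), rng))
        p1, p2 = estimates
        p12 = (p1 + p2) / 2.0                           # equal pool sizes
        var = (p12 * (1.0 - p12) * (1.0 / n)            # 1/(2n) + 1/(2n) = 1/n
               + S2_PCR * (1.0 / 4 + 1.0 / 4)
               + S2_SPOT * (1.0 / 16 + 1.0 / 16))
        if abs(p1 - p2) / sqrt(var) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("power:", simulate_rejection_rate(0.10, 0.07))
    print("false positive rate:", simulate_rejection_rate(0.10, 0.00))
```

Setting both variance components to zero in a variant of this sketch would correspond to the individual-genotyping comparison the authors describe, so the gap between the two runs gives a rough sense of the power lost to pooling.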
Results

To assess whether the SNP genotyping method of primer extension–mass spectrometry was sufficiently accurate and precise to detect modest allele frequency differences between pools, we tested 16 SNPs with individual genotypes previously determined as part of our diabetes research project. These SNPs were selected to have a range of minor allele frequencies and frequency differences between cases of type 2 diabetes, unaffected spouse controls, and unaffected elderly controls. The frequency differences of 0.001–0.108 are modest but reflect our intention to use pooling to scan for association in complex diseases, where allele frequency differences are not expected to be dramatic. The 16 assays were not individually optimized, although they were chosen from a set of assays that had been successfully typed on >94% of individuals comprising the pools. We tested each DNA pool with quadruplicate PCR and extension reactions, each of which we dispensed and scanned four times for a total of up to 16 frequency estimates per SNP-pool combination. Over the course of our initial studies, we observed that increased peak intensity and signal-to-noise ratio decreased SDs between replicates (data not shown); for this analysis, we dispensed sample twice onto each spot before scanning. Example spectra are shown in Fig. 1. We observed unequal allele intensity in heterozygous individuals, a characteristic that has been described (25) and that we have observed for individual heterozygous samples with most of the hundreds of SNPs that we have typed on individual samples.

Fig. 1. Sample spectra and frequency estimates based on the peak area. Frequency estimates of the C allele are 0.424 and 0.355 in the control and case pools, respectively, showing a difference of 0.069. Given that the C allele frequency is overestimated in the heterozygote as 0.570, allele frequencies in the pools can be adjusted to 0.357 and 0.293, respectively. True frequencies of the C allele based on genotyping of the individuals comprising the pools are 0.377 and 0.313 for the controls and cases, respectively, so the estimate of allele frequency difference from the pool analysis is very accurate. In practice, we estimate pooled allele frequencies and the heterozygote ratio from multiple replicate observations, rather than from the single observations used here for purposes of illustration.
We calculated allele frequency estimates both with and without adjustment for unequal peak heights in heterozygotes, and compared the accuracy with which these two pool-based methods estimated allele frequencies. The average heterozygote ratio k = a/b for the 16 SNPs was 1.19 ± 0.18, whereas the average SD of k was 0.12 ± 0.05. The absolute average difference between pool-based and individual-based allele frequency estimates was 0.033 ± 0.021 (range 0.001–0.083) for the unadjusted estimates and 0.014 ± 0.010 (range 0.000–0.037) for the adjusted estimates, suggesting that adjustment resulted in more accurate allele frequency estimates. We use the heterozygote-adjusted allele frequency data in what follows unless otherwise noted. Table 1 shows the minor allele frequency estimates for 16 SNPs in three pools as well as the corresponding estimates obtained from individual genotypes. The average allele frequency SD we observed for up to 16 replicate values from 48 SNP-pool combinations was 0.021 ± 0.011, and the maximum SDs were 0.073, 0.049, and 0.035.
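The single-observation example in the Fig. 1 legend can be reproduced with a short calculation. This is a sketch under an assumption: the correction is taken to divide the pool's allele ratio by the heterozygote ratio k, with k defined here as the signal of the allele of interest relative to the other allele in a heterozygote. The exact formula is given in Methods and may be written differently; the function name is hypothetical.

```python
def adjust_for_heterozygote(p_raw, het_freq):
    """Correct a raw pooled allele frequency for unequal peak response.

    p_raw    : raw frequency of the allele of interest in the pool
    het_freq : apparent frequency of the same allele in a heterozygote
               (0.5 would indicate equal peak response)
    """
    # Heterozygote ratio: signal of this allele relative to the other one.
    k = het_freq / (1.0 - het_freq)
    # Dividing the pool's allele ratio by k is equivalent to this expression.
    return p_raw / (p_raw + k * (1.0 - p_raw))


if __name__ == "__main__":
    for raw in (0.424, 0.355):  # control and case pools from the Fig. 1 legend
        print(raw, "->", round(adjust_for_heterozygote(raw, 0.570), 3))
```

Applied to the raw estimates of 0.424 and 0.355 with a heterozygote estimate of 0.570, this assumed form returns the adjusted values quoted in the legend.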
We compared the SD from the 16 SNPs to a larger number of SNPs that were not typed on the individuals comprising the pools. For the 291 additional SNPs that met our criteria for analysis (see Methods), we observed an average SD from the 873 SNP-pool combinations of 0.018 ± 0.008. The average heterozygote ratio k in the sample of 266 of 291 SNPs with at least one heterozygous individual was 1.29 ± 0.39, whereas the average SD of k was 0.11 ± 0.07.

Table 1. Frequencies of SNPs as estimated by genotyping DNA pools and individual samples

| SNP | Cases (F1) Indiv | Cases (F1) Pool | Spouses (SP) Indiv | Spouses (SP) Pool | Elderly controls (EC) Indiv | Elderly controls (EC) Pool | Prediction error F1–SP | Prediction error F1–EC | Prediction error SP–EC |
|---|---|---|---|---|---|---|---|---|---|
| GLUT10_14 | 0.035 | 0.028 ± 0.009 | 0.036 | 0.026 ± 0.009 | 0.026 | 0.015 ± 0.010 | 0.003 | 0.005 | 0.001 |
| GLUT10_1 | 0.057 | 0.046 ± 0.011 | 0.063 | 0.043 ± 0.018 | 0.078 | 0.060 ± 0.012 | 0.009 | 0.007 | 0.001 |
| SNP63 | 0.118 | 0.125 ± 0.013 | 0.120 | 0.125 ± 0.011 | 0.115 | 0.119 ± 0.010 | 0.002 | 0.004 | 0.002 |
| PPARg2 | 0.145 | 0.182 ± 0.016 | 0.194 | 0.230 ± 0.017 | 0.224 | 0.252 ± 0.023 | 0.001 | 0.008 | 0.007 |
| ss146316 | 0.146 | 0.128 ± 0.019 | 0.135 | 0.103 ± 0.022 | 0.095 | 0.067 ± 0.020 | 0.014 | 0.010 | 0.004 |
| ss121557 | 0.156 | 0.140 ± 0.014 | 0.141 | 0.115 ± 0.009 | 0.115 | 0.089 ± 0.015 | 0.009 | 0.009 | 0.000 |
| ss146317 | 0.176 | 0.165 ± 0.024 | 0.146 | 0.144 ± 0.012 | 0.130 | 0.118 ± 0.022 | 0.009 | 0.001 | 0.010 |
| ss93115 | 0.236 | 0.251 ± 0.012 | 0.251 | 0.249 ± 0.021 | 0.312 | 0.317 ± 0.032 | 0.017 | 0.009 | 0.008 |
| SNP43 | 0.257 | 0.286 ± 0.032 | 0.259 | 0.274 ± 0.049 | 0.246 | 0.246 ± 0.073 | 0.014 | 0.028 | 0.015 |
| ss64248 | 0.309 | 0.298 ± 0.023 | 0.316 | 0.311 ± 0.024 | 0.312 | 0.317 ± 0.027 | 0.007 | 0.016 | 0.010 |
| ss1304220 | 0.313 | 0.318 ± 0.021 | 0.379 | 0.399 ± 0.020 | 0.377 | 0.395 ± 0.021 | 0.015 | 0.013 | 0.002 |
| ss121556 | 0.382 | 0.381 ± 0.026 | 0.409 | 0.405 ± 0.021 | 0.462 | 0.448 ± 0.021 | 0.003 | 0.013 | 0.010 |
| ss148393 | 0.429 | 0.428 ± 0.010 | 0.392 | 0.389 ± 0.012 | 0.348 | 0.352 ± 0.013 | 0.002 | 0.005 | 0.007 |
| ss86782 | 0.433 | 0.423 ± 0.034 | 0.442 | 0.459 ± 0.028 | 0.443 | 0.429 ± 0.027 | 0.027 | 0.004 | 0.031 |
| SNP56 | 0.438 | 0.456 ± 0.016 | 0.428 | 0.446 ± 0.016 | 0.415 | 0.437 ± 0.021 | 0.000 | 0.004 | 0.004 |
| ss86876 | 0.486 | 0.488 ± 0.035 | 0.428 | 0.404 ± 0.029 | 0.378 | 0.361 ± 0.028 | 0.026 | 0.019 | 0.007 |

F1, cases of type 2 diabetes; SP, unaffected spouses; EC, elderly nondiabetic controls; Indiv, individuals. Frequencies for pools are mean ± SD. Prediction error is the absolute difference of the frequency estimates based on pools compared to individual genotypes.
For the 16 SNPs, we compared the estimated allele frequency differences based on case and control pools to frequency differences estimated from genotyping individuals comprising the pools (Table 1, Fig. 2). The mean absolute error in estimating the allele frequency difference between pools calculated from 48 SNP-pool comparisons was 0.009 ± 0.008, and the maximum absolute errors were 0.031, 0.028, and 0.027. The mean absolute error was unchanged (0.009 ± 0.008) when the allele frequencies were not adjusted for the heterozygote ratio.
We combined the data from the 16 SNPs to estimate the sources of experimental variability and to compare the experimental variability to the sampling variability associated with selecting individuals from the population. The estimated measurement variance caused by PCR or primer extension ($\sigma^2_{\mathrm{pcr}} = 1.18 \times 10^{-4}$) is smaller than that caused by sample dispensing and mass spectrometry analysis ($\sigma^2_{\mathrm{spot}} = 3.82 \times 10^{-4}$). For a pool with n = 500 and allele frequency of 0.50, the summed measurement variances of $(1.18 + 3.82) \times 10^{-4} = 5.00 \times 10^{-4}$ are larger than the sampling variability of $(0.50)(0.50)/[2(500)] = 2.5 \times 10^{-4}$, but replicate PCRs and spots allow us to reduce the measurement variability substantially. For example, when $n_{\mathrm{pcr}} = 4$ and $n_{\mathrm{spot}} = 16$ (4 PCRs × 4 spots per PCR), measurement variability is reduced to $(1.18/4 + 3.82/16) \times 10^{-4} = 0.53 \times 10^{-4}$. Sampling variability of allele frequency estimates is an unavoidable consequence of a finite pool size.

Fig. 2. Comparison of allele frequency difference estimated from pools to the frequency difference determined from individual genotypes. Each point represents one comparison between F1 and SP, F1 and EC, or SP and EC for 1 of the 16 SNPs. The lines represent the expected result ± 0.03.
Under the conservative assumption that the 291 additional SNPs would be expected to show no association with diabetes, they provide an opportunity to assess empirically the false positive rate associated with our pool-based test statistic T. Based on the 266 SNPs with at least one typed heterozygous individual, we have 2 × 266 = 532 case-control comparisons and so would expect 532 × 0.05 = 26.6 comparisons significant at the 0.05 level. When basing our test on allele frequency estimates adjusted for the heterozygote ratio k, we observed 24 (4.5%) comparisons significant at the 0.05 level. When we omitted adjustment for k and used the unadjusted estimates, we observed 26 (4.9%) significant comparisons, 22 of which were also significant in the test based on the adjusted estimates.
We estimated by computer simulation the power to detect case-control allele frequency differences of 0.05, 0.07, and 0.10 by using samples of 200 and 500 cases and controls given either individual genotyping or genotyping of pools (Table 2). Our calculations for pools assume four PCRs per pool and four spots per PCR, and that $\sigma^2_{\mathrm{pcr}}$ and $\sigma^2_{\mathrm{spot}}$ are equal to their mean values estimated for the 16 SNPs. Our results suggest only modest decreases in power for pool-based analyses compared with individual-based analyses. For example, the power to detect a 0.07 allele frequency difference between cases and controls at a 0.50 control allele frequency was 82% given genotyping of two pools with 16 replicates each and 87% given genotyping of 500 × 2 = 1,000 individuals.
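As a rough cross-check on the simulated power, a normal approximation can be computed directly from the variance components. This is a sketch and not the method used for Table 2: it assumes a two-sided z-test at the 0.05 level with known variance, binomial sampling variance, and no uncertainty in the heterozygote ratio k. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(p_control, diff, n, pooled, var_pcr=1.18e-4, var_spot=3.82e-4,
                 n_pcr=4, n_spot=16, z_crit=1.959964):
    """Approximate power of a two-sided z-test for an allele frequency difference.

    n is the number of cases and also the number of controls.
    pooled=False corresponds to individual genotyping (no measurement variance).
    """
    p_case = p_control + diff
    variance = (p_control * (1 - p_control) + p_case * (1 - p_case)) / (2 * n)
    if pooled:
        # Measurement variance for two pools, reduced by replication.
        variance += 2 * var_pcr / n_pcr + 2 * var_spot / n_spot
    z = diff / math.sqrt(variance)
    return normal_cdf(z - z_crit) + normal_cdf(-z - z_crit)


if __name__ == "__main__":
    print("pool:       %.3f" % approx_power(0.50, 0.07, 500, pooled=True))
    print("individual: %.3f" % approx_power(0.50, 0.07, 500, pooled=False))
```

For n = 500, a control allele frequency of 0.50, and a difference of 0.07, the approximation gives roughly 0.81 for pools and 0.88 for individual genotyping, close to the simulated values; small discrepancies are expected because the simulation also models variability in k.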
Discussion

Primer extension analysis by mass spectrometry successfully estimates allele frequency differences between DNA pools with sufficient accuracy and precision to be used as a screening step in large-scale association studies. To test a large number of SNPs on pools, automated assay design, standard assay conditions, and automated data collection are critical. We sought to develop standard methods and quality control criteria that would enable us to screen SNPs accurately and quickly.

Table 2. Power (%) of pools and individually typed samples to detect 0.05–0.10 allele frequency differences in cases and controls at significance level 0.05

| n | Method | Control allele frequency = 0.10 | | | Control allele frequency = 0.50 | | |
|---|---|---|---|---|---|---|---|
| | Case-control difference | 0.05 | 0.07 | 0.10 | 0.05 | 0.07 | 0.10 |
| 200 | Pool | 48 | 75 | 94 | 28 | 48 | 78 |
| 200 | Individual | 55 | 81 | 97 | 32 | 52 | 80 |
| 500 | Pool | 78 | 95 | 100 | 55 | 81 | 97 |
| 500 | Individual | 92 | 99 | 100 | 61 | 87 | 98 |

Power was estimated by computer simulation assuming k = 1.29, $\sigma^2_{\mathrm{pcr}} = 1.18 \times 10^{-4}$ and $\sigma^2_{\mathrm{spot}} = 3.82 \times 10^{-4}$, and four PCRs and four spots per PCR for each pool replicate.
Compared with an association study based on genotypes of individuals, a pooled DNA association study offers advantages and disadvantages. The primary advantages are the reduced reagent and labor costs and time required to generate fewer genotypes. In addition, less DNA per sample is used per genotype when the sample is included in a DNA pool. In our laboratory, DNA pooling offers an ≈32-fold savings in reagent cost and an ≈16-fold savings in labor compared with our higher-throughput method for typing individual samples. Because pooling must result in some loss of information, including loss of haplotype information, either a larger sample or a less significant detection threshold is required to achieve power comparable to that for genotyping individuals (Table 2).
Our current high-throughput pooling analysis follows a three-step design. First, we test SNP assays without replication on a crudely quantitated DNA pool to confirm that the assay design succeeds and that the SNP minor allele frequency is >0.05. This practice limits the use of our valuable, carefully quantitated pools to successful SNP assays. Although some SNP assays fail under standard conditions, we prefer to develop quality control criteria to discard SNPs rather than spend time adjusting assay conditions, because our purpose is high-throughput screening. Second, we genotype each successful SNP on case and control pools and 7–11 individuals. For each pool, we carry out 16 replicate genotypes. We discard SNP assays if we detect any evidence of an artifact (see Methods) or if the SD of the 16 replicates is >0.05. Third, we follow up SNPs identified as interesting by this pooling technique by genotyping individual samples to verify allele frequency differences and to allow haplotype analysis and genotype-based phenotypic comparisons.
In comparison to other SNP genotyping methods for screening DNA pools for association, primer extension–mass spectrometry is reasonably precise. The average allele frequency SD of 0.018–0.021 we report is similar to the 0.021 reported for kinetic PCR (11), slightly greater than the 0.014 reported for primer extension-denaturing high-performance liquid chromatography (9), the 0.009–0.017 reported for fluorescent nucleotide primer extension-capillary electrophoresis (14, 18), and the 0.011 reported for pyrosequencing (17), and less than the 0.038 reported for bioluminometric-primer extension (13). Further, mass spectrometry offers advantages in the potential for automation over several of these other methods.
For the 266 SNPs with at least one genotyped heterozygote, we observed significant results from both adjusted (4.5%) and unadjusted (4.9%) pool allele frequencies that were consistent with the expected false positive rate of 5% under the null hypothesis of no association. These results, although limited, suggest that our test is not particularly anticonservative, despite our decision to ignore variability owing to pool construction. Our simulations suggested that adjusting for k, even based on just a single heterozygote, was adequate to preserve the expected false positive rates. In the absence of adjustment for k, our simulations showed that the tests were either conservative or anticonservative, depending on the underlying allele frequency. This finding, especially in light of the value of individuals in quality control assessment, suggests that typing of a limited number of individuals is a useful component for pooling studies.
To assess the potential of mass spectrometry to screen for allele frequency differences between pools efficiently, we assessed the sources of variability in our approach. Experimental variability originates during pool construction, PCR, primer extension, product dispensing onto a chip, and mass spectrometer data collection. During pool construction, variability can arise if DNA concentrations are incorrect or pipetting is inaccurate. During PCR, variability may arise from unequal allele amplification given additional SNP(s) under the primer(s), simultaneous amplification of two SNPs, inaccurate pipetting of template DNA or reagents between wells, unequal PCR conditions between wells, and sample contamination. During primer extension, variability may be caused by differential incorporation of nucleotides and allele pausing, in which the primer for the two-nucleotide extension incorporates only the deoxynucleotide without addition of the final dideoxynucleotide. During product dispensing and mass spectrometer data collection, variability can arise because of incorporated baseline noise, especially at low peak intensity, decay of detection sensitivity with increasing mass, and inconsistent desorption and ionization.
Based on our study of 16 SNPs, we estimated variances of $1.18 \times 10^{-4}$ caused by PCR or primer extension and $3.82 \times 10^{-4}$ caused by product dispensing and data collection. To reduce this measurement variability, we performed four replicate PCR and primer extension reactions and dispensed each product with four replicates for mass spectrometry analysis. Depending on the desired level of accuracy, more or fewer of either replicate type could be undertaken. The appropriate replicate number depends on the number of individuals in each pool. Carrying out many replicates to reduce experimental variability will have little practical value if sampling variability is much greater than experimental variability.
Because we only constructed each DNA pool once, we could not directly estimate the variance caused by pool construction. The fact that ignoring this variability did not appear to result in a strongly anticonservative test suggests that, at least for our pools, this variability probably is small. This assumption could be tested directly by the construction of multiple pool replicates, but at the expense of considerable time and effort.
Determining the optimum number of pools for a given case or control sample, whether replicates or smaller pools, should also take into account the theoretical limit on the maximum number of individual DNA templates that can be assayed from any one pool. Given samples of 20 ng of pooled DNA and ≈13.4 picograms per diploid genome, chromosomes from a maximum of ≈1,500 individuals (20,000/13.4) can be represented once as template for PCR. If <20 ng of DNA is used, even fewer samples could be measured in pools. Samples of >1,000 case and control individuals would be desirable for complex disease association studies because of the decreased variability caused by sampling from the population. To effectively test a very large sample, the individual DNAs could be combined into several pools with fewer individuals or additional PCRs could be performed.
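The template limit is simple arithmetic and can be generalized to other DNA inputs. The sketch below uses the figure of 13.4 pg per diploid genome quoted in the text; the function names and the pool-splitting rule (equal-capacity pools, rounded up) are illustrative assumptions rather than a recommended design.

```python
import math


def max_individuals_per_pool(pool_dna_ng, pg_per_diploid_genome=13.4):
    """Largest number of individuals whose genomes can each be represented
    once in the given mass of pooled template DNA."""
    return pool_dna_ng * 1000.0 / pg_per_diploid_genome  # ng -> pg


def pools_needed(n_individuals, pool_dna_ng, pg_per_diploid_genome=13.4):
    """Minimum number of pools so that no pool exceeds the template limit."""
    capacity = math.floor(max_individuals_per_pool(pool_dna_ng,
                                                   pg_per_diploid_genome))
    return math.ceil(n_individuals / capacity)


if __name__ == "__main__":
    print("limit for 20 ng: %.0f individuals" % max_individuals_per_pool(20))
    print("pools for 3,000 individuals:", pools_needed(3000, 20))
```

With 20 ng of template the limit is about 1,490 individuals, so a sample of 3,000 would need at least three pools under this rule, or additional PCRs per pool as the text suggests.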
In conclusion, we have determined that primer extension analysis by mass spectrometry, with appropriate replication, is sufficiently accurate and precise to allow comparison of allele frequency differences between DNA pools. For studies that aim to compare genotypes in hundreds or thousands of cases and controls, this approach offers fast, reliable screening of a candidate region with savings of labor, DNA, and reagent costs compared with genotyping individuals. With the expected development of a haplotype map of the human genome (4), yielding a set of 200,000–300,000 "gold standard" SNPs that allow whole genome association studies to become a reality, the pooling approach may allow large-scale analysis of the genetics of common disease at acceptable genotyping costs.
We gratefully acknowledge the other members of the FUSION collaboration for making this study possible and Andi Braun and Christy Johnston of Sequenom, Inc., for intellectual contributions and construction of the case and control DNA pools. The FUSION study is made possible by intramural funds from the National Human Genome Research Institute (Project No. OH95-C-N030) and by National Institutes of Health Grant HG00376 (to M.B.). This project was supported by a Cooperative Research and Development Agreement between the National Human Genome Research Institute and Sequenom, Inc. K.L.M. is the recipient of a Burroughs Wellcome Career Award in the Biomedical Sciences, K.S. was partially supported by a grant from The Academy of Finland, and T.E.F. was supported by National Institutes of Health Training Grant HG00040.
1. Risch, N. & Merikangas, K. (1996) Science 273, 1516–1517.
2. Horikawa, Y., Oda, N., Cox, N. J., Li, X., Orho-Melander, M., Hara, M., Hinokio, Y., Lindner, T. H., Mashima, H., Schwarz, P. E., et al. (2000) Nat. Genet. 26, 163–175.
3. Hugot, J. P., Chamaillard, M., Zouali, H., Lesage, S., Cezard, J. P., Belaiche, J., Almer, S., Tysk, C., O'Morain, C. A., Gassull, M., et al. (2001) Nature 411, 599–603.
4. Judson, R., Salisbury, B., Schneider, J., Windemuth, A. & Stephens, J. C. (2002) Pharmacogenomics 3, 379–391.
5. Arnheim, N., Strange, C. & Erlich, H. (1985) Proc. Natl. Acad. Sci. USA 82, 6970–6974.
6. Breen, G., Harold, D., Ralston, S., Shaw, D. & St. Clair, D. (2000) BioTechniques 28, 464–470.
7. Hoogendoorn, B., Norton, N., Kirov, G., Williams, N., Hamshere, M. L., Spurlock, G., Austin, J., Stephens, M. K., Buckland, P. R., Owen, M. J. & O'Donovan, M. C. (2000) Hum. Genet. 107, 488–493.
8. Wolford, J. K., Blunt, D., Ballecer, C. & Prochazka, M. (2000) Hum. Genet. 107, 483–487.
9. Giordano, M., Mellai, M., Hoogendoorn, B. & Momigliano-Richiardi, P. (2001) J. Biochem. Biophys. Methods 47, 101–110.
10. Kosaki, K., Yoshihashi, H., Ohashi, Y., Kosaki, R., Suzuki, T. & Matsuo, N. (2001) J. Biochem. Biophys. Methods 47, 111–119.
11. Germer, S., Holland, M. J. & Higuchi, R. (2000) Genome Res. 10, 258–266.
12. Sasaki, T., Tahira, T., Suzuki, A., Higasa, K., Kukita, Y., Baba, S. & Hayashi, K. (2001) Am. J. Hum. Genet. 68, 214–218.
13. Zhou, G., Kamahori, M., Okano, K., Chuan, G., Harada, K. & Kambara, H. (2001) Nucleic Acids Res. 29, E93.
14. Matyas, G., Giunta, C., Steinmann, B., Hossle, J. P. & Hellwig, R. (2002) Hum. Mutat. 19, 58–68.
15. Gruber, J. D., Colligan, P. B. & Wolford, J. K. (2002) Hum. Genet. 110, 395–401.
16. Neve, B., Froguel, P., Corset, L., Vaillant, E., Vatin, V. & Boutin, P. (2002) BioTechniques 32, 1138–1142.
17. Wasson, J., Skolnick, G., Love-Gregory, L. & Permutt, M. A. (2002) BioTechniques 32, 1144–1152.
18. Norton, N., Williams, N. M., Williams, H. J., Spurlock, G., Kirov, G., Morris, D. W., Hoogendoorn, B., Owen, M. J. & O'Donovan, M. C. (2002) Hum. Genet. 110, 471–478.
19. Ross, P., Hall, L. & Haff, L. A. (2000) BioTechniques 29, 620–629.
20. Buetow, K. H., Edmonson, M., MacDonald, R., Clifford, R., Yip, P., Kelley, J., Little, D. P., Strausberg, R., Koester, H., Cantor, C. R. & Braun, A. (2001) Proc. Natl. Acad. Sci. USA 98, 581–584.
21. Werner, M., Sych, M., Herbon, N., Illig, T., Konig, I. R. & Wjst, M. (2002) Hum. Mutat. 20, 57–64.
22. Valle, T., Tuomilehto, J., Bergman, R. N., Ghosh, S., Hauser, E. R., Eriksson, J., Nylund, S. J., Kohtamaki, K., Toivanen, L., Vidgren, G., et al. (1998) Diabetes Care 21, 949–958.
23. Douglas, J. A., Erdos, M. R., Watanabe, R. M., Braun, A., Johnston, C. L., Oeth, P., Mohlke, K. L., Valle, T. T., Ehnholm, C., Buchanan, T. A., et al. (2001) Diabetes 50, 886–890.
24. Little, D. P., Braun, A., Darnhofer-Demar, B., Frilling, A., Li, Y., McIver, R. T., Jr., & Koster, H. (1997) J. Mol. Med. 75, 745–750.
25. Sun, X., Ding, H., Hung, K. & Guo, B. (2000) Nucleic Acids Res. 28, E68.
26. Bray, M. S., Boerwinkle, E. & Doris, P. A. (2001) Hum. Mutat. 17, 296–304.
High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools
Supporting information for Mohlke et al. (2002) Proc. Natl. Acad. Sci. USA, 10.1073/pnas.262661399
Table 3. Primer sequences for single nucleotide polymorphisms genotyped on pools and individuals
Primer Sequence
ss121556_FOR AGCGGATAACGACGCCATCAGGCTCTTTAG
ss121556_REV AGCGGATAACAATTTCACACAGGAGATGGGACTCCCTGATCCT
ss121556_EXT GGCTCTTTAGGGAGAAGTCT
ss121557_FOR AGCGGATAACACATGGCATGCTGGAAAAGG
ss121557_REV AGCGGATAACAATTTCACACAGGTAAAAATCCTCCGGGCTCTG
ss121557_EXT TGGAAAAGGAAAAACTAGAGAGGC
ss146317_FOR AGCGGATAACTACACTGGCAGTCACTTCTG
ss146317_REV AGCGGATAACAATTTCACACAGGTCTTGCTCTAAGGAGGGATG
ss146317_EXT CTTCTCCGATCACCTTCAATAA
ss148393_FOR AGCGGATAACAAGATGTGATCTAGGGCCTC
ss148393_REV AGCGGATAACAATTTCACACAGGCCATTCCCTAAACACACTTG
ss148393_EXT GGGAAGTCAAGCAAACCAAGTACA
ss64248_FOR AGCGGATAACGTGCATAAGAATCACCAGGG
ss64248_REV AGCGGATAACAATTTCACACAGGGCCTGTTAGAAGTGAGGATC
ss64248_EXT CACCAGGGGAATTTTTTCACA
ss86876_FOR AGCGGATAACGAAACGAAATGGCACACAGG
ss86876_REV AGCGGATAACAATTTCACACAGGCACTTTGAGAAGGGTGAGTG
ss86876_EXT CACACAGGGCACCGATCC
ss93115_FOR AGCGGATAACATGTGCAGACACCAGAGAGC
ss93115_REV AGCGGATAACAATTTCACACAGGATTGTCTTGTCCCTTCCCGC
ss93115_EXT CATGGATGTGGAGGGACAC
GLUT10_1_FOR AGCGGATAACCCTCATCCCACTCCAGGG
GLUT10_1_REV AGCGGATAACAATTTCACACAGGAGGAGTACCGTGGCCTCC
GLUT10_1_EXT CCACTCCAGGGAGGTGAG
GLUT10_14_FOR AGCGGATAACGCTGATATTTCTCAGGATCC
GLUT10_14_REV AGCGGATAACAATTTCACACAGGTGGGCCGAAGAACAAAACAG
GLUT10_14_EXT GAATGTAAACTCTTCCCCT
PPARg2_FOR gctgttatgggtgaaactctg
PPARg2_REV agcggataacaatttcacacaggcagtgtatcagtgaaggaatcg
PPARg2_EXT tctgggagattctcctattgac
SNP43_FOR CTGTGTGTGGGCAGAGGAC
SNP43_REV AGCGGATAACAATTTCACACAGGCCTCATCCTCACCAAGTCAAG
SNP43_EXT CGCTTGCTGCGAAGTAAGGC
SNP56_FOR CAAGGGTGGTGTCCTCAGTT
SNP56_REV agcggataacaatttcacacaggCCTCGCACTAGTGAAAGGA
SNP56_EXT CAGTTTGTGACCTTCCCCT
SNP63_FOR agcggataacCCTGAAGGTTCCACTCTCCA
SNP63_REV agcggataacaatttcacacaggCTCCCTGGTCACTGGATGTT
SNP63_EXT GACGCGGCCCACCCCCTC
ss1304220_FOR AGCGGATAACATGAGGGTGGGAGGTGCAAC
ss1304220_REV AGCGGATAACAATTTCACACAGGTGAAGCAGGAAGCCTTGCAG
ss1304220_EXT GTGCAACCCCCTTGATGAGGC
ss146316_FOR AGCGGATAACCACGCTAGAATCATGTGTCC
ss146316_REV AGCGGATAACAATTTCACACAGGTCCTCTCTACTGTCTCCTTC
ss146316_EXT TCATGTGTCCAAGGGCTCAC
ss86782_FOR AGCGGATAACAGCCACTTGAACTTCTCGAG
ss86782_REV AGCGGATAACAATTTCACACAGGTAAGCTTCCTGCCTTGCTAG
ss86782_EXT TTTCTTGAGCTTAGCTTCAGG
FOR, forward PCR primer; REV, reverse PCR primer; EXT, extendable primer. Gene-specific portions of PCR primers are underlined.
Chapter 3
A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants
Science 2007;316(5829):1341-5
A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants

Laura J. Scott,1 Karen L. Mohlke,2 Lori L. Bonnycastle,3 Cristen J. Willer,1 Yun Li,1 William L. Duren,1 Michael R. Erdos,3 Heather M. Stringham,1 Peter S. Chines,3 Anne U. Jackson,1 Ludmila Prokunina-Olsson,3 Chia-Jen Ding,1 Amy J. Swift,3 Narisu Narisu,3 Tianle Hu,1 Randall Pruim,4 Rui Xiao,1 Xiao-Yi Li,1 Karen N. Conneely,1 Nancy L. Riebow,3 Andrew G. Sprau,3 Maurine Tong,3 Peggy P. White,1 Kurt N. Hetrick,5 Michael W. Barnhart,5 Craig W. Bark,5 Janet L. Goldstein,5 Lee Watkins,5 Fang Xiang,1 Jouko Saramies,6 Thomas A. Buchanan,7 Richard M. Watanabe,8,9 Timo T. Valle,10 Leena Kinnunen,10,11 Gonçalo R. Abecasis,1 Elizabeth W. Pugh,5 Kimberly F. Doheny,5 Richard N. Bergman,9 Jaakko Tuomilehto,10,11,12 Francis S. Collins,3* Michael Boehnke1*
Identifying the genetic variants that increase the risk of type 2 diabetes (T2D) in humans has been a formidable challenge. Adopting a genome-wide association strategy, we genotyped 1161 Finnish T2D cases and 1174 Finnish normal glucose-tolerant (NGT) controls with >315,000 single-nucleotide polymorphisms (SNPs) and imputed genotypes for an additional >2 million autosomal SNPs. We carried out association analysis with these SNPs to identify genetic variants that predispose to T2D, compared our T2D association results with the results of two similar studies, and genotyped 80 SNPs in an additional 1215 Finnish T2D cases and 1258 Finnish NGT controls. We identify T2D-associated variants in an intergenic region of chromosome 11p12, contribute to the identification of T2D-associated variants near the genes IGF2BP2 and CDKAL1 and the region of CDKN2A and CDKN2B, and confirm that variants near TCF7L2, SLC30A8, HHEX, FTO, PPARG, and KCNJ11 are associated with T2D risk. This brings the number of T2D loci now confidently identified to at least 10.
Type 2 diabetes (T2D) is a disease characterized by insulin resistance and impaired pancreatic beta-cell function that affects >170 million people worldwide (1). With first-degree relatives having ~3.5 times as much risk as compared to individuals in the general middle-aged population (2), hereditary factors, together with lifestyle and behavioral factors, play an important role in determining T2D risk (3). To date, intense efforts to identify genetic risk factors in T2D have met with only limited success. This study, reports from our collaborators (4–6), and the recently published work of Sladek et al. (7) describe results of genome-wide association (GWA) studies that further define the genetic architecture of T2D and identify biological pathways involved in T2D pathogenesis.
We genotyped 1161 Finnish T2D cases and 1174 Finnish NGT controls on 317,503 SNPs on the Illumina HumanHap300 BeadChip in stage 1 of a two-stage GWA study of T2D (8). These samples are from the Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) (9, 10) and Finrisk 2002 (11) studies (tables S1 and S2A). Among the 317,503 GWA SNPs, 315,635 had ≥10 copies of the less common allele [minor allele frequency (MAF) > 0.002] and passed quality-control criteria (8). We tested these 315,635 SNPs for association with T2D using a model that is additive on the log-odds scale (Table 1 and tables S3 and S4) (8). We observed a modest excess (41 observed versus 31.6 expected; P = 0.19) of SNPs with P values < 10^−4 (fig. S1). These results argue against the existence of multiple common SNPs with a large impact on T2D disease risk but are consistent with the presence of multiple common SNPs that each confer modest risk. The results also suggest that the matching of cases and controls by birth province, sex, and age (8) has been successful; in support of this conclusion, the genomic control (12) correction value is 1.026.
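The two summary checks in this paragraph can be sketched in a few lines. The block below computes the expected number of SNPs below a P-value threshold when every null hypothesis is true, and a genomic control inflation factor taken as the median 1-df chi-square statistic divided by its null median (about 0.455). The function names and the toy statistics are hypothetical, and the study's own calculation (described in its supporting material) may differ in detail.

```python
from statistics import NormalDist, median

# Null median of a 1-df chi-square: square of the 75th percentile of N(0, 1).
NULL_MEDIAN_CHISQ_1DF = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def expected_hits(n_tests: int, threshold: float) -> float:
    """Expected number of tests with P < threshold if every null is true."""
    return n_tests * threshold


def genomic_control_lambda(chisq_stats):
    """Inflation factor: observed median 1-df chi-square over its null median."""
    return median(chisq_stats) / NULL_MEDIAN_CHISQ_1DF


# 315,635 SNPs tested at P < 1e-4: about 31.6 expected, as quoted in the text.
print(round(expected_hits(315_635, 1e-4), 1))

# Toy statistics (hypothetical): evenly spaced quantiles of the null
# distribution itself, so the inflation factor comes out at 1.
toy = [NormalDist().inv_cdf(0.5 + 0.5 * (i + 0.5) / 1001) ** 2 for i in range(1001)]
print(round(genomic_control_lambda(toy), 3))
```

A value near 1, such as the 1.026 reported here, indicates little inflation of the test statistics from population structure or other systematic differences between cases and controls.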
Analysis of our Illumina HumanHap300 data allowed us to query much of the known SNP variation in the genome. To increase this proportion, we developed an imputation method (8, 13) that uses genotype data and linkage disequilibrium (LD) information from the HapMap Centre d’Etude du Polymorphisme Humain (Utah residents with ancestry from northern and
1Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
2Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA.
3Genome Technology Branch, National Human Genome Research Institute, Bethesda, MD 20892, USA.
4Department of Mathematics and Statistics, Calvin College, Grand Rapids, MI 49546, USA.
5Center for Inherited Disease Research (CIDR), Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21224, USA.
6Savitaipale Health Center, 54800 Savitaipale, Finland.
7Division of Endocrinology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
8Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA.
9Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.
10Diabetes Unit, Department of Epidemiology and Health Promotion, National Public Health Institute, 00300 Helsinki, Finland.
11Department of Public Health, University of Helsinki, 00014 Helsinki, Finland.
12South Ostrobothnia Central Hospital, 60220 Seinäjoki, Finland.
*To whom correspondence should be addressed. E-mail: [email protected] (M.B.); [email protected] (F.S.C.)
Table 1. Confirmed T2D susceptibility loci based on all available data from the FUSION, DGI, and WTCCC/UKT2D samples. Study columns give OR (95% CI); P.

SNP | Chr | Position (bp) | Genes | Risk allele/nonrisk allele | FUSION stage 1 + 2 control risk allele freq. | FUSION stage 1 | FUSION stage 2 | FUSION stage 1 + 2 | DGI all data | WTCCC/UKT2D all data | FUSION-DGI-WTCCC/UKT2D all data | Total sample size for 80% power**
New T2D loci
rs4402960 | 3 | 186,994,389 | IGF2BP2 | T/G | 0.30 | 1.28 (1.13–1.45); 1.2 × 10^−4 | 1.08 (0.96–1.22); 0.22 | 1.18 (1.08–1.28); 2.1 × 10^−4 | 1.17 (1.11–1.23); 1.7 × 10^−9 | 1.11 (1.05–1.16); 1.6 × 10^−4 | 1.14 (1.11–1.18); 8.9 × 10^−16 | ~4300
rs7754840* | 6 | 20,769,229 | CDKAL1 | C/G | 0.36 | 1.16 (1.02–1.30); 0.021 | 1.08 (0.96–1.22); 0.20 | 1.12 (1.03–1.22); 0.0095 | 1.08 (1.03–1.14); 2.4 × 10^−3 | 1.16 (1.10–1.22); 1.3 × 10^−8 | 1.12 (1.08–1.16); 4.1 × 10^−11 | ~5300
rs10811661 | 9 | 22,124,094 | CDKN2A/B | T/C | 0.85 | 1.17 (0.98–1.39); 0.082 | 1.22 (1.04–1.44); 0.015 | 1.20 (1.07–1.36); 0.0022 | 1.20 (1.12–1.28); 5.4 × 10^−8 | 1.19 (1.11–1.28); 4.9 × 10^−7 | 1.20 (1.14–1.25); 7.8 × 10^−15 | ~3900
rs9300039† | 11 | 41,871,942 | (none) | C/A | 0.89 | 1.52 (1.24–1.87); 6.0 × 10^−5 | 1.45 (1.19–1.77); 2.7 × 10^−4 | 1.48 (1.28–1.71); 5.7 × 10^−8 | 1.16¶ (0.95–1.42); 0.12 | 1.13# (0.99–1.29); 0.068 | 1.25 (1.15–1.37); 4.3 × 10^−7 | ~3400
rs8050136 | 16 | 52,373,776 | FTO | A/C | 0.38 | 1.03 (0.92–1.16); 0.58 | 1.18 (1.05–1.33); 0.0063 | 1.11 (1.02–1.20); 0.016 | 1.03¶ (0.91–1.17); 0.25 | 1.23 (1.18–1.32); 7.3 × 10^−14 | 1.17 (1.12–1.22); 1.3 × 10^−12 | ~2700
Previously published T2D association
rs1801282 | 3 | 12,368,125 | PPARG | C/G | 0.82 | 1.30 (1.11–1.53); 0.0011 | 1.08 (0.93–1.26); 0.33 | 1.20 (1.07–1.33); 0.0014 | 1.09 (1.01–1.16); 0.019 | 1.23# (1.09–1.41); 0.0013 | 1.14 (1.08–1.20); 1.7 × 10^−6 | ~6400
rs13266634 | 8 | 118,253,964 | SLC30A8 | C/T | 0.61 | 1.22 (1.08–1.38); 0.0010 | 1.14 (1.02–1.28); 0.026 | 1.18 (1.09–1.29); 7.0 × 10^−5 | 1.07 (1.0–1.16); 0.047 | 1.12 (1.05–1.18); 7.0 × 10^−5 | 1.12 (1.07–1.16); 5.3 × 10^−8 | ~5100
rs1111875‡ | 10 | 94,452,862 | HHEX | C/T | 0.52 | 1.13 (1.01–1.27); 0.039 | 1.06 (0.94–1.19); 0.34 | 1.10 (1.01–1.19); 0.026 | 1.14 (1.06–1.22); 1.7 × 10^−4 | 1.13 (1.07–1.19); 4.6 × 10^−6 | 1.13 (1.09–1.17); 5.7 × 10^−10 | ~4200
rs7903146§ | 10 | 114,748,339 | TCF7L2 | T/C | 0.18 | 1.39 (1.20–1.61); 1.2 × 10^−5 | 1.30 (1.12–1.50); 3.5 × 10^−4 | 1.34 (1.21–1.49); 1.3 × 10^−8 | 1.38 (1.31–1.46); 2.3 × 10^−31 | 1.37# (1.25–1.49); 6.7 × 10^−13 | 1.37 (1.31–1.43); 1.0 × 10^−48 | ~1000
rs5219|| | 11 | 17,366,148 | KCNJ11 | T/C | 0.46 | 1.20 (1.07–1.36); 0.0022 | 1.04 (0.92–1.16); 0.55 | 1.11 (1.02–1.21); 0.013 | 1.15 (1.09–1.21); 1.0 × 10^−7 | 1.15# (1.05–1.25); 0.0013 | 1.14 (1.10–1.19); 6.7 × 10^−11 | ~3700

Sample sizes | FUSION stage 1 | FUSION stage 2 | FUSION stage 1 + 2 | DGI all data | WTCCC/UKT2D all data | FUSION-DGI-WTCCC/UKT2D all data
Total sample size | 2,335 | 2,473 | 4,808 | 13,781 | 13,965 | 32,544
Number of cases/controls | 1,161/1,174 | 1,215/1,258 | 2,376/2,432 | 6,529/7,252 | 5,681/8,284 | 14,586/17,968

*rs10946398 WTCCC/UKT2D (r^2 = 1).
†Multimarker tag for rs9300039 DGI and rs1514823 WTCCC/UKT2D (r^2 = 0.965).
‡rs5015480 WTCCC GWA only (r^2 = 1).
§rs7901695 WTCCC/UKT2D (r^2 = 0.849).
||rs5215 WTCCC/UKT2D (r^2 = 0.995).
¶DGI GWA samples.
#WTCCC GWA samples.
**Approximate total sample size for 80% power to detect T2D SNP association at significance level 0.05 is based on the FUSION control risk allele frequency and the risk ratio calculated from FUSION-DGI-WTCCC/UKT2D all-data analyses, assuming 0.10 T2D prevalence. The sample sizes vary slightly from those of (4) because study-specific allele frequencies were used in the calculations.
western Europe) (CEU) samples to predict genotypes of autosomal SNPs not genotyped in our subjects. A total of 2.09 million HapMap CEU SNPs (14) had imputed MAF >1% in FUSION and passed our imputation quality-control criteria. In the HapMap CEU sample, imputed SNPs passing these criteria increased coverage of SNPs with MAF >1% from 71.9 to 89.1% at an r^2 threshold of 0.8.
To increase the statistical power to detect T2D-predisposing variants, we compared our stage 1 results to GWA results from the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC). We selected 82 SNPs for FUSION stage 2 follow-up genotyping based on evidence from: (i) FUSION-genotyped and FUSION-imputed SNPs; (ii) a combined analysis of GWA results from FUSION, DGI, and WTCCC; and (iii) previous T2D association results. For (i) and (ii), we used a prioritization algorithm that advantaged SNPs based on genome annotation (8) (table S7) and gave preference to genotyped SNPs over nearby imputed SNPs. We successfully genotyped 80 of the 82 SNPs in our stage 2 sample of 1215 Finnish T2D cases and 1258 Finnish NGT controls (8) (table S2B) and carried out joint analysis of the combined FUSION stage 1 + 2 sample (table S5). DGI (4) and United Kingdom T2D Genetics Consortium (UKT2D) (5) investigators also followed up DGI and WTCCC GWAs by genotyping replication samples.
We confirmed well-established T2D associations with TCF7L2, PPARG, and KCNJ11 (Table 1) (15–18). SNPs in TCF7L2 reached genome-wide significance in the FUSION stage 1 + 2 sample [odds ratio (OR) = 1.34, P = 1.3 × 10^−8] and in the FUSION-DGI-WTCCC/UKT2D “all-data” (i.e., all GWA and follow-up samples) meta-analysis (OR = 1.37, P = 1.0 × 10^−48) (Table 1 and table S5). PPARG Pro12→Ala12 (rs1801282) and KCNJ11 Glu23→Lys23 (rs5219) were not genotyped in the FUSION GWA, but nearby SNPs showed some evidence for T2D association, as did the imputed genotypes for the coding variants. All-data meta-analysis resulted in genome-wide significant T2D association with KCNJ11 Glu23→Lys23 (OR = 1.14, P = 6.7 × 10^−11) and strong evidence for PPARG Pro12→Ala12 (OR = 1.14, P = 1.7 × 10^−6). The PPARG and KCNJ11 results emphasize the value of combining data across studies and suggest that other T2D-associated loci remain to be found.
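To make the idea of combining data across studies concrete, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis on the log-odds scale. This is one standard way to pool odds ratios and is an assumption on our part: this section does not state which meta-analysis method the three groups used. Standard errors are recovered from the 95% CIs reported in Table 1, and the function names are hypothetical.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_meta(studies, z: float = 1.96):
    """Inverse-variance pooling of (OR, lower, upper) tuples on the log-odds scale."""
    weights = [1.0 / se_from_ci(lo, hi, z) ** 2 for _, lo, hi in studies]
    log_ors = [math.log(odds_ratio) for odds_ratio, _, _ in studies]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# TCF7L2 rs7903146 in Table 1: FUSION stage 1 + 2, DGI, WTCCC/UKT2D.
tcf7l2 = [(1.34, 1.21, 1.49), (1.38, 1.31, 1.46), (1.37, 1.25, 1.49)]
pooled_or, ci_low, ci_high = fixed_effect_meta(tcf7l2)
print(f"{pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

Under these assumptions the pooled estimate lands close to the all-data value in Table 1 for TCF7L2, 1.37 (1.31–1.43), although rounding in the published CIs means the agreement is only approximate.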
The combined samples from the three studies provide evidence for seven additional T2D loci. For the first three of these loci, we had strong evidence in the FUSION stage 1 GWA data and, for the latter four, our FUSION stage 1 evidence was more modest.
A cluster of variants in the IGF2BP2 (insulin-like growth factor 2 mRNA binding protein 2) region was associated with T2D in our stage 1 sample (e.g., rs1470579 with OR = 1.27, P = 1.6 × 10^−4) (Fig. 1A). The all-data meta-analysis for rs4402960 resulted in genome-wide significance (OR = 1.14, P = 8.9 × 10^−16). Including the rs4402960 genotype as a covariate essentially eliminates evidence for T2D association for other variants in the cluster (Fig. 1A), which is consistent with all SNPs representing the same T2D-predisposing variant(s). IGF2BP2 is a paralog of IGF2BP1, which binds to the 5′ untranslated region of the insulin-like growth factor 2 (IGF2) mRNA and regulates IGF2 translation (19). IGF2 is a member of the insulin family of polypeptide growth factors involved in the development, growth, and stimulation of
Fig. 1. Plots of T2D association and LD in FUSION stage 1 samples for regions surrounding IGF2BP2 (A) and rs9300039 (B). (A) and (B) each contain six panels. The top panels display RefSeq genes; there are none in the rs9300039 region. The second panels (i.e., directly below the top panels) show the T2D association –log10 P values in FUSION stage 1 samples for SNPs genotyped in the GWA panel (closed blue circles) or imputed (open blue circles). The third panels show T2D association –log10 P values for each SNP in a logistic regression model correcting for the reference SNP [indicated by the red circle for rs4402960 in (A) and for rs9300039 in (B)]. SNP rs7480010, reported by Sladek et al. (7), is also labeled in the rs9300039 plot (B) (green circle). A decrease in the –log10 P value from the second to the third panels indicates that the association signal of the tested SNPs can be explained, at least in part, by the reference SNP. In both regions, the reference SNP was chosen for convenience; the choice of another strongly associated SNP nearby would have resulted in a similar picture. The fourth panels show recombination rate in centimorgans per megabase for the HapMap CEU sample (14). The fifth and sixth panels show LD r^2 and D' based on FUSION stage 1–genotyped and FUSION stage 1–imputed data.
insulin action. The most strongly associated IGF2BP2 SNPs are located in a 50-kb region within intron 2 (Fig. 1A); diabetes-predisposing variants may therefore affect regulation of IGF2BP2 expression.
SNP rs13266634, a nonsynonymous Arg325→Trp325 variant in the pancreatic beta-cell–specific zinc transporter SLC30A8 (20), showed (through our annotation-based algorithm) evidence for T2D association in stage 1 (Table 1 and fig. S2). Modest evidence in stage 2 resulted in stronger evidence in our stage 1 + 2 sample (OR = 1.18, P = 7.0 × 10^−5) (Table 1 and table S5). Subsequent DGI and UKT2D genotyping resulted in strong evidence in the combined samples (OR = 1.12, P = 5.3 × 10^−8). Sladek et al. (7) recently reported independent T2D association evidence with the same allele in two French samples (P = 1.8 × 10^−5 and P = 5.0 × 10^−7). SLC30A8 transports zinc from the cytoplasm into insulin secretory vesicles (20, 21), where insulin is stored as a hexamer bound with two Zn2+ ions before secretion (22). Variation in SLC30A8 may affect zinc accumulation in insulin granules, affecting insulin stability, storage, or secretion. In high-glucose conditions, overexpression of SLC30A8 in insulinoma (INS-1E) cells enhanced glucose-induced insulin secretion (21).
SNP rs9300039 in an intergenic region on chromosome 11 showed evidence for T2D association in stage 1 (Table 1 and Fig. 1B); genotyping our stage 2 sample resulted in near genome-wide significance in our stage 1 + 2 sample (OR = 1.48, P = 5.7 × 10^−8) (Table 1 and tables S3 and S5). In the WTCCC and DGI scans, the nearby SNP rs1514823 (r^2 = 0.97 with rs9300039) provided weak evidence for T2D association with the appropriate allele; combining results across all three studies gave OR = 1.25 and P = 4.3 × 10^−7. Fifty-six imputed SNPs and two more genotyped SNPs spanning 219 kb are in LD with rs9300039 and show substantial evidence for T2D association (P < 10^−4) in our stage 1 sample (table S3 and Fig. 1B). Including the genotype for rs9300039 as a covariate essentially eliminates evidence for T2D association with the remaining SNPs (Fig. 1B). This region includes three sets of spliced Expressed Sequence Tags but no annotated genes. The identification of a T2D-associated variant >1 Mb from the nearest annotated gene highlights the value of a genome-wide approach. Sladek et al. (7) reported strongly associated SNPs in two nearby regions on chromosome 11. SNP rs7480010 near hypothetical gene LOC387761 is 331 kb centromeric to rs9300039. LD between rs9300039 and rs7480010 is essentially zero (r^2 = 0.00063 and D' = 0.036), and rs7480010 showed little evidence for association in our stage 1 + 2 sample (OR = 1.03, P = 0.54). Sladek et al. (7) also reported T2D association with three intronic variants of EXT2, located ~2.4 Mb centromeric of rs9300039; we found no evidence for association with EXT2 SNPs.
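The two LD measures quoted for rs9300039 and rs7480010, r^2 and D', can both be computed from one haplotype frequency and the two allele frequencies. The sketch below uses the textbook definitions; the function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab is the frequency of the haplotype carrying allele A at SNP 1 and
    allele B at SNP 2; p_a and p_b are the corresponding allele frequencies.
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: allele A always travels with allele B, but B is more common.
print(ld_stats(p_ab=0.40, p_a=0.40, p_b=0.50))
# Hypothetical frequencies at linkage equilibrium: D, D', and r^2 are all zero.
print(ld_stats(p_ab=0.20, p_a=0.40, p_b=0.50))
```

The first example shows why the two measures are reported together: D' reaches 1 whenever one haplotype is absent, whereas r^2 stays below 1 unless the two SNPs also have matching allele frequencies.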
SNP rs4712523, located within intron 5 of CDKAL1, showed modest evidence for T2D association in our FUSION stage 1 sample, which strengthened slightly in our combined stage 1 + 2 sample (OR = 1.12, P = 0.0073) (table S5). Nearby SNPs in strong LD with rs4712523 including rs7754840 showed modest evidence for T2D association in the DGI scan and considerably stronger evidence in the WTCCC scan. Including strong DGI and UKT2D replication data resulted in genome-wide significance (OR = 1.12, P = 4.1 × 10^−11 for rs7754840) in the all-data meta-analysis (Table 1). CDKAL1 [cyclin-dependent kinase 5 (CDK5) regulatory subunit associated protein–1–like 1] shares protein domain similarity with CDK5 regulatory subunit–associated protein 1 (CDK5RAP1), which specifically inhibits activation of CDK5 by CDK5 regulatory subunit 1 (CDK5R1) (23). Using quantitative reverse transcription polymerase chain reaction analysis of a panel of RNA samples from human tissues and cells, we detected the highest expression of CDKAL1 in skeletal muscle and brain cells, as well as in 293T and HepG2 cells (fig. S3A). The associated SNPs within intron 5, or SNPs in LD with them, may regulate expression of CDKAL1 and so affect the expression of CDK5. CDK5 and CDK5R1 activity is influenced by glucose and may influence beta-cell processes (24, 25); overactivity of CDK5 in the pancreas may lead to beta-cell degeneration, especially under glucotoxic conditions (26).
SNP rs10811661 near cyclin-dependent kinase inhibitors CDKN2A and CDKN2B showed modest evidence for T2D association in our stage 1 + 2 sample (OR = 1.20, P = 0.0022) (Table 1 and table S5) and showed genome-wide significance in the all-data meta-analysis (OR = 1.20, P = 7.8 × 10^−15). SNP rs10811661 is located upstream of CDKN2A and CDKN2B and may have a long-range effect on one of these genes or may influence a gene not yet annotated. CDKN2A and CDKN2B inhibit the activity of CDK4 and CDK6. In mice, Cdk4 activity has been shown to influence beta-cell proliferation and mass, with loss of Cdk4 leading to diabetes (27, 28). We find CDKN2A to be expressed at high levels in islets, adipocytes, brain, and pancreas and at even higher levels in 293T, HeLa, and HepG2 cells (fig. S3B); CDKN2B is expressed in islets and adipocytes and, to a lesser degree, in small intestine, colon, 293T, and HepG2 cells (fig. S3C). CDKN2A and CDKN2B are also tumor suppressor genes and may play a role in aging (29).
SNPs rs1111875 and rs7923837 showed modest evidence of T2D association in the FUSION and DGI scans, much stronger evidence in the WTCCC scan, and genome-wide significant evidence (OR = 1.13, P = 5.7 × 10^−10 for rs1111875) in the all-data meta-analysis. These SNPs are in LD (r^2 = 0.70) in a region that includes HHEX (hematopoietically expressed homeobox), which is critical for development of the ventral pancreas (30), the insulin-degrading enzyme gene IDE, and the kinesin-interacting factor 11 gene KIF11. Sladek et al. (7) recently reported independent genome-wide significant evidence for T2D association with these SNPs.
The WTCCC/UKT2D groups identified evidence for T2D and body mass index (BMI) associations with a set of SNPs including rs8050136 in the FTO region; the T2D association appears to be mediated through a primary effect on adiposity (5, 6, 31). We observed modest evidence for association with T2D in the combined FUSION stage 1 + 2 sample (OR = 1.11, P = 0.016) (Table 1 and table S5).
T2D can be a component of a larger syndrome of metabolic abnormalities, and we were interested to assess the effects of T2D-related traits on our association results. We repeated our T2D association analysis for the 10 SNPs in Table 1 with one of several variables included as an additional covariate. Adjustment for BMI strengthened T2D association with TCF7L2 and SLC30A8, weakened association with rs9300039 and FTO, and had little effect on the other loci. The effect of waist circumference was similar to that of BMI; blood pressure variables had essentially no effect.
Fig. 2. Prediction of T2D risk in the FUSION sample with the use of 10 T2D susceptibility variants. T2D cases and NGT controls with complete genotype data were included in the analysis. To obtain a sample with a T2D prevalence of ~10%, we included nine copies of each of 2176 NGT controls and one copy of each of 2102 T2D cases. The predicted risk for each individual was estimated from a logistic regression model containing the 10 risk variants listed in Table 1. The proportion of T2D cases is shown for 20 equal intervals of predicted T2D risk. We constructed 95% confidence intervals (CIs) for the proportion of T2D cases in each interval using the original sample of 2102 cases and 2176 controls. The constructed sample T2D prevalence (0.096) is shown as a horizontal line. The proportion of T2D cases increases from ~5% in the lowest to 20% in the highest predicted risk categories.
We previously carried out T2D linkage analysis in the families of many of our stage 1 cases (10). None of the 10 loci in Table 1 had large T2D logarithm of the odds (LOD) scores, although those for FTO and TCF7L2 were 0.63 and 0.60 and so were nominally significant. LOD scores for six of the 10 loci were greater than 0.2, as compared to 2.2 that would be expected for random genome locations. This suggests enrichment for T2D-associated loci in regions with modest evidence of T2D linkage (P = 0.01) but that the power of the linkage approach was insufficient to distinguish these signals from background noise.
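One way to see where an enrichment P value of about 0.01 could come from is a binomial tail probability. The sketch below assumes the 10 loci are independent and that each has probability 2.2/10 = 0.22 of exceeding a LOD of 0.2 by chance; the text does not say how the reported P value was computed, so this is an illustration rather than a reconstruction of the authors' test.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Six of 10 loci observed above the threshold, 2.2 expected by chance (p = 0.22).
p_value = binomial_upper_tail(k=6, n=10, p=0.22)
print(round(p_value, 3))
```

Under these assumptions the tail probability is close to 0.01, in line with the value quoted in the paragraph.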
The ability to construct a list of ten robust and replicated T2D-associated loci (Table 1) represents a landmark in efforts to identify genetic variants that predispose to complex human diseases, although the specific predisposing variants and even the relevant genes remain to be defined. We examined the combined risk of T2D based on these 10 loci in our stage 1 + 2 sample by constructing a logistic regression model and predicting T2D risk for each person (8). We found a fourfold variation in T2D risk from the lowest to highest predicted risk groups, which is of potential interest for a personalized preventive-medicine program (Fig. 2). However, these predictions from our data may be biased as compared to predictions based on the general population, likely owing to the overestimation of ORs due to the “winner’s curse,” enrichment for familial T2D cases, and exclusion of individuals with impaired glucose tolerance or impaired fasting glucose.
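As a rough illustration of how 10 loci combine into a single predicted risk, the sketch below builds a log-additive score from the all-data per-allele ORs in Table 1 and passes it through the logistic function. This is not the fitted model from the study: the authors estimated their coefficients by logistic regression in the FUSION stage 1 + 2 sample, whereas here the published ORs stand in for coefficients, and the intercept and the two example genotype profiles are hypothetical.

```python
import math

# All-data odds ratios per risk allele, copied from Table 1.
ALL_DATA_OR = {
    "rs4402960": 1.14, "rs7754840": 1.12, "rs10811661": 1.20, "rs9300039": 1.25,
    "rs8050136": 1.17, "rs1801282": 1.14, "rs13266634": 1.12, "rs1111875": 1.13,
    "rs7903146": 1.37, "rs5219": 1.14,
}


def log_odds_score(risk_allele_counts: dict) -> float:
    """Sum of (risk-allele count x log OR) over loci: a log-additive score."""
    return sum(n * math.log(ALL_DATA_OR[snp]) for snp, n in risk_allele_counts.items())


def predicted_risk(score: float, intercept: float) -> float:
    """Convert a log-odds score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-(intercept + score)))


# Hypothetical individuals: 0 versus 2 risk alleles at every locus.
low = log_odds_score({snp: 0 for snp in ALL_DATA_OR})
high = log_odds_score({snp: 2 for snp in ALL_DATA_OR})
intercept = -3.5  # hypothetical baseline log-odds, chosen only for illustration
print(predicted_risk(low, intercept), predicted_risk(high, intercept))
```

The extreme profiles used here are far rarer than the lowest and highest of the 20 risk intervals in Fig. 2, so the spread printed by this sketch should not be compared directly with the fourfold variation reported in the text.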
Thirty years ago, James V. Neel labeled T2D as “the geneticist’s nightmare” (32), predicting that the discovery of genetic factors in T2D would be thoroughly challenging. Until recently, his prediction has proven true. Although large samples and collaboration among three groups were required, we can confidently state that new diabetes risk factors have been identified. Each gene discovery points to a pathway that contributes to pathogenesis, and all of these proteins and their relevant pathways represent potential drug targets for the prevention or treatment of diabetes. Based on the number of other interesting results observed in these studies, it is likely that there are additional T2D-predisposing loci to be found. Even though much remains to be done, we are at last awakening from Jim Neel’s nightmare.
References and Notes
1. S. Wild, G. Roglic, A. Green, R. Sicree, H. King, Diabetes Care 27, 1047 (2004).
2. S. S. Rich, Diabetes 39, 1315 (1990).
3. J. Kaprio et al., Diabetologia 35, 1060 (1992).
4. Diabetes Genetics Initiative, Science 316, 1331 (2007); published online 26 April 2007 (10.1126/science.1142358).
5. E. Zeggini et al., Science 316, 1336 (2007); published online 26 April 2007 (10.1126/science.1142364).
6. The Wellcome Trust Case Control Consortium, Nature, in press.
7. R. Sladek et al., Nature 445, 881 (2007).
8. Materials and methods are available as supporting material on Science Online.
9. T. Valle et al., Diabetes Care 21, 949 (1998).
10. K. Silander et al., Diabetes 53, 821 (2004).
11. T. Saaristo et al., Diabetes Vasc. Dis. Res. 2, 67 (2005).
12. B. Devlin, K. Roeder, Biometrics 55, 997 (1999).
13. Y. Li, P. Scheet, J. Ding, G. R. Abecasis, submitted for publication; manuscript available from G.R.A. (e-mail: [email protected]).
14. International HapMap Consortium, Nature 437, 1299 (2005).
15. S. F. Grant et al., Nat. Genet. 38, 320 (2006).
16. S. S. Deeb et al., Nat. Genet. 20, 284 (1998).
17. D. Altshuler et al., Nat. Genet. 26, 76 (2000).
18. A. L. Gloyn et al., Diabetes 52, 568 (2003).
19. J. Nielsen et al., Mol. Cell. Biol. 19, 1262 (1999).
20. F. Chimienti, S. Devergnas, A. Favier, M. Seve, Diabetes 53, 2330 (2004).
21. F. Chimienti et al., J. Cell Sci. 119, 4199 (2006).
22. M. F. Dunn, Biometals 18, 295 (2005).
23. Y. P. Ching, A. S. Pang, W. H. Lam, R. Z. Qi, J. H. Wang, J. Biol. Chem. 277, 15237 (2002).
24. M. Ubeda, D. M. Kemp, J. F. Habener, Endocrinology 145, 3023 (2004).
25. F. Y. Wei et al., Nat. Med. 11, 1104 (2005).
26. M. Ubeda, J. M. Rukstalis, J. F. Habener, J. Biol. Chem. 281, 28858 (2006).
27. S. G. Rane et al., Nat. Genet. 22, 44 (1999).
28. T. Tsutsui et al., Mol. Cell. Biol. 19, 7011 (1999).
29. W. Y. Kim, N. E. Sharpless, Cell 127, 265 (2006).
30. R. Bort, J. P. Martinez-Barbera, R. S. Beddington, K. S. Zaret, Development 131, 797 (2004).
31. T. M. Frayling et al., Science 316, 889 (2007); published online 12 April 2007 (10.1126/science.1141634).
32. J. V. Neel, in The Genetics of Diabetes Mellitus, W. Creutzfeldt, J. Köbberling, J. V. Neel, Eds. (Springer, Berlin, 1976), pp. 1–11.
33. We thank the Finnish citizens who generously participated in this study; our colleagues from the DGI, WTCCC, and UKT2D for sharing prepublication data from their studies; S. Enloe of FUSION and E. Kwasnik, J. Gearhart, J. Romm, M. Zilka, C. Ongaco, A. Robinson, R. King, B. Craig, and E. Hsu of CIDR for expert technical work; and D. Leja of NHGRI for expert assistance with a figure. Support for this research was provided by NIH grants DK062370 (M.B.), DK072193 (K.L.M.), HL084729 (G.R.A.), HG002651 (G.R.A.), and U54 DA021519; National Human Genome Research Institute intramural project number 1 Z01 HG000024 (F.S.C.); a postdoctoral fellowship award from the American Diabetes Association (C.J.W.); a Wenner-Gren Fellowship (L.P.O.); and a Calvin Research Fellowship (R.P.). Genome-wide genotyping was performed by the Johns Hopkins University Genetic Resources Core Facility (GRCF) SNP Center at CIDR with support from CIDR NIH (contract N01-HG-65403) and the GRCF SNP Center.

Supporting Online Material
www.sciencemag.org/cgi/content/full/1142382/DC1
Author Contributions
Materials and Methods
Figs. S1 to S3
Tables S1 to S7
References

12 March 2007; accepted 20 April 2007
Published online 26 April 2007; 10.1126/science.1142382
Include this information when citing this paper.
Supporting Online Material for
A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants
Laura J. Scott, Karen L. Mohlke, Lori L. Bonnycastle, Cristen J. Willer, Yun Li, William L. Duren, Michael R. Erdos, Heather M. Stringham, Peter S. Chines, Anne U. Jackson, Ludmila Prokunina-Olsson, Chia-Jen Ding, Amy J. Swift, Narisu Narisu, Tianle Hu, Randall Pruim, Rui Xiao, Xiao-Yi Li, Karen N. Conneely, Nancy L. Riebow, Andrew G. Sprau, Maurine Tong, Peggy P. White, Kurt N. Hetrick, Michael W. Barnhart, Craig W. Bark, Janet L. Goldstein, Lee Watkins, Fang Xiang, Jouko Saramies, Thomas A. Buchanan, Richard M. Watanabe, Timo T. Valle, Leena Kinnunen, Gonçalo R. Abecasis, Elizabeth W. Pugh, Kimberly F. Doheny, Richard N. Bergman, Jaakko Tuomilehto, Francis S. Collins,* Michael Boehnke*
*To whom correspondence should be addressed. E-mail: [email protected] (M.B.); [email protected] (F.S.C.)
Published 26 April 2007 on Science Express
Methods
Sample description
Stage 1: In the results reported here, we analyzed 1,161 T2D cases and 1,174 NGT controls
from the Finland-United States Investigation of NIDDM Genetics (FUSION) (1, 2) and Finrisk
2002 (3) studies as our stage 1 sample (Tables S1, S2A). T2D was defined according to 1999
World Health Organization (WHO) criteria (4) of fasting plasma glucose concentration ≥ 7.0
mmol/l or 2-h plasma glucose concentration ≥ 11.1 mmol/l, by report of diabetes medication use,
or based on medical record review. FUSION cases with known or probable type 1 diabetes
among their first degree relatives were excluded. Normal glucose tolerance (NGT) was defined
as having fasting glucose < 6.1 mmol/l and 2-h glucose < 7.8 mmol/l (4). The 789 FUSION
cases each reported at least one T2D sibling; the 372 Finrisk 2002 T2D cases came from a
Finnish population-based risk factor survey. Controls included 219 subjects from Vantaa,
Finland who were NGT at ages 65 and 70 years, 304 NGT spouses of FUSION subjects, and 651
Finrisk 2002 NGT subjects. The stage 1 controls were approximately frequency-matched to the
stage 1 cases by five-year age category, sex, and birth province. We refer to these FUSION and
Finrisk 2002 cases and controls in the text as the FUSION stage 1 sample. For quantitative trait
and quality control analyses, we genotyped 122 FUSION offspring, yielding 119 mother-father-
offspring trios, 1 mother-father-two-offspring quartet, and one parent-offspring pair. For quality
control, we successfully genotyped 79 duplicate samples and five CEU HapMap parent-child
trios.
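The glucose thresholds in the sample description above can be written as a small classifier. This is a minimal sketch under the WHO 1999 cut-offs (T2D at or above 7.0 mmol/l fasting or 11.1 mmol/l at 2 h; NGT below 6.1 mmol/l fasting and below 7.8 mmol/l at 2 h). The function name and the "intermediate" label are hypothetical, and the study also accepted diabetes medication use or medical record review as evidence of T2D, which this sketch does not model.

```python
def classify_glucose_status(fasting_mmol_l: float, two_hour_mmol_l: float) -> str:
    """Classify one person from fasting and 2-h plasma glucose (mmol/l)."""
    if fasting_mmol_l >= 7.0 or two_hour_mmol_l >= 11.1:
        return "T2D"
    if fasting_mmol_l < 6.1 and two_hour_mmol_l < 7.8:
        return "NGT"
    # Impaired fasting glucose or impaired glucose tolerance:
    # neither a T2D case nor an NGT control under these definitions.
    return "intermediate"


print(classify_glucose_status(5.2, 6.0))   # NGT
print(classify_glucose_status(7.4, 9.0))   # T2D
print(classify_glucose_status(6.3, 7.0))   # intermediate
```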
Stage 2: 1,215 Finnish T2D cases and 1,258 Finnish NGT controls were selected for stage 2
from the Dehko 2D (D2D) (5), Health 2000 (6), Finrisk 1987 (7), Finrisk 2002 (3), Savitaipale
Diabetes (8), and Action LADA (9) studies (Tables S1, S2B) and classified according to WHO
1999 criteria (4). The D2D, Health 2000, Finrisk 1987, and Savitaipale Diabetes studies are
population-based surveys; Action LADA is a study of latent autoimmune diabetes in adults
(LADA) in recently-diagnosed diabetes patients. We chose T2D cases from Action LADA who
were GAD antibody negative and therefore unlikely to have LADA. For all studies except
Action LADA, NGT controls were approximately frequency-matched within each study to the
T2D cases by five-year age category, sex, and birth province. Action LADA cases were
approximately frequency-matched in the same way with additional controls from the other
studies. Our stage 2 sample consists of 327 cases and 399 controls from D2D, 127 cases and 224
controls from Health 2000, 266 cases and 397 controls from Finrisk 1987, 52 controls from
Finrisk 2002, 122 cases and 186 controls from Savitaipale, and 373 cases from Action LADA
(Table S2B). For quality control in stage 2, we successfully genotyped 56 duplicate samples.
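The per-study counts listed above can be checked against the stage 2 totals with a few lines of arithmetic. The dictionary layout is only an illustration; the numbers are those given in the text, with zero entered where a study contributed no cases or no controls.

```python
# Stage 2 (cases, controls) per contributing study, as listed in the text.
STAGE2 = {
    "D2D": (327, 399),
    "Health 2000": (127, 224),
    "Finrisk 1987": (266, 397),
    "Finrisk 2002": (0, 52),
    "Savitaipale": (122, 186),
    "Action LADA": (373, 0),
}

total_cases = sum(cases for cases, _ in STAGE2.values())
total_controls = sum(controls for _, controls in STAGE2.values())
print(total_cases, total_controls)  # 1215 1258
```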
Informed consent: Informed consent was obtained from each study participant, and the study
protocol was approved by the ethics committee or institutional review board in each of the
participating centers.
Genotyping
GWA genotyping: Stage 1 and quality control samples were genotyped on Illumina Infinium™
II HumanHap300 BeadChips v.1.0 in the Johns Hopkins University Genetic Resources Core
Facility (GRCF) SNP Center at the Center for Inherited Disease Research (CIDR) using the
Illumina Infinium II assay protocol (10). An in-house LIMS was used for sample and reagent
tracking and lab workflow control (11). Approximately 1 μg of genomic DNA (15 μL at 70 ng/μL) was used as
input for the Infinium II assay.
Intensity data for each sample were normalized using BeadStudio v.2.3.25 and, for quality
control within CIDR, genotypes were determined using the Illumina-provided standard definition
cluster-file for the HumanHap300 v.1.0 product. These cluster boundaries were determined by
Illumina using 111 unique HapMap samples: 47 CEU, 36 YRI, and 28 CHB/JPT. BeadStudio
sample sheets were generated from our in-house LIMS. Sample and batch level quality control
was done by monitoring sample call rates, sex, heterozygote frequencies, and lab workflow
related variables using data generated from BeadStudio and our LIMS. Thirty-five genotyped samples
fell below our sample call rate threshold of 97.5% and were repeated; 28 of the repeated
samples gave call rates > 97.5%. The remaining 7 samples were excluded from analyses.
To obtain genotypes for analysis, we re-clustered the genotype data using cluster boundaries
determined with our own data. We removed samples for 15 people identified as likely first or
second degree relatives of other sampled individuals based on their genotype data (12). We
checked for consistency in genotyping within each of 79 duplicate sample pairs, with Mendelian
inheritance among the 122 parent-offspring sets, and with Hardy-Weinberg Equilibrium (HWE)
using the unrelated individuals (13). After initial analyses, we manually reviewed in BeadStudio
the clustering of the genotype data for our most strongly associated SNPs.
SNPs were dropped from all analyses if the HWE p-value was < 10⁻⁶, the total number of
Mendelian inconsistencies and duplicate pair discrepancies was > 3, or the SNP call rate was <
90%; SNPs were flagged for further attention if the HWE p-value was < 10⁻⁴, the total number of
Mendelian inconsistencies and duplicate pair discrepancies was > 1, or the SNP call rate was < 95%.
All genotypes were oriented to the forward strand. There is little risk of strand ambiguities as
there are no C/G or A/T polymorphisms included in the Illumina 300K HumanHap panel.
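The drop and flag rules above amount to a small decision function. The following is a minimal Python sketch, assuming the per-SNP HWE p-value, the combined count of Mendelian inconsistencies and duplicate pair discrepancies, and the call rate have already been computed. The function and argument names are illustrative and are not taken from the study's pipeline.

```python
def classify_snp(hwe_p, n_discrepancies, call_rate):
    """Return 'drop', 'flag', or 'pass' using the thresholds stated in the text.

    hwe_p           -- Hardy-Weinberg equilibrium p-value for the SNP
    n_discrepancies -- Mendelian inconsistencies plus duplicate pair discrepancies
    call_rate       -- fraction of samples with a called genotype (0 to 1)
    """
    if hwe_p < 1e-6 or n_discrepancies > 3 or call_rate < 0.90:
        return "drop"
    if hwe_p < 1e-4 or n_discrepancies > 1 or call_rate < 0.95:
        return "flag"
    return "pass"


if __name__ == "__main__":
    # Hypothetical SNP summaries: (HWE p-value, discrepancies, call rate)
    examples = [(0.3, 0, 0.999), (5e-5, 0, 0.99), (0.2, 4, 0.99), (0.5, 2, 0.93)]
    for hwe_p, n_disc, rate in examples:
        print(hwe_p, n_disc, rate, classify_snp(hwe_p, n_disc, rate))
```

The drop test is applied first, so a SNP that meets both a drop and a flag criterion is dropped.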
For the 315,635 SNPs that passed our quality control criteria, the genotype consistency rate
among 79 duplicate sample pairs was 99.996%, the Mendelian consistency rate in 122 parent-
child sets was 99.967%, and the concordance rate for 15 samples genotyped both in our study
and by the HapMap consortium was 99.82%. 80.8% of SNPs had call frequency of 100%, and
99.68% of SNPs had call frequencies > 95%.
Confirmation and replication genotyping: We carried out focused, lower-throughput genotyping
with the Sequenom Homogeneous MassEXTEND or iPLEX Gold SBE assays at the National
Human Genome Research Institute (NHGRI). For 26 GWA SNPs re-genotyped in the stage 1
samples on a different genotyping platform (Sequenom), we observed a genotype consistency
rate of 99.92%; these included the SNPs with the strongest evidence of T2D association. We
also genotyped SNPs in the FUSION stage 2 samples or in the combined FUSION stage 1+2
samples to follow up interesting results based on (a) FUSION genotyped and imputed SNPs; (b)
the FUSION-DGI-WTCCC GWA results comparison; and (c) prior T2D association results in
our own or other studies. 80 of the 82 attempted SNPs had genotype call frequency > 94% and
HWE p-value > .001. The genotype consistency rate among duplicate samples was 99.9% and
the average call frequency was 97.1%.
Statistical analysis
T2D association: We tested for T2D-SNP association using logistic regression under the
additive genetic model that is multiplicative on the OR scale with adjustment for five-year age
category, sex, and birthplace. This test is the logistic regression equivalent to the Cochran-
Armitage test for trend (14) and is hence robust to departures from Hardy-Weinberg equilibrium.
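For reference, the unadjusted Cochran-Armitage trend test that this logistic regression generalizes can be sketched as follows. This is an illustration only: it takes genotype counts with additive scores 0, 1, 2, does not include the age, sex, and birthplace covariates used in the analysis, and the counts in the example are hypothetical.

```python
import math


def armitage_trend_test(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    case_counts / control_counts -- numbers of individuals carrying 0, 1, and 2
    copies of the risk allele. Returns (chi-square statistic with 1 df, p-value).
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = sum(totals)
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    sum_tr = sum(t * r for t, r in zip(scores, case_counts))
    sum_tn = sum(t * m for t, m in zip(scores, totals))
    sum_t2n = sum(t * t * m for t, m in zip(scores, totals))
    denom = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    if denom == 0:
        return 0.0, 1.0  # no variation in genotype or in case/control status
    chi2 = n * (n * sum_tr - n_cases * sum_tn) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts (0, 1, 2 risk alleles)
    chi2, p = armitage_trend_test([10, 20, 30], [30, 20, 10])
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

The statistic depends only on genotype counts, not on allele counts derived under Hardy-Weinberg proportions, which is why the trend test does not rely on Hardy-Weinberg equilibrium.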
We repeated some analyses including BMI, waist, systolic blood pressure, or diastolic blood
pressure as an additional covariate to assess the impact of these variables on evidence for SNP-
T2D association. For X-chromosome markers, we treated hemizygous males as homozygotes,
consistent with X inactivation for most of the chromosome. We presented and followed up on
results based on this additive model for ease of comparison between groups. We also analyzed
SNPs using recessive and dominant models; no SNP reached genome-wide significance in
FUSION stage 1 data, although additional T2D-predisposing variants may be among the SNPs
identified by these models.
To evaluate empirically the distribution of p-values observed in our GWA stage 1 study, we
permuted case/control status and re-ran the entire GWA analysis 100 times. We counted the
number of p-values < 10⁻⁵ or < 10⁻⁴ within each permuted dataset and found our study to fall
within the permuted distribution.
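The permutation procedure can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: it uses an unadjusted trend test rather than the covariate-adjusted logistic regression of the actual analysis, the genotype data are simulated, and all function names are hypothetical.

```python
import math
import random


def trend_p(genos, status):
    """Cochran-Armitage trend p-value for one SNP (genotypes coded 0/1/2)."""
    n = len(genos)
    r = sum(status)                      # number of cases (status coded 1)
    s = n - r                            # number of controls (status coded 0)
    sum_t = sum(genos)
    sum_t2 = sum(g * g for g in genos)
    sum_tr = sum(g for g, y in zip(genos, status) if y == 1)
    denom = r * s * (n * sum_t2 - sum_t ** 2)
    if denom == 0:
        return 1.0                       # monomorphic SNP or single-class sample
    chi2 = n * (n * sum_tr - r * sum_t) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2.0))


def count_hits(snps, status, threshold):
    """Number of SNPs with trend p-value below the threshold."""
    return sum(trend_p(g, status) < threshold for g in snps)


def permutation_counts(snps, status, threshold, n_perm, seed=0):
    """Shuffle case/control labels n_perm times and count hits in each shuffle."""
    rng = random.Random(seed)
    shuffled = list(status)              # copy so the caller's labels are untouched
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        counts.append(count_hits(snps, shuffled, threshold))
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    status = [1] * 50 + [0] * 50         # toy case/control labels
    snps = [[rng.choice((0, 1, 2)) for _ in status] for _ in range(200)]
    observed = count_hits(snps, status, 0.05)
    permuted = permutation_counts(snps, status, 0.05, n_perm=100)
    print("observed:", observed, "permuted range:", min(permuted), max(permuted))
```

Shuffling the labels as a block preserves the correlation among SNPs, so the permuted counts reflect the null distribution for the whole scan rather than for independent tests.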
Statistical significance: Following the recommendation of the International HapMap
Consortium based on analysis of the ENCODE data, we declared a T2D-SNP association
“genome-wide significant” if the nominal p-value for the SNP was < 5 × 10⁻⁸ (15). In so doing,
we addressed the multiple comparisons problem posed by carrying out the equivalent of ~1
million tests.
Sample size calculation: For each SNP in Table 1, we calculated the sample size necessary to
detect T2D-SNP association at significance level .05 and power 80% under an additive model.
We converted the FUSION-DGI-WTCCC/UKT2D all-data OR to a risk ratio assuming T2D
prevalence 10%, and used this risk ratio and FUSION stage 1+2 control risk allele frequency as
the population allele frequency in the sample size calculation (16).
Imputation: We applied a computationally efficient hidden Markov model based algorithm (17,
18) to impute genotypes in FUSION samples for 2.25 million autosomal SNPs genotyped by the
International HapMap Consortium (15), but not present on the Illumina HumanHap300
BeadChip. The method combines our FUSION Illumina GWA genotype data with phased
chromosomes for the HapMap CEU samples and then infers the unknown FUSION genotypes
probabilistically by searching for similar stretches of flanking haplotype in the HapMap CEU
reference sample. In this process, we used the genotype data from the 290,690 FUSION
Illumina GWA autosomal SNPs which passed our quality control criteria and had minor allele
frequency > 5%. For each individual at each imputed SNP, we calculated an average allele
dosage score based on 90 iterations of the imputation algorithm. We assessed the quality of the
results for each SNP by calculating (a) the proportion of iterations that agreed with the most
likely genotype (imputation consistency) and (b) the ratio of the observed variance of dosage
scores across samples to the expected variance given the imputed allele frequency of the SNP
(estimated r²). Of the HapMap autosomal SNPs, 2.15 million had minor allele frequency > 1% in
the CEU sample; of these, 2.09 million met our quality control criterion of an estimated r² > .30.
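The two quality measures can be illustrated with a short Python sketch. It assumes that the "most likely genotype" is the genotype sampled most often across iterations and that the expected variance is the Hardy-Weinberg value 2p(1 − p), where p is the imputed allele frequency. Both are interpretations of the description above, and the function name is hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def imputation_quality(sampled_genotypes):
    """Summarize imputation at one SNP.

    sampled_genotypes -- one list per individual, holding the genotype (0, 1, or 2
    copies of the counted allele) drawn in each iteration of the algorithm.
    Returns (dosages, consistency, estimated_r2).
    """
    # Average allele dosage per individual across iterations
    dosages = [mean(draws) for draws in sampled_genotypes]
    # Share of iterations agreeing with each individual's most frequent genotype
    agree = [Counter(draws).most_common(1)[0][1] / len(draws)
             for draws in sampled_genotypes]
    consistency = mean(agree)
    allele_freq = mean(dosages) / 2.0
    expected_var = 2.0 * allele_freq * (1.0 - allele_freq)  # assumes HWE
    if expected_var == 0:
        return dosages, consistency, float("nan")
    estimated_r2 = pvariance(dosages) / expected_var
    return dosages, consistency, estimated_r2


if __name__ == "__main__":
    # Four hypothetical individuals, 90 iterations each; the last is uncertain
    draws = [[0] * 90, [1] * 90, [2] * 90, [1] * 45 + [2] * 45]
    dosages, consistency, r2 = imputation_quality(draws)
    print(dosages, round(consistency, 3), round(r2, 3))
```

When every iteration agrees, the dosages equal the genotypes and the variance ratio reaches its maximum for the sample; uncertainty pulls dosages toward intermediate values and lowers the ratio.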
We evaluated the accuracy of our imputation procedure by comparing imputed genotypes to
actual genotypes for 510 SNPs not present on the Illumina GWA panel but that we had
previously genotyped in 1,190 individuals in our stage 1 samples (19). The average concordance
rate between imputed and actual alleles (genotypes) was 98.5% (97.1%), suggesting that the
HapMap CEU sample provides an appropriate basis for SNP genotype imputation in Finns,
consistent with our previous findings that allele frequencies, haplotype frequencies, and linkage
disequilibrium (LD) measures are remarkably similar between the CEU samples and a set of the
Finnish individuals that overlaps with those included in this study (19). We also genotyped 23
SNPs imputed in our stage 1 data; 16 of these SNPs had stage 1 imputation-based p-values < 10⁻⁵.
For most of these SNPs, the p-values for the actual genotypes were very similar to those for
the imputed genotypes, although often slightly less significant (Table S6); large differences
occurred most often for estimated r² values nearer the quality control threshold. Differences
reflect variability in the imputation-based p-value estimates and our choice to follow up strong
imputation-based association results, an example of the “winner’s curse.” This variability in p-
value estimates for imputed SNPs did not lead to an increased overall false positive rate for the
study since we have chosen to genotype each such SNP in stage 1 as well as stage 2.
To test for disease-SNP association for imputed SNPs allowing for the effects of covariates, we
used logistic regression models in which the SNP effect was represented by its mean imputed
allele dosage score, an approach that takes into account the degree of uncertainty of genotype
imputation (18).
Combined analysis: We used a fixed effects model to estimate the combined ORs, 95%
confidence intervals (CIs), and p-values for the GWA genotype or imputed data for FUSION and
the GWA genotype data from DGI and WTCCC studies (20). We used the same approach to
combine all available data from the FUSION, DGI, and WTCCC/UKT2D studies. All results are
based on genotypes predicted from the forward strand of the genome sequence. When we
describe results across studies for non-identical SNPs, we report LD estimates based on FUSION
genotype data when available and on imputed data when not.
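A fixed effects combination of this kind is commonly computed by inverse-variance weighting of the per-study log odds ratios. The Python sketch below assumes that approach and recovers each study's standard error from its 95% CI; the study values in the example are hypothetical and the function name is illustrative.

```python
import math


def fixed_effects_meta(studies, z_crit=1.959964):
    """Inverse-variance fixed-effects combination of per-study odds ratios.

    studies -- iterable of (odds_ratio, ci_lower, ci_upper) with 95% CIs.
    Returns (combined_or, ci_lower, ci_upper, two_sided_p).
    """
    weight_sum = 0.0
    weighted_beta = 0.0
    for odds_ratio, lower, upper in studies:
        beta = math.log(odds_ratio)
        # Standard error of the log OR implied by the width of the 95% CI
        se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_beta += weight * beta
    beta = weighted_beta / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return (math.exp(beta), math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se), p_value)


if __name__ == "__main__":
    # Three hypothetical studies: (OR, lower 95% limit, upper 95% limit)
    combined = fixed_effects_meta([(1.20, 1.05, 1.37), (1.15, 0.98, 1.35), (1.12, 1.01, 1.24)])
    print("OR = %.3f, 95%% CI %.3f-%.3f, p = %.2e" % combined)
```

All ORs must refer to the same allele on the same strand before they are combined, which is why the results are reported relative to the forward strand.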
SNP selection for stage 2 genotyping: We selected SNPs for genotyping in the FUSION stage 2
samples based on the results of the FUSION GWA and the comparison of the FUSION, DGI,
and WTCCC GWA results. To enrich for SNPs with interesting biological functions from the
FUSION GWA, we weighted the association p-value according to our interest in the SNP based
on genome annotation, using an algorithm similar to the one described by Roeder et al. (21), with
weights as described in Table S7. Our algorithm favored genotyped SNPs that tagged any
HapMap SNP annotated as a non-synonymous, frameshift, or critical splice site variant, or
located in or around interesting T2D candidate genes, using an LD threshold of r² ≥ .8 in the CEU
HapMap sample. It did so by dividing the p-value by the product of the maximal relevant
weighting factor and the relevant bonus factors. For imputed SNPs, we assigned the weight
based only on the imputed SNP itself. From SNPs with weighted p-values ≤ 10⁻⁴, we formed
sets of SNPs within 100 kb of each other and ranked these sets based on the smallest weighted
p-value. From each of these sets, we selected a strongly associated SNP for stage 2 genotyping,
giving some preference to genotyped over imputed SNPs to reduce stage 1 genotyping
requirements and to focus on SNPs for which we had more accurate genotype information. If an
imputed SNP was chosen, we genotyped stage 1 and 2 samples.
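The weighting and set-forming steps can be sketched as follows. This is a simplified illustration: the weights and SNP records are hypothetical (the actual weights are in Table S7), sets are formed by chaining SNPs that lie within 100 kb of the preceding SNP, and the SNP with the smallest weighted p-value is taken from each set, without the additional preference for genotyped over imputed SNPs.

```python
def select_stage2_snps(snps, p_cutoff=1e-4, max_gap_bp=100_000):
    """Pick one SNP per positional set, ordered by smallest weighted p-value.

    snps -- list of dicts with keys 'snp', 'chr', 'pos', 'p', 'weight', where
    'weight' is the product of the maximal weighting factor and bonus factors.
    """
    kept = []
    for s in snps:
        weighted_p = s["p"] / s["weight"]        # larger weight, smaller p
        if weighted_p <= p_cutoff:
            kept.append({**s, "weighted_p": weighted_p})
    kept.sort(key=lambda s: (s["chr"], s["pos"]))

    # Chain SNPs into sets: a SNP joins the current set if it lies within
    # max_gap_bp of the preceding SNP on the same chromosome.
    sets = []
    for s in kept:
        last = sets[-1][-1] if sets else None
        if (last is not None and last["chr"] == s["chr"]
                and s["pos"] - last["pos"] <= max_gap_bp):
            sets[-1].append(s)
        else:
            sets.append([s])

    best = [min(group, key=lambda s: s["weighted_p"]) for group in sets]
    best.sort(key=lambda s: s["weighted_p"])     # rank sets by their best SNP
    return [s["snp"] for s in best]


if __name__ == "__main__":
    # Hypothetical SNP records, not taken from the study
    toy = [
        {"snp": "snpA", "chr": 1, "pos": 1_000, "p": 5e-5, "weight": 1.0},
        {"snp": "snpB", "chr": 1, "pos": 50_000, "p": 2e-5, "weight": 1.0},
        {"snp": "snpC", "chr": 1, "pos": 500_000, "p": 3e-4, "weight": 5.0},
        {"snp": "snpD", "chr": 2, "pos": 1_000, "p": 5e-3, "weight": 1.0},
        {"snp": "snpE", "chr": 2, "pos": 2_000, "p": 1e-6, "weight": 1.0},
    ]
    print(select_stage2_snps(toy))
```

In the toy data, snpC passes the cutoff only because of its annotation weight, which is the intended effect of the weighting.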
Risk prediction: We predicted T2D risk in the FUSION sample based on the ten identified T2D
susceptibility variants listed in Table 1. T2D cases and NGT controls with complete genotype
data were included in the analysis. To obtain a sample with ~10% T2D prevalence, the 2,176
NGT controls were included nine times each and the 2,102 T2D cases once each in a logistic
regression analysis. Figure 2 displays the proportion of T2D individuals for twenty equal
intervals of predicted T2D risk. 95% CIs for the proportion of T2D cases were constructed using
the original, not the expanded, sample.
Linkage and association: To assess the possible predictive value of T2D linkage for T2D
association, we counted the number of our ten T2D-associated loci (Table 1) for which the T2D
linkage LOD score was > 0.2 in our FUSION affected sibling pair families (2). We then divided
the genome into 5 cM bins and noted that 22% of such bins had T2D LOD score > 0.2 in our
T2D linkage scan. The observed count of six of the ten loci with T2D LOD > 0.2 is ~3-times
greater than expected by chance, and has exact binomial p-value of .01, consistent with the
hypothesis that very modest linkage evidence is somewhat predictive of the presence of a locus
detectable by association methods.
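The binomial calculation can be checked directly. The sketch below assumes the reported p-value is the one-sided upper tail, that is, the probability of observing six or more of ten loci in bins with LOD > 0.2 when each locus independently has a 22% chance of falling in such a bin.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_loci, n_linked, bin_fraction = 10, 6, 0.22
    expected = n_loci * bin_fraction             # 2.2 loci expected by chance
    p_value = binomial_upper_tail(n_linked, n_loci, bin_fraction)
    print(f"expected {expected:.1f}, observed {n_linked}, p = {p_value:.4f}")
```

With 2.2 loci expected and 6 observed, the ratio is about 2.7, in line with the "~3-times" figure, and the tail probability is close to .01.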
Gene expression analysis
RNAs from human tissues were purchased from Clontech and represented pooled samples from
several individuals. Purified human pancreatic islets were obtained from Islet Cell Resource
Centers (IRB Exemption number 3072) and the National Disease Research Interchange (IRB
Exemption number 3269) with approval by the National Institutes of Health Office of Human
Subjects Research. Anonymous human blood donor samples from the NIH Clinical Center
Division of Transfusion Medicine were provided as buffy coat isolations from whole blood
centrifugation. Human adipocytes were purchased from Cambrex as differentiated cultures, and
cell cultures -- 293T (human embryonic kidney), HeLa (human cervical carcinoma), and HepG2
(human hepatocellular carcinoma) -- were purchased from ATCC (the American Type Culture
Collection). Lymphoblastoid cell lines from CEPH individuals were purchased from the Coriell
Cell Repositories. RNA from cell cultures, islets, blood, and adipocytes was prepared with
Trizol Reagent (Invitrogen) followed by RNeasy Kit (Qiagen). RNA from four individual
samples was used to prepare pooled cDNA for islets, adipocytes, blood, and lymphoblasts.
cDNA was prepared from 1 μg of total RNA, using SuperScript III reverse transcriptase and
random hexamers (Invitrogen). cDNA equivalent to 25-50 ng of total RNA was used for each
quantitative PCR. All PCRs were performed in 10 μL volume in replicates of 3 or 4 using the
7900 Real-Time PCR System (ABI) in 384 well plates; average values were used for
calculations. PCRs used 2x SYBR Green PCR mix (Qiagen) and specific primers
designed over exon boundaries to amplify only from cDNA:
CDKAL1_f: GAAGAATCTTTTGATTCCAAGTTTT
CDKAL1_r: GCAGCACCATTCTGGAACTC
CDKN2A_f: ATCTATGCGGGCATGGTTACT
CDKN2A_r: CAACGCACCGAATAGTTACG
CDKN2B_f: CGGGGACTAGTGGAGAAGGT
CDKN2B_r: ACCAGCGTGTCCAGGAAG
PCRs were carried out for 15 min at 95 °C, followed by 40 cycles of 15 sec at 95 °C, 15 sec at 59
°C, and 45 sec at 72 °C. Post-PCR melting curve analysis was used after each run. Gel-purified
PCR fragments were also sequenced to ensure the specificity of amplification and splicing. An
expression assay for human beta-2 microglobulin (B2M) Hs00187842_m1 was purchased from
ABI and used according to the instructions. Ct values (cycle at threshold) were determined from
real-time PCR. The expression of target genes was normalized to expression of B2M according
to the equation dCt = Ct(B2M) - Ct(target), compared to expression in pancreas by the equation
ddCt = dCt(tissue) - dCt(pancreas), and then converted to fold difference as fold difference = 2^ddCt (ABI, User
Bulletin #2 on relative quantification). We were unable to assess confidently the tissue
distribution of IGF2BP2 mRNA because of very high similarity (> 95%) to three processed
pseudogenes on chromosomes 1, 8, and 12.
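The normalization described above can be written as a short function. This sketch follows the sign convention given in the text and, like the 2^ddCt method itself, assumes the amount of product doubles with each cycle; the Ct values in the example are hypothetical.

```python
def fold_difference(ct_b2m_tissue, ct_target_tissue,
                    ct_b2m_pancreas, ct_target_pancreas):
    """Fold difference in target expression relative to pancreas.

    Uses dCt = Ct(B2M) - Ct(target), ddCt = dCt(tissue) - dCt(pancreas),
    and fold difference = 2 ** ddCt.
    """
    dct_tissue = ct_b2m_tissue - ct_target_tissue
    dct_pancreas = ct_b2m_pancreas - ct_target_pancreas
    ddct = dct_tissue - dct_pancreas
    return 2.0 ** ddct


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold two cycles earlier
    # in the tissue than in pancreas, with equal B2M in both samples.
    print(fold_difference(20.0, 25.0, 20.0, 27.0))
```

Because dCt is defined as reference minus target, a larger dCt means higher expression, and pancreas compared with itself gives a fold difference of 1.0.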
Supplementary Figure Legends
Figure S1. Quantile-quantile plot for T2D association -log10 p-values for FUSION stage 1
samples and p-values expected under the null distribution for FUSION GWA SNPs.
Figure S2. Plot of T2D association and LD in FUSION stage 1 sample for region surrounding
SLC30A8. The top panel contains RefSeq genes. The second panel shows the T2D association
-log10 p-values in FUSION stage 1 samples for SNPs genotyped in the GWA panel (•) or
imputed (o). The third panel shows T2D association -log10 p-values for each SNP in a logistic
regression model correcting for the reference SNP rs13266634 (•, red dot). A decrease in the
-log10 p-value from the second to the third panel indicates that the association signal of the tested
SNPs can be explained, at least in part, by the reference SNP. The reference SNP is a non-
synonymous coding SNP, and was chosen because of its potential of being the actual functional
variant responsible for the association signal; choice of another strongly associated SNP nearby
would have resulted in a similar picture. The fourth panel shows recombination rate in cM per
Mb for the HapMap CEU sample (15). The fifth and sixth panels show linkage disequilibrium r²
and D' based on FUSION stage 1 genotyped and imputed data.
Figure S3. Expression of CDKAL1 (first panel), CDKN2A (second panel), and CDKN2B (third
panel) in human tissues and cells. The level of expression of each gene was determined by
quantitative RT-PCR, and normalized to the beta-2-microglobulin (B2M) housekeeping gene.
The data are presented as fold difference relative to expression in pancreas, which is set at 1.0.
293T cells are human embryonic kidney, HeLa are human cervical carcinoma, and HepG2 are
human hepatocellular carcinoma.
Figure S1
Figure S2
[Plot not reproduced. Chromosome 8, position 118,100-118,350 kb. Panels: -log10(p); -log10(p adj); recombination rate (cM/Mb); r² in FUSION; D' in FUSION. Gene shown: SLC30A8. Reference SNP: rs13266634.]
Figure S3
Table S1. Characteristics of stage 1 and stage 2 case and control samples. N, Male, and Female are counts; all other rows are median (IQR).

Characteristic | Stage 1 Cases | Stage 1 Controls | Stage 2 Cases | Stage 2 Controls
N | 1161 | 1174 | 1215 | 1258
Male | 653 | 574 | 724 | 768
Female | 508 | 600 | 491 | 490
Age of Diagnosis (years) | 53.0 (12.0) | --- | 56.0 (12.0) | ---
Study Age (years) | 63.4 (11.2) | 64.0 (11.7) | 60.0 (11.5) | 59.0 (10.6)
BMI (kg/m²) | 29.8 (6.1) | 26.8 (5.0) | 30.1 (6.7) | 26.4 (4.9)
Fasting Plasma Glucose (mmol/l) | 8.4 (3.9) | 5.4 (0.7) | 7.2 (2.1) a | 5.4 (0.6) b

a n=204 and b n=583 values converted from whole blood to plasma glucose equivalent using prediction equation from the European Diabetes Epidemiology Group (22), of which, for b, n=262 fasted < 8 hours
Table S2A. Detailed characteristics of stage 1 case and control samples. N, Male, and Female are counts; all other rows are median (IQR).

Characteristic | FUSION Cases | FUSION Controls | FUSION Controls from Finrisk 2002 | Finrisk 2002 Cases | Finrisk 2002 Controls
N | 789 | 523 a | 276 | 372 | 375
Male | 429 | 194 | 163 | 224 | 217
Female | 360 | 329 | 113 | 148 | 158
Age of Diagnosis (years) | 51.0 (11.0) | --- | --- | 59.0 (12.0) | ---
Study Age (years) | 64.2 (10.1) | 69.6 (7.7) | 62.0 (9.0) | 61.0 (12.0) | 61.0 (12.0)
BMI (kg/m²) | 29.3 (6.2) | 27.3 (5.5) | 26.5 (4.5) | 30.7 (6.0) | 26.6 (4.4)
Fasting Plasma Glucose (mmol/l) | 9.6 (4.7) | 5.1 (0.6) | 5.6 (0.5) | 7.3 (1.3) | 5.6 (0.5)

a Comprised of 219 FUSION controls from Vantaa who were NGT at ages 65 and 70 years, and 304 NGT spouses of FUSION T2D subjects
Table S2B. Detailed characteristics of stage 2 case and control samples. N, Male, and Female are counts; all other rows are median (IQR).

Characteristic | D2D Cases | D2D Controls | Health 2000 Cases | Health 2000 Controls | Action LADA Cases | Action LADA Controls | Finrisk 1987 Cases | Finrisk 1987 Controls | Savitaipale Diabetes Study Cases | Savitaipale Diabetes Study Controls
N | 327 | 314 | 127 | 124 | 373 | 402 a | 266 | 300 | 122 | 118
Male | 184 | 176 | 67 | 66 | 235 | 259 | 171 | 202 | 67 | 65
Female | 143 | 138 | 60 | 58 | 138 | 143 | 95 | 98 | 55 | 53
Age of Diagnosis (years) | 60.0 (13.0) | --- | 55.0 (13.0) | --- | 55.0 (10.0) | --- | 55.0 (13.0) | --- | 55.1 (11.7) | ---
Study Age (years) | 64.0 (11.4) | 64.3 (12.0) | 61.0 (15.0) | 59.0 (12.0) | 60.2 (10.8) | 58.0 (9.0) | 58.0 (11.0) | 57.0 (12.0) | 57.9 (13.4) | 57.0 (13.0)
BMI (kg/m²) | 29.9 (7.1) | 26.4 (4.9) | 30.3 (5.4) | 26.5 (5.6) | 30.3 (6.9) | 26.3 (4.7) | 30.5 (6.1) | 26.7 (4.8) | 28.3 (7.1) | 25.4 (4.5)
Fasting Plasma Glucose (mmol/l) | 7.2 (2.0) | 5.4 (0.5) | 7.3 (2.0) | 5.4 (0.5) | 7.3 (2.4) | 5.5 (0.6) b | 6.9 (3.0) c | 5.1 (0.6) c,d | 7.2 (0.9) c | 5.6 (0.4) c

a 85 D2D, 100 Health 2000, 52 Finrisk 2002, 97 Finrisk 1987, and 68 Savitaipale Diabetes Study controls
b n=165 values converted from whole blood to plasma glucose equivalent using prediction equation from the European Diabetes Epidemiology Group (22), of which n=52 fasted < 8 hours
c all values converted from whole blood to plasma glucose equivalent using prediction equation from the European Diabetes Epidemiology Group (22)
d n=210 fasted < 8 hours
Table S3. FUSION stage 1 T2D association: genotyped (bold) and imputed (non-bold) SNPs with p-value < .0001. Sets of SNPs, where each SNP is within 100kb of the preceding SNP, are delimited by lines.

SNP | Genes | Chr | Position (bp) | FUSION risk allele / non-risk allele | Control risk frequency | Case risk frequency | OR | 95% CI | p-value | Genotyped p-value for imputed SNP | Genotyped in Stage 2?
rs527912 | CDA | 1 | 20,679,589 | G/A | .670 | .723 | 1.304 | 1.141-1.49 | 9.4 × 10⁻⁵ | |
rs3820321 | PINK1 | 1 | 20,708,133 | G/A | .602 | .663 | 1.291 | 1.142-1.459 | 4.0 × 10⁻⁵ | |
rs607254 | DDOST, KIF17, PINK1 | 1 | 20,726,186 | G/A | .601 | .663 | 1.294 | 1.145-1.463 | 3.4 × 10⁻⁵ | |
rs589709 | DDOST, KIF17, PINK1 | 1 | 20,729,293 | G/A | .601 | .663 | 1.297 | 1.147-1.465 | 2.9 × 10⁻⁵ | |
rs640742 | DDOST, KIF17, PINK1 | 1 | 20,729,860 | A/C | .601 | .663 | 1.297 | 1.147-1.465 | 2.9 × 10⁻⁵ | | Yes
rs623817 | DDOST, KIF17, PINK1 | 1 | 20,731,384 | G/A | .601 | .663 | 1.297 | 1.147-1.467 | 3.1 × 10⁻⁵ | |
rs674114 | DDOST, KIF17 | 1 | 20,734,978 | G/A | .615 | .668 | 1.321 | 1.151-1.516 | 6.8 × 10⁻⁵ | |
rs630484 | DDOST, KIF17 | 1 | 20,737,912 | G/T | .616 | .670 | 1.332 | 1.159-1.530 | 4.8 × 10⁻⁵ | |
rs12118760 | DDOST, KIF17 | 1 | 20,745,110 | T/C | .736 | .767 | 1.708 | 1.331-2.191 | 2.2 × 10⁻⁵ | |
----------
rs1932397 | | 1 | 29,732,290 | T/C | .168 | .215 | 1.351 | 1.164-1.569 | 7.1 × 10⁻⁵ | |
rs6603926 | | 1 | 29,735,248 | A/G | .168 | .215 | 1.352 | 1.164-1.57 | 7.0 × 10⁻⁵ | |
rs9662524 | | 1 | 29,739,496 | G/C | .168 | .215 | 1.351 | 1.164-1.569 | 7.3 × 10⁻⁵ | |
rs915409 | | 1 | 29,740,363 | T/C | .168 | .215 | 1.351 | 1.164-1.569 | 7.3 × 10⁻⁵ | |
rs9286938 | | 1 | 29,746,194 | T/C | .168 | .214 | 1.345 | 1.159-1.562 | 9.1 × 10⁻⁵ | |
rs9659523 | | 1 | 29,746,693 | A/C | .169 | .215 | 1.344 | 1.157-1.56 | 1.0 × 10⁻⁴ | |
rs271306 | | 1 | 29,751,757 | G/C | .168 | .214 | 1.344 | 1.157-1.561 | 1.0 × 10⁻⁴ | |
----------
rs17356414 | | 1 | 59,031,529 | C/T | .548 | .607 | 1.311 | 1.158-1.485 | 1.7 × 10⁻⁵ | 8.0 × 10⁻⁴ | Yes
rs6676059 | | 1 | 59,041,777 | G/A | .548 | .606 | 1.312 | 1.159-1.485 | 1.7 × 10⁻⁵ | |
rs12133457 | | 1 | 59,042,784 | G/A | .548 | .606 | 1.312 | 1.159-1.485 | 1.7 × 10⁻⁵ | |
----------
rs17025978 | KCNA10 | 1 | 110,781,653 | G/A | .914 | .947 | 1.705 | 1.347-2.158 | 6.6 × 10⁻⁶ | | Yes
rs17025982 | KCNA10 | 1 | 110,782,336 | T/C | .910 | .943 | 1.699 | 1.342-2.151 | 7.8 × 10⁻⁶ | |
rs2790372 | | 1 | 110,799,166 | C/A | .937 | .962 | 1.750 | 1.320-2.319 | 7.5 × 10⁻⁵ | |
rs2799765 | | 1 | 110,800,193 | T/C | .937 | .962 | 1.748 | 1.317-2.319 | 8.5 × 10⁻⁵ | |
rs1626078 | | 1 | 110,801,281 | C/T | .937 | .962 | 1.748 | 1.316-2.322 | 8.9 × 10⁻⁵ | |
rs1622675 | | 1 | 110,801,684 | A/T | .937 | .962 | 1.758 | 1.321-2.338 | 8.3 × 10⁻⁵ | |
rs1627572 | | 1 | 110,801,712 | G/A | .938 | .962 | 1.756 | 1.319-2.338 | 8.9 × 10⁻⁵ | |
----------
rs2501354 | SLAMF8, VSIG8 | 1 | 156,628,715 | G/A | .355 | .415 | 1.274 | 1.129-1.437 | 8.1 × 10⁻⁵ | |
rs2501350 | SLAMF8, VSIG8 | 1 | 156,630,077 | G/C | .379 | .437 | 1.288 | 1.136-1.459 | 7.0 × 10⁻⁵ | |
----------
rs357973 | | 2 | 3,292,094 | G/A | .942 | .961 | 1.975 | 1.394-2.798 | 9.3 × 10⁻⁵ | |
rs357971 | | 2 | 3,292,963 | G/C | .942 | .961 | 1.977 | 1.395-2.802 | 9.1 × 10⁻⁵ | |
----------
rs2338545 | PLB1 | 2 | 28,711,426 | G/A | .202 | .252 | 1.332 | 1.157-1.534 | 6.3 × 10⁻⁵ | |
----------
rs2249434 | SCLY | 2 | 238,757,753 | C/G | .076 | .110 | 1.497 | 1.221-1.835 | 9.1 × 10⁻⁵ | |
----------
rs1391136 | | 3 | 21,136,392 | C/T | .838 | .874 | 1.425 | 1.195-1.700 | 7.5 × 10⁻⁵ | |
----------
rs11926889 | | 3 | 30,253,294 | G/A | .880 | .911 | 1.537 | 1.243-1.900 | 6.1 × 10⁻⁵ | |
rs1434006 | | 3 | 30,268,508 | C/T | .904 | .934 | 1.586 | 1.268-1.984 | 4.4 × 10⁻⁵ | |
rs13075234 | | 3 | 30,269,434 | C/T | .922 | .946 | 1.707 | 1.311-2.223 | 5.8 × 10⁻⁵ | |
rs10440137 | | 3 | 30,270,978 | G/T | .904 | .934 | 1.581 | 1.266-1.974 | 4.4 × 10⁻⁵ | |
rs9870410 | | 3 | 30,283,763 | C/T | .904 | .935 | 1.579 | 1.267-1.967 | 3.8 × 10⁻⁵ | |
rs13092602 | | 3 | 30,284,949 | G/A | .906 | .939 | 1.660 | 1.324-2.081 | 8.2 × 10⁻⁶ | |
rs1495586 | | 3 | 30,302,792 | G/A | .907 | .940 | 1.666 | 1.327-2.091 | 8.2 × 10⁻⁶ | |
rs17081352 | | 3 | 30,307,851 | C/A | .910 | .942 | 1.698 | 1.342-2.148 | 7.6 × 10⁻⁶ | 5.5 × 10⁻⁶ | Yes
rs9843153 | | 3 | 30,308,252 | G/T | .913 | .944 | 1.722 | 1.351-2.195 | 8.4 × 10⁻⁶ | |
----------
rs11714343 | | 3 | 34,437,873 | T/C | .084 | .118 | 1.472 | 1.210-1.791 | 9.6 × 10⁻⁵ | |
Table S3. FUSION stage 1 T2D association: genotyped (bold) and imputed (non-bold) SNPs with p-value < .0001 (continued)

SNP | Genes | Chr | Position (bp) | FUSION risk allele / non-risk allele | Control risk frequency | Case risk frequency | OR | 95% CI | p-value | Genotyped p-value for imputed SNP | Genotyped in Stage 2?
rs739984 | PTPRG | 3 | 61,975,357 | G/A | .729 | .777 | 1.320 | 1.150-1.515 | 7.2 × 10⁻⁵ | |
----------
rs12490128 | TMEM108 | 3 | 134,391,491 | A/C | .118 | .162 | 1.465 | 1.234-1.739 | 1.1 × 10⁻⁵ | |
rs13072106 | TMEM108 | 3 | 134,425,451 | T/C | .118 | .155 | 1.414 | 1.188-1.682 | 8.7 × 10⁻⁵ | | Yes
rs10512891 | TMEM108 | 3 | 134,431,557 | A/T | .118 | .156 | 1.415 | 1.189-1.684 | 8.3 × 10⁻⁵ | |
rs7650741 | TMEM108 | 3 | 134,432,277 | T/C | .118 | .156 | 1.416 | 1.189-1.684 | 8.2 × 10⁻⁵ | |
rs7612595 | TMEM108 | 3 | 134,439,991 | T/C | .118 | .156 | 1.418 | 1.192-1.688 | 7.5 × 10⁻⁵ | |
rs16840161 | TMEM108 | 3 | 134,478,424 | A/G | .117 | .158 | 1.444 | 1.213-1.718 | 3.1 × 10⁻⁵ | |
rs17297332 | TMEM108 | 3 | 134,480,782 | G/C | .121 | .162 | 1.447 | 1.216-1.723 | 2.9 × 10⁻⁵ | |
rs7625110 | TMEM108 | 3 | 134,494,477 | T/G | .117 | .158 | 1.446 | 1.215-1.722 | 2.9 × 10⁻⁵ | |
rs10512896 | TMEM108 | 3 | 134,499,457 | G/C | .117 | .158 | 1.450 | 1.218-1.726 | 2.7 × 10⁻⁵ | |
rs1708373 | TMEM108 | 3 | 134,502,025 | G/A | .117 | .158 | 1.451 | 1.219-1.728 | 2.5 × 10⁻⁵ | |
rs1197316 | TMEM108 | 3 | 134,522,283 | G/A | .117 | .158 | 1.455 | 1.222-1.734 | 2.3 × 10⁻⁵ | |
rs1920021 | TMEM108 | 3 | 134,554,123 | T/C | .118 | .158 | 1.450 | 1.216-1.729 | 3.1 × 10⁻⁵ | |
----------
rs823968 | | 3 | 136,542,755 | C/T | .382 | .436 | 1.274 | 1.131-1.436 | 6.7 × 10⁻⁵ | |
----------
rs4687296 | MAP3K13 | 3 | 186,595,002 | T/C | .225 | .276 | 1.325 | 1.158-1.516 | 3.9 × 10⁻⁵ | |
rs4687299 | MAP3K13 | 3 | 186,595,361 | A/G | .225 | .276 | 1.325 | 1.158-1.515 | 4.0 × 10⁻⁵ | | Yes
----------
rs886374 | SORCS2 | 4 | 7,856,440 | T/C | .211 | .270 | 1.385 | 1.209-1.587 | 2.4 × 10⁻⁶ | | Yes
----------
rs6815292 | ATP8A1 | 4 | 42,251,192 | A/G | .244 | .291 | 1.308 | 1.144-1.496 | 7.9 × 10⁻⁵ | |
rs7665824 | ATP8A1 | 4 | 42,252,481 | T/G | .244 | .291 | 1.309 | 1.145-1.496 | 7.8 × 10⁻⁵ | |
rs11726581 | ATP8A1 | 4 | 42,257,935 | C/T | .244 | .291 | 1.309 | 1.145-1.497 | 7.7 × 10⁻⁵ | |
rs11722556 | ATP8A1 | 4 | 42,258,828 | T/C | .244 | .291 | 1.309 | 1.145-1.497 | 7.5 × 10⁻⁵ | |
rs17630357 | ATP8A1 | 4 | 42,266,042 | A/T | .774 | .821 | 1.346 | 1.160-1.562 | 8.2 × 10⁻⁵ | |
rs4317238 | ATP8A1 | 4 | 42,267,105 | A/G | .774 | .821 | 1.346 | 1.160-1.562 | 8.1 × 10⁻⁵ | |
rs16854359 | ATP8A1 | 4 | 42,269,100 | C/G | .241 | .290 | 1.313 | 1.149-1.501 | 5.7 × 10⁻⁵ | |
rs9994372 | ATP8A1 | 4 | 42,269,138 | T/C | .251 | .301 | 1.335 | 1.166-1.527 | 2.5 × 10⁻⁵ | |
rs10034439 | ATP8A1 | 4 | 42,287,090 | C/T | .776 | .826 | 1.374 | 1.182-1.598 | 3.1 × 10⁻⁵ | |
rs13139219 | ATP8A1 | 4 | 42,294,231 | C/A | .779 | .827 | 1.346 | 1.160-1.561 | 7.8 × 10⁻⁵ | | Yes
rs6812080 | ATP8A1 | 4 | 42,319,554 | G/A | .779 | .828 | 1.349 | 1.163-1.565 | 7.0 × 10⁻⁵ | |
rs13116032 | ATP8A1 | 4 | 42,320,518 | G/T | .779 | .828 | 1.349 | 1.163-1.565 | 7.0 × 10⁻⁵ | |
----------
rs5022521 | ELOVL6 | 4 | 111,486,191 | T/C | .858 | .884 | 1.785 | 1.349-2.361 | 4.1 × 10⁻⁵ | |
----------
rs1030231 | | 5 | 66,353,021 | G/A | .198 | .245 | 1.330 | 1.152-1.536 | 9.3 × 10⁻⁵ | |
----------
rs10476844 | | 5 | 142,096,902 | T/C | .014 | .023 | 4.666 | 2.212-9.841 | 3.5 × 10⁻⁵ | |
rs961730 | ARHGAP26 | 5 | 142,114,126 | C/T | .014 | .024 | 4.696 | 2.254-9.784 | 2.4 × 10⁻⁵ | |
rs1347133 | ARHGAP26 | 5 | 142,114,290 | C/T | .014 | .024 | 4.745 | 2.275-9.899 | 2.1 × 10⁻⁵ | |
rs968076 | ARHGAP26 | 5 | 142,116,491 | G/A | .014 | .024 | 4.787 | 2.293-9.993 | 2.0 × 10⁻⁵ | |
rs7714907 | ARHGAP26 | 5 | 142,125,570 | G/A | .014 | .023 | 5.319 | 2.473-11.441 | 1.2 × 10⁻⁵ | |
rs7732207 | ARHGAP26 | 5 | 142,125,613 | A/G | .014 | .023 | 5.317 | 2.472-11.439 | 1.2 × 10⁻⁵ | |
rs764387 | ARHGAP26 | 5 | 142,125,869 | T/C | .014 | .023 | 5.326 | 2.472-11.474 | 1.2 × 10⁻⁵ | |
rs7737018 | ARHGAP26 | 5 | 142,126,283 | C/G | .014 | .023 | 5.317 | 2.462-11.483 | 1.3 × 10⁻⁵ | |
rs6898675 | ARHGAP26 | 5 | 142,131,843 | T/C | .014 | .023 | 5.320 | 2.456-11.526 | 1.4 × 10⁻⁵ | |
rs6894433 | ARHGAP26 | 5 | 142,133,535 | C/T | .014 | .023 | 5.315 | 2.452-11.523 | 1.4 × 10⁻⁵ | |
rs707177 | ARHGAP26 | 5 | 142,232,076 | A/G | .372 | .424 | 1.308 | 1.146-1.493 | 6.4 × 10⁻⁵ | |
rs447923 | ARHGAP26 | 5 | 142,232,441 | T/C | .325 | .373 | 1.321 | 1.148-1.519 | 9.2 × 10⁻⁵ | |
rs26707 | ARHGAP26 | 5 | 142,233,857 | G/C | .250 | .303 | 1.325 | 1.160-1.513 | 3.0 × 10⁻⁵ | |
Table S3. FUSION stage 1 T2D association: genotyped (bold) and imputed (non-bold) SNPs with p-value < .0001 (continued)

SNP | Genes | Chr | Position (bp) | FUSION risk allele / non-risk allele | Control risk frequency | Case risk frequency | OR | 95% CI | p-value | Genotyped p-value for imputed SNP | Genotyped in Stage 2?
rs26706 | ARHGAP26 | 5 | 142,237,044 | C/G | .253 | .306 | 1.324 | 1.159-1.513 | 3.2 × 10⁻⁵ | |
rs27779 | ARHGAP26 | 5 | 142,239,267 | A/C | .250 | .304 | 1.326 | 1.162-1.513 | 2.5 × 10⁻⁵ | | Yes
rs27546 | ARHGAP26 | 5 | 142,245,929 | T/A | .250 | .302 | 1.321 | 1.157-1.508 | 3.5 × 10⁻⁵ | |
----------
rs11970389 | TUBB2B, LOC389362 | 6 | 3,195,655 | T/C | .041 | .063 | 1.845 | 1.351-2.518 | 9.2 × 10⁻⁵ | |
----------
rs4713992 | | 6 | 36,720,183 | A/G | .730 | .764 | 1.525 | 1.240-1.875 | 5.7 × 10⁻⁵ | |
----------
rs7750445 | ZFAND3 | 6 | 37,872,955 | G/C | .114 | .158 | 1.483 | 1.244-1.769 | 9.4 × 10⁻⁶ | 4.1 × 10⁻⁵ | Yes
----------
rs17235125 | | 6 | 79,437,555 | A/G | .871 | .906 | 1.459 | 1.207-1.762 | 8.0 × 10⁻⁵ | |
rs17235167 | | 6 | 79,437,614 | C/G | .871 | .906 | 1.459 | 1.208-1.763 | 7.8 × 10⁻⁵ | |
rs17235209 | | 6 | 79,437,636 | C/T | .871 | .906 | 1.461 | 1.209-1.765 | 7.6 × 10⁻⁵ | |
rs17826801 | | 6 | 79,437,741 | A/G | .871 | .906 | 1.460 | 1.208-1.764 | 7.8 × 10⁻⁵ | |
----------
rs2021966 | ENPP1 | 6 | 132,192,132 | A/G | .585 | .634 | 1.320 | 1.150-1.516 | 7.2 × 10⁻⁵ | 2.6 × 10⁻⁴ | Yes
----------
rs2813539 | SYNE1 | 6 | 152,613,828 | G/A | .382 | .435 | 1.312 | 1.150-1.496 | 4.8 × 10⁻⁵ | |
rs1408460 | SYNE1 | 6 | 152,614,232 | C/G | .460 | .518 | 1.267 | 1.126-1.426 | 8.3 × 10⁻⁵ | |
rs719764 | SYNE1 | 6 | 152,614,487 | C/G | .483 | .538 | 1.293 | 1.141-1.466 | 5.4 × 10⁻⁵ | |
rs2673776 | SYNE1 | 6 | 152,614,926 | G/T | .458 | .516 | 1.265 | 1.125-1.422 | 8.0 × 10⁻⁵ | |
rs2635441 | SYNE1 | 6 | 152,615,257 | A/G | .460 | .517 | 1.264 | 1.123-1.422 | 9.4 × 10⁻⁵ | |
----------
rs13212052 | | 6 | 166,264,601 | T/C | .979 | .992 | 2.979 | 1.668-5.323 | 8.2 × 10⁻⁵ | |
----------
rs2791300 | | 7 | 18,102,317 | C/G | .704 | .752 | 1.319 | 1.149-1.514 | 7.7 × 10⁻⁵ | |
rs4721708 | | 7 | 18,143,542 | C/T | .702 | .760 | 1.373 | 1.199-1.572 | 3.8 × 10⁻⁶ | |
rs615545 | | 7 | 18,165,111 | C/T | .694 | .751 | 1.361 | 1.190-1.556 | 5.9 × 10⁻⁶ | | Yes
----------
rs2470984 | SLC13A1 | 7 | 122,368,680 | A/C | .297 | .348 | 1.279 | 1.130-1.448 | 9.0 × 10⁻⁵ | | Yes
rs6466855 | SLC13A1 | 7 | 122,371,141 | A/G | .294 | .346 | 1.289 | 1.137-1.462 | 7.0 × 10⁻⁵ | |
rs6964272 | SLC13A1 | 7 | 122,373,978 | T/C | .265 | .317 | 1.333 | 1.168-1.52 | 1.7 × 10⁻⁵ | |
rs13444183 | SLC13A1 | 7 | 122,377,232 | G/T | .265 | .317 | 1.333 | 1.168-1.521 | 1.8 × 10⁻⁵ | |
rs6963735 | SLC13A1 | 7 | 122,394,634 | C/T | .256 | .306 | 1.350 | 1.176-1.549 | 1.8 × 10⁻⁵ | |
rs10280430 | SLC13A1 | 7 | 122,399,306 | C/T | .255 | .305 | 1.350 | 1.176-1.549 | 1.9 × 10⁻⁵ | |
rs1880178 | SLC13A1 | 7 | 122,403,062 | T/C | .255 | .305 | 1.350 | 1.176-1.55 | 1.9 × 10⁻⁵ | |
----------
rs10954654 | | 7 | 138,816,342 | C/T | .725 | .776 | 1.337 | 1.166-1.533 | 2.8 × 10⁻⁵ | | Yes
rs10277603 | | 7 | 138,816,687 | C/T | .592 | .645 | 1.354 | 1.179-1.554 | 1.5 × 10⁻⁵ | |
rs10261979 | | 7 | 138,816,832 | G/C | .601 | .653 | 1.367 | 1.187-1.574 | 1.3 × 10⁻⁵ | |
rs10262338 | | 7 | 138,816,913 | A/G | .592 | .645 | 1.355 | 1.180-1.555 | 1.5 × 10⁻⁵ | |
rs9692401 | | 7 | 138,817,247 | C/T | .584 | .637 | 1.364 | 1.187-1.567 | 1.1 × 10⁻⁵ | |
rs9691662 | | 7 | 138,817,453 | A/G | .592 | .645 | 1.353 | 1.179-1.554 | 1.6 × 10⁻⁵ | |
rs9690418 | | 7 | 138,817,495 | G/A | .592 | .645 | 1.353 | 1.179-1.553 | 1.6 × 10⁻⁵ | |
rs12707449 | | 7 | 138,817,983 | A/T | .592 | .645 | 1.353 | 1.179-1.553 | 1.6 × 10⁻⁵ | |
rs10271287 | | 7 | 138,819,517 | T/C | .592 | .645 | 1.353 | 1.179-1.554 | 1.6 × 10⁻⁵ | |
----------
rs38732 | MRPS33 | 7 | 140,158,346 | T/A | .069 | .096 | 1.680 | 1.296-2.178 | 6.9 × 10⁻⁵ | |
rs9274 | MRPS33 | 7 | 140,159,215 | A/G | .048 | .076 | 1.639 | 1.279-2.101 | 7.5 × 10⁻⁵ | |
rs544081 | | 7 | 140,209,733 | G/A | .048 | .076 | 1.643 | 1.282-2.106 | 6.7 × 10⁻⁵ | |
rs488795 | | 7 | 140,211,070 | T/G | .048 | .076 | 1.643 | 1.282-2.105 | 6.8 × 10⁻⁵ | |
rs512509 | | 7 | 140,211,331 | T/C | .048 | .076 | 1.643 | 1.282-2.105 | 6.7 × 10⁻⁵ | |
rs548245 | | 7 | 140,212,951 | T/C | .047 | .075 | 1.635 | 1.274-2.099 | 8.9 × 10⁻⁵ | |
rs471817 | | 7 | 140,214,431 | A/C | .048 | .076 | 1.643 | 1.282-2.105 | 6.8 × 10⁻⁵ | |
rs801155 | | 7 | 140,221,134 | A/G | .048 | .076 | 1.642 | 1.282-2.105 | 6.8 × 10⁻⁵ | |
Table S3. FUSION stage 1 T2D association: genotyped (bold) and imputed (non-bold) SNPs with p-value < .0001 (continued)

SNP | Genes | Chr | Position (bp) | FUSION risk allele / non-risk allele | Control risk frequency | Case risk frequency | OR | 95% CI | p-value | Genotyped p-value for imputed SNP | Genotyped in Stage 2?
rs528957 | LOC642421 | 7 | 140,222,643 | T/C | .048 | .076 | 1.634 | 1.276-2.094 | 7.8 × 10⁻⁵ | |
rs557962 | | 7 | 140,232,924 | T/C | .047 | .076 | 1.650 | 1.287-2.115 | 5.9 × 10⁻⁵ | | Yes
----------
rs7842241 | C8orf68 | 8 | 1,056,317 | G/A | .634 | .688 | 1.285 | 1.134-1.456 | 8.1 × 10⁻⁵ | |
----------
rs979728 | DLC1 | 8 | 13,435,309 | T/C | .371 | .405 | 1.464 | 1.209-1.772 | 8.6 × 10⁻⁵ | |
----------
rs1852027 | CNBD1 | 8 | 88,076,230 | G/A | .552 | .611 | 1.269 | 1.127-1.428 | 7.6 × 10⁻⁵ | |
----------
rs17707746 | PTDSS1 | 8 | 97,384,821 | C/A | .041 | .065 | 1.750 | 1.317-2.326 | 8.7 × 10⁻⁵ | |
rs883655 | PTDSS1 | 8 | 97,386,357 | C/T | .041 | .065 | 1.751 | 1.317-2.328 | 8.9 × 10⁻⁵ | |
rs13439240 | PTDSS1 | 8 | 97,387,836 | T/C | .041 | .065 | 1.752 | 1.317-2.330 | 8.9 × 10⁻⁵ | |
----------
rs7830293 | GPR20 | 8 | 142,442,691 | C/T | .066 | .099 | 1.597 | 1.276-1.999 | 3.6 × 10⁻⁵ | |
rs6578167 | GPR20 | 8 | 142,450,474 | C/A | .065 | .098 | 1.578 | 1.264-1.970 | 4.7 × 10⁻⁵ | |
rs7839244 | GPR20 | 8 | 142,457,437 | A/G | .066 | .098 | 1.553 | 1.248-1.932 | 6.8 × 10⁻⁵ | | Yes
rs4961268 | GPR20 | 8 | 142,464,393 | G/A | .064 | .097 | 1.586 | 1.271-1.980 | 3.7 × 10⁻⁵ | |
----------
rs4961755 | BNC2 | 9 | 16,759,812 | C/G | .121 | .158 | 1.467 | 1.213-1.774 | 7.0 × 10⁻⁵ | |
----------
rs12683158 | NFIL3 | 9 | 91,266,820 | C/T | .927 | .954 | 1.736 | 1.333-2.261 | 3.2 × 10⁻⁵ | |
rs13297268 | NFIL3 | 9 | 91,267,696 | G/A | .927 | .954 | 1.745 | 1.338-2.277 | 3.0 × 10⁻⁵ | 9.0 × 10⁻⁵ | Yes
rs13289738 | NFIL3 | 9 | 91,271,701 | G/T | .926 | .951 | 1.793 | 1.354-2.372 | 3.3 × 10⁻⁵ | |
----------
rs7856348 | CYLC2 | 9 | 102,835,550 | C/A | .541 | .591 | 1.308 | 1.144-1.495 | 7.9 × 10⁻⁵ | |
----------
rs1330146 | | 9 | 107,631,794 | G/A | .545 | .603 | 1.289 | 1.142-1.455 | 3.7 × 10⁻⁵ | |
rs10816576 | | 9 | 107,633,222 | G/A | .545 | .603 | 1.289 | 1.142-1.455 | 3.7 × 10⁻⁵ | |
rs10121193 | | 9 | 107,660,601 | A/G | .382 | .426 | 1.348 | 1.161-1.565 | 8.4 × 10⁻⁵ | |
----------
rs4543877 | | 10 | 65,172,027 | C/G | .439 | .497 | 1.330 | 1.173-1.507 | 7.7 × 10⁻⁶ | |
rs3864799 | | 10 | 65,172,388 | G/C | .439 | .497 | 1.330 | 1.173-1.508 | 7.5 × 10⁻⁶ | |
rs3912165 | | 10 | 65,187,697 | A/G | .427 | .485 | 1.349 | 1.186-1.534 | 4.5 × 10⁻⁶ | |
rs10740140 | | 10 | 65,189,760 | A/G | .428 | .485 | 1.290 | 1.145-1.452 | 2.5 × 10⁻⁵ | |
rs4746396 | | 10 | 65,194,129 | C/G | .436 | .494 | 1.274 | 1.136-1.429 | 3.1 × 10⁻⁵ | |
rs16918864 | | 10 | 65,228,767 | G/C | .430 | .487 | 1.275 | 1.136-1.431 | 3.4 × 10⁻⁵ | |
----------
rs3104056 | | 10 | 71,180,045 | G/A | .974 | .986 | 3.162 | 1.736-5.758 | 6.3 × 10⁻⁵ | |
----------
rs17747324 | TCF7L2 | 10 | 114,742,493 | C/T | .141 | .181 | 1.445 | 1.214-1.719 | 3.0 × 10⁻⁵ | |
rs7903146 | TCF7L2 | 10 | 114,748,339 | T/C | .179 | .229 | 1.388 | 1.197-1.610 | 1.2 × 10⁻⁵ | | Yes
rs12243326 | TCF7L2 | 10 | 114,778,805 | C/T | .163 | .213 | 1.429 | 1.224-1.667 | 5.0 × 10⁻⁶ | |
rs12255372 | TCF7L2 | 10 | 114,798,892 | T/G | .156 | .203 | 1.400 | 1.201-1.632 | 1.5 × 10⁻⁵ | | Yes
----------
rs12288214 | | 11 | 41,772,225 | G/A | .915 | .946 | 1.681 | 1.316-2.147 | 2.5 × 10⁻⁵ | |
rs12284861 | | 11 | 41,787,876 | A/G | .915 | .946 | 1.685 | 1.320-2.150 | 2.1 × 10⁻⁵ | |
rs11036577 | | 11 | 41,792,460 | C/T | .914 | .946 | 1.684 | 1.320-2.148 | 2.1 × 10⁻⁵ | |
rs12
7974
3611
41,7
98,9
17A
/C.9
13.9
44
1.62
4 1.
279-
2.06
2 5.
4 x
10-5
rs12
2747
3211
41,8
05,5
01C
/T.9
14.9
46
1.68
2 1.
319-
2.14
5 2.
1 x
10-5
rs12
2759
2311
41,8
18,5
26A
/C.9
14.9
46
1.68
5 1.
321-
2.15
0 2.
0 x
10-5
rs12
2945
5211
41,8
21,0
81G
/C.9
13.9
44
1.62
9 1.
282-
2.06
9 5.
2 x
10-5
rs11
0366
0011
41,8
23,6
51A
/G.9
14.9
46
1.68
5 1.
321-
2.15
0 2.
0 x
10-5
rs11
6004
9511
41,8
28,6
09C
/A.9
14.9
44
1.62
2 1.
273-
2.06
5 7.
3 x
10-5
rs10
1604
4211
41,8
33,6
78T/
C.9
14.9
46
1.68
3 1.
318-
2.14
8 2.
2 x
10-5
rs37
6382
711
41,8
34,4
54G
/C.9
13.9
43
1.62
5 1.
278-
2.06
6 5.
9 x
10-5
rs64
8528
811
41,8
37,9
14A
/G.9
06.9
39
1.61
6 1.
285-
2.03
2 3.
2 x
10-5
rs12
2802
9411
41,8
38,3
23G
/T.9
14.9
45
1.68
3 1.
318-
2.15
0 2.
3 x
10-5
| rs12281155 | | 11 | 41,843,640 | C/G | .914 | .945 | 1.684 | 1.318-2.151 | 2.3 x 10^-5 | | |
| rs12786634 | | 11 | 41,845,196 | C/T | .914 | .945 | 1.683 | 1.318-2.150 | 2.3 x 10^-5 | | |
| rs12277557 | | 11 | 41,849,152 | A/T | .912 | .943 | 1.686 | 1.320-2.155 | 2.2 x 10^-5 | | |
| rs12793795 | | 11 | 41,854,702 | G/A | .906 | .936 | 1.588 | 1.258-2.005 | 8.4 x 10^-5 | | |
| rs12271525 | | 11 | 41,858,437 | G/A | .891 | .925 | 1.512 | 1.228-1.860 | 8.1 x 10^-5 | | |
| rs7928200 | | 11 | 41,859,109 | A/G | .891 | .925 | 1.512 | 1.229-1.861 | 8.0 x 10^-5 | | |
| rs12273344 | | 11 | 41,859,353 | G/T | .890 | .925 | 1.516 | 1.233-1.863 | 6.5 x 10^-5 | | |
| rs12788548 | | 11 | 41,862,957 | C/T | .891 | .925 | 1.513 | 1.229-1.862 | 7.9 x 10^-5 | | |
| rs12288738 | | 11 | 41,868,875 | T/C | .890 | .924 | 1.511 | 1.229-1.858 | 7.5 x 10^-5 | | |
| rs1588439 | | 11 | 41,871,182 | G/A | .890 | .924 | 1.511 | 1.229-1.858 | 7.5 x 10^-5 | | |
| rs16936067 | | 11 | 41,871,820 | G/T | .906 | .936 | 1.580 | 1.252-1.993 | 9.5 x 10^-5 | | |
| rs9300039 | | 11 | 41,871,942 | C/A | .890 | .925 | 1.520 | 1.236-1.869 | 6.0 x 10^-5 | | Yes |
| rs11036622 | | 11 | 41,872,742 | C/T | .890 | .924 | 1.516 | 1.232-1.864 | 6.9 x 10^-5 | | |
| rs11036624 | | 11 | 41,878,246 | T/C | .891 | .925 | 1.525 | 1.236-1.881 | 6.8 x 10^-5 | | |
| rs12797038 | | 11 | 41,880,453 | C/T | .907 | .937 | 1.598 | 1.260-2.026 | 9.0 x 10^-5 | | |
| rs12804210 | | 11 | 41,880,999 | T/C | .891 | .925 | 1.549 | 1.251-1.919 | 5.1 x 10^-5 | | |
| rs11036627 | | 11 | 41,881,290 | C/A | .904 | .937 | 1.662 | 1.314-2.103 | 1.8 x 10^-5 | 1.9 x 10^-5 | Yes |
| rs11036628 | | 11 | 41,881,352 | G/A | .904 | .937 | 1.662 | 1.313-2.103 | 1.8 x 10^-5 | | |
| rs7114241 | | 11 | 41,882,103 | T/C | .891 | .925 | 1.552 | 1.251-1.924 | 5.2 x 10^-5 | | |
| rs7128743 | | 11 | 41,882,275 | C/A | .891 | .925 | 1.552 | 1.252-1.925 | 5.2 x 10^-5 | | |
| rs12288361 | | 11 | 41,883,303 | C/T | .891 | .925 | 1.553 | 1.252-1.927 | 5.1 x 10^-5 | | |
| rs12802634 | | 11 | 41,886,138 | T/C | .891 | .925 | 1.554 | 1.252-1.928 | 5.2 x 10^-5 | | |
| rs12802862 | | 11 | 41,886,267 | T/C | .891 | .925 | 1.554 | 1.252-1.928 | 5.2 x 10^-5 | | |
| rs11608189 | | 11 | 41,887,387 | G/T | .907 | .937 | 1.609 | 1.267-2.045 | 7.9 x 10^-5 | | |
| rs11602004 | | 11 | 41,900,843 | G/T | .907 | .938 | 1.616 | 1.271-2.053 | 7.0 x 10^-5 | | |
| rs11602127 | | 11 | 41,901,557 | G/A | .907 | .938 | 1.628 | 1.280-2.070 | 5.6 x 10^-5 | | |
| rs10501281 | | 11 | 41,922,935 | C/T | .915 | .947 | 1.617 | 1.276-2.048 | 5.3 x 10^-5 | | |
| rs11823992 | | 11 | 41,926,856 | A/T | .918 | .949 | 1.651 | 1.294-2.105 | 4.0 x 10^-5 | | |
| rs7101809 | | 11 | 41,933,715 | T/C | .918 | .949 | 1.653 | 1.295-2.109 | 4.1 x 10^-5 | | |
| rs12287052 | | 11 | 41,935,144 | A/G | .918 | .949 | 1.651 | 1.289-2.114 | 5.6 x 10^-5 | | |
| rs11036642 | | 11 | 41,940,997 | T/A | .921 | .951 | 1.699 | 1.318-2.191 | 3.3 x 10^-5 | | |
| rs17553408 | | 11 | 41,951,928 | T/G | .918 | .949 | 1.650 | 1.288-2.115 | 5.8 x 10^-5 | | |
| rs12293408 | | 11 | 41,956,332 | C/T | .921 | .951 | 1.695 | 1.315-2.186 | 3.5 x 10^-5 | | |
| rs16936200 | | 11 | 41,963,315 | A/C | .906 | .939 | 1.635 | 1.294-2.067 | 3.0 x 10^-5 | | |
| rs11036649 | | 11 | 41,965,524 | A/G | .906 | .939 | 1.634 | 1.293-2.066 | 3.1 x 10^-5 | | |
| rs12576408 | | 11 | 41,971,203 | G/T | .906 | .939 | 1.633 | 1.292-2.064 | 3.2 x 10^-5 | | |
| rs11036652 | | 11 | 41,971,269 | T/C | .907 | .939 | 1.629 | 1.288-2.058 | 3.5 x 10^-5 | | |
| rs7107246 | | 11 | 41,972,428 | C/A | .883 | .915 | 1.630 | 1.287-2.064 | 4.0 x 10^-5 | | |
| rs11604966 | | 11 | 41,972,736 | T/C | .907 | .940 | 1.623 | 1.285-2.051 | 3.8 x 10^-5 | | |
| rs10837766 | | 11 | 41,984,377 | T/C | .840 | .882 | 1.472 | 1.232-1.759 | 1.8 x 10^-5 | 8.6 x 10^-5 | Yes |
| rs17554005 | | 11 | 41,989,148 | A/C | .916 | .947 | 1.686 | 1.312-2.166 | 3.4 x 10^-5 | | |
| rs17554054 | | 11 | 41,990,218 | T/C | .916 | .947 | 1.682 | 1.310-2.161 | 3.6 x 10^-5 | | |
| rs17554081 | | 11 | 41,990,280 | A/G | .916 | .946 | 1.677 | 1.306-2.154 | 3.9 x 10^-5 | | |
| rs2862456 | | 11 | 41,990,769 | C/T | .916 | .946 | 1.668 | 1.300-2.140 | 4.5 x 10^-5 | | |
| rs17462952 | | 11 | 41,991,795 | A/G | .916 | .946 | 1.666 | 1.299-2.137 | 4.6 x 10^-5 | | |
| rs17462994 | | 11 | 41,991,889 | T/C | .916 | .946 | 1.666 | 1.299-2.137 | 4.6 x 10^-5 | | |
| rs12792932 | | 11 | 127,226,772 | G/A | .967 | .984 | 2.303 | 1.515-3.500 | 5.2 x 10^-5 | | |
| rs12806859 | | 11 | 127,234,379 | T/G | .967 | .984 | 2.299 | 1.514-3.492 | 5.2 x 10^-5 | | |
| rs12799032 | | 11 | 127,328,409 | G/A | .963 | .980 | 2.197 | 1.469-3.287 | 8.3 x 10^-5 | | |
| rs12792749 | | 11 | 127,336,192 | G/A | .963 | .980 | 2.191 | 1.465-3.275 | 8.6 x 10^-5 | | |
| rs12797631 | | 11 | 127,341,608 | T/G | .963 | .980 | 2.191 | 1.465-3.278 | 8.7 x 10^-5 | | |
| rs12796900 | | 11 | 127,341,924 | C/A | .963 | .980 | 2.191 | 1.465-3.276 | 8.8 x 10^-5 | | |
| rs12793901 | | 11 | 127,345,185 | G/A | .963 | .980 | 2.198 | 1.468-3.290 | 8.6 x 10^-5 | | |
| rs11616188 | LTBR, SCNN1A | 12 | 6,373,003 | A/G | .474 | .522 | 1.400 | 1.201-1.633 | 1.6 x 10^-5 | 4.8 x 10^-5 | Yes |
| rs7313533 | | 12 | 6,386,116 | A/G | .702 | .742 | 1.394 | 1.179-1.649 | 9.8 x 10^-5 | | |
| rs12581386 | CORO1C | 12 | 107,585,465 | C/A | .962 | .977 | 2.546 | 1.571-4.126 | 7.6 x 10^-5 | | |
| rs3825253 | CORO1C | 12 | 107,611,747 | A/G | .973 | .989 | 2.575 | 1.604-4.134 | 3.6 x 10^-5 | | Yes |
| rs7957463 | FLJ20674, WSB2 | 12 | 116,981,026 | T/C | .577 | .633 | 1.274 | 1.134-1.432 | 4.2 x 10^-5 | | |
| rs7958110 | FLJ20674, WSB2 | 12 | 116,981,479 | T/C | .577 | .633 | 1.273 | 1.133-1.430 | 4.4 x 10^-5 | | |
| rs4767658 | FLJ20674, WSB2 | 12 | 116,982,161 | T/C | .577 | .633 | 1.274 | 1.134-1.430 | 4.1 x 10^-5 | | Yes |
| rs7488309 | FLJ20674, WSB2 | 12 | 116,982,890 | G/A | .577 | .633 | 1.273 | 1.133-1.430 | 4.3 x 10^-5 | | |
| rs2711747 | CCDC60 | 12 | 118,360,953 | T/G | .014 | .025 | 3.401 | 1.842-6.280 | 4.9 x 10^-5 | | |
| rs1918416 | | 12 | 118,463,133 | C/T | .808 | .853 | 1.383 | 1.181-1.618 | 4.9 x 10^-5 | | |
| rs804628 | | 12 | 118,468,458 | G/C | .816 | .856 | 1.432 | 1.204-1.702 | 4.4 x 10^-5 | | |
| rs2669161 | | 12 | 120,663,139 | C/G | .846 | .884 | 1.457 | 1.210-1.755 | 6.3 x 10^-5 | | |
| rs2707069 | | 12 | 120,666,804 | C/T | .846 | .884 | 1.462 | 1.212-1.764 | 6.4 x 10^-5 | | |
| rs1287527 | | 13 | 80,731,274 | T/C | .085 | .120 | 1.493 | 1.226-1.819 | 6.1 x 10^-5 | | |
| rs1287526 | | 13 | 80,734,028 | G/A | .088 | .123 | 1.480 | 1.219-1.796 | 6.4 x 10^-5 | | |
| rs982864 | | 13 | 80,735,627 | C/T | .075 | .109 | 1.512 | 1.229-1.859 | 7.7 x 10^-5 | | |
| rs2801597 | | 13 | 80,736,045 | G/A | .075 | .109 | 1.512 | 1.229-1.859 | 7.8 x 10^-5 | | |
| rs1287533 | | 13 | 80,740,650 | A/T | .083 | .117 | 1.490 | 1.220-1.820 | 8.2 x 10^-5 | | |
| rs9545851 | | 13 | 81,234,888 | T/C | .525 | .583 | 1.279 | 1.135-1.441 | 5.1 x 10^-5 | | |
| rs9545852 | | 13 | 81,237,495 | C/T | .525 | .583 | 1.278 | 1.134-1.440 | 5.2 x 10^-5 | | |
| rs9531246 | | 13 | 81,239,573 | C/A | .525 | .583 | 1.278 | 1.134-1.439 | 5.3 x 10^-5 | | |
| rs9545853 | | 13 | 81,242,579 | T/C | .526 | .583 | 1.277 | 1.134-1.438 | 5.4 x 10^-5 | | |
| rs11149214 | | 13 | 81,283,609 | C/A | .526 | .583 | 1.276 | 1.133-1.438 | 5.5 x 10^-5 | | |
| rs9545870 | | 13 | 81,286,274 | A/G | .526 | .583 | 1.276 | 1.133-1.438 | 5.5 x 10^-5 | | |
| rs3891591 | | 13 | 81,291,969 | C/T | .517 | .573 | 1.276 | 1.131-1.440 | 6.9 x 10^-5 | | |
| rs9545903 | | 13 | 81,344,914 | T/C | .459 | .514 | 1.270 | 1.128-1.430 | 7.2 x 10^-5 | | |
| rs10135197 | | 14 | 38,123,411 | T/C | .598 | .654 | 1.288 | 1.138-1.458 | 6.1 x 10^-5 | | |
| rs8014198 | | 14 | 38,132,529 | G/A | .616 | .670 | 1.291 | 1.137-1.464 | 7.0 x 10^-5 | | |
| rs9788490 | | 14 | 38,132,689 | C/G | .603 | .659 | 1.287 | 1.138-1.455 | 5.5 x 10^-5 | | |
| rs11849174 | | 14 | 38,147,149 | G/A | .603 | .660 | 1.287 | 1.138-1.455 | 5.4 x 10^-5 | | |
| rs10145493 | | 14 | 38,151,139 | G/A | .603 | .659 | 1.287 | 1.138-1.455 | 5.6 x 10^-5 | | |
| rs12435438 | | 14 | 38,154,195 | T/C | .553 | .612 | 1.318 | 1.161-1.495 | 1.7 x 10^-5 | | |
| rs1349241 | | 14 | 38,155,189 | T/C | .553 | .612 | 1.318 | 1.161-1.495 | 1.8 x 10^-5 | | |
| rs10141957 | | 14 | 38,157,020 | G/A | .549 | .610 | 1.323 | 1.167-1.500 | 1.1 x 10^-5 | | |
| rs2122331 | | 14 | 38,163,358 | G/C | .514 | .575 | 1.275 | 1.133-1.435 | 5.2 x 10^-5 | | |
| rs8010489 | | 14 | 38,163,618 | G/A | .523 | .584 | 1.281 | 1.137-1.444 | 4.5 x 10^-5 | | |
| rs1449720 | | 14 | 38,165,318 | A/G | .512 | .573 | 1.269 | 1.128-1.428 | 6.8 x 10^-5 | | |
| rs12164874 | | 14 | 38,172,603 | C/T | .515 | .577 | 1.278 | 1.136-1.439 | 4.5 x 10^-5 | | |
| rs10138342 | | 14 | 38,186,108 | A/C | .526 | .587 | 1.284 | 1.139-1.448 | 4.0 x 10^-5 | | |
| rs7153699 | | 14 | 38,188,807 | C/T | .518 | .579 | 1.279 | 1.136-1.440 | 4.4 x 10^-5 | | |
| rs6571865 | | 14 | 38,191,421 | T/C | .518 | .580 | 1.281 | 1.137-1.442 | 4.1 x 10^-5 | | |
| rs7141696 | | 14 | 38,192,126 | T/C | .518 | .580 | 1.281 | 1.138-1.443 | 4.0 x 10^-5 | | |
| rs8006474 | | 14 | 38,196,248 | G/C | .527 | .589 | 1.290 | 1.144-1.454 | 3.1 x 10^-5 | | |
| rs2122333 | | 14 | 38,233,119 | C/T | .542 | .610 | 1.321 | 1.171-1.491 | 5.3 x 10^-6 | | |
| rs1449725 | | 14 | 38,246,572 | C/T | .543 | .610 | 1.322 | 1.172-1.492 | 4.9 x 10^-6 | 1.1 x 10^-5 | Yes |
| rs2899883 | | 14 | 38,255,604 | G/T | .539 | .604 | 1.320 | 1.169-1.491 | 7.0 x 10^-6 | | |
| rs2319392 | GPHN | 14 | 66,136,844 | T/A | .014 | .023 | 4.396 | 2.050-9.426 | 5.0 x 10^-5 | | |
| rs3825569 | LOC388015 | 14 | 100,420,051 | C/T | .583 | .640 | 1.292 | 1.143-1.46 | 3.7 x 10^-5 | | |
| rs12910827 | | 15 | 56,417,311 | T/G | .024 | .047 | 2.592 | 1.738-3.866 | 1.3 x 10^-6 | 6.3 x 10^-6 | Yes |
| rs11634708 | LOC56964, PEX11A, PLIN | 15 | 88,037,214 | C/T | .433 | .485 | 1.315 | 1.153-1.500 | 4.1 x 10^-5 | | |
| rs10521095 | | 16 | 13,528,936 | A/G | .206 | .256 | 1.351 | 1.174-1.554 | 2.3 x 10^-5 | | Yes |
| rs6498423 | | 16 | 13,531,381 | A/G | .206 | .256 | 1.351 | 1.174-1.555 | 2.4 x 10^-5 | | |
| rs12162088 | | 16 | 13,547,393 | G/A | .130 | .169 | 1.407 | 1.185-1.671 | 8.8 x 10^-5 | | |
| rs16962270 | | 16 | 13,547,426 | T/A | .130 | .169 | 1.409 | 1.186-1.673 | 8.7 x 10^-5 | | |
| rs2033254 | CETP | 16 | 55,567,486 | T/C | .646 | .693 | 1.367 | 1.177-1.587 | 4.0 x 10^-5 | | |
| rs12708980 | CETP | 16 | 55,569,880 | T/G | .633 | .677 | 1.385 | 1.184-1.621 | 4.4 x 10^-5 | | |
| rs1800774 | CETP | 16 | 55,573,046 | C/T | .640 | .686 | 1.399 | 1.195-1.639 | 2.8 x 10^-5 | 7.3 x 10^-6 | Yes |
| rs11646114 | FOXC2, MTHFSD | 16 | 85,141,275 | T/A | .868 | .894 | 1.658 | 1.285-2.140 | 8.9 x 10^-5 | 0.002 | Yes |
| rs9911259 | PRKCA | 17 | 62,085,377 | C/A | .435 | .493 | 1.274 | 1.134-1.432 | 4.4 x 10^-5 | | |
| rs16959880 | PRKCA | 17 | 62,085,528 | A/G | .435 | .493 | 1.274 | 1.134-1.432 | 4.3 x 10^-5 | | |
| rs8077110 | PRKCA | 17 | 62,087,049 | A/G | .435 | .493 | 1.274 | 1.134-1.432 | 4.3 x 10^-5 | | |
| rs1024740 | PRKCA | 17 | 62,088,152 | C/G | .435 | .493 | 1.275 | 1.134-1.432 | 4.3 x 10^-5 | | |
| rs7207345 | PRKCA | 17 | 62,093,747 | T/C | .707 | .755 | 1.307 | 1.144-1.492 | 7.5 x 10^-5 | | |
| rs17384005 | | 18 | 1,565,020 | A/G | .810 | .839 | 1.864 | 1.409-2.467 | 1.1 x 10^-5 | .10 | Yes |
| rs1785710 | | 18 | 21,612,825 | G/C | .648 | .702 | 1.295 | 1.142-1.468 | 5.1 x 10^-5 | | |
| rs7229654 | | 18 | 35,549,984 | A/G | .959 | .978 | 2.024 | 1.412-2.902 | 8.0 x 10^-5 | | |
| rs1596583 | | 18 | 35,550,893 | G/A | .959 | .979 | 2.033 | 1.418-2.916 | 7.3 x 10^-5 | | |
| rs9675995 | | 18 | 35,574,907 | G/A | .959 | .978 | 2.020 | 1.410-2.895 | 8.3 x 10^-5 | | |
| rs10853467 | | 18 | 35,582,328 | A/G | .959 | .978 | 2.021 | 1.410-2.896 | 8.2 x 10^-5 | | |
| rs616444 | SETBP1 | 18 | 40,739,522 | A/C | .882 | .917 | 1.465 | 1.208-1.778 | 9.0 x 10^-5 | | |
| rs175200 | | 22 | 18,543,063 | A/G | .494 | .555 | 1.282 | 1.138-1.445 | 4.1 x 10^-5 | 5.5 x 10^-5 | Yes |
| rs438798 | | 22 | 18,544,053 | G/A | .494 | .555 | 1.282 | 1.138-1.444 | 4.2 x 10^-5 | | |
| rs520698 | LOC150207 | 22 | 19,349,434 | G/A | .702 | .757 | 1.377 | 1.199-1.582 | 5.4 x 10^-6 | | |
| rs565979 | | 22 | 19,353,500 | C/T | .679 | .730 | 1.295 | 1.139-1.472 | 7.0 x 10^-5 | | Yes |
| rs479275 | | 22 | 19,353,777 | T/A | .656 | .708 | 1.283 | 1.131-1.455 | 9.5 x 10^-5 | | |
| rs491228 | DKFZp434N035 | 22 | 19,357,925 | G/A | .679 | .730 | 1.294 | 1.138-1.471 | 7.5 x 10^-5 | | |
| rs591446 | DKFZp434N035 | 22 | 19,359,204 | A/G | .656 | .708 | 1.283 | 1.131-1.454 | 9.7 x 10^-5 | | |
| rs2267339 | CACNG2 | 22 | 35,290,742 | G/T | .610 | .666 | 1.333 | 1.169-1.521 | 1.6 x 10^-5 | 4.5 x 10^-6 | Yes |
Table S4. Confirmed T2D susceptibility loci: expanded FUSION results

| SNP | Gene | Stage | Risk allele R / Non-risk allele N | Controls (n) RR | Controls (n) RN | Controls (n) NN | Cases (n) RR | Cases (n) RN | Cases (n) NN | Risk allele frequency, control | Risk allele frequency, case | Additive OR | Additive 95% CI | Additive p-value | Dominant OR | Dominant 95% CI | Dominant p-value | Recessive OR | Recessive 95% CI | Recessive p-value |
| rs5219 | KCNJ11 | 1 | T/C | 221 | 562 | 346 | 271 | 538 | 296 | .445 | .489 | 1.204 | 1.069-1.357 | .0022 | 1.214 | 1.007-1.463 | .042 | 1.366 | 1.114-1.675 | .0027 |
| | | 2 | T/C | 284 | 622 | 328 | 271 | 624 | 295 | .482 | .490 | 1.035 | 0.922-1.162 | .56 | 1.112 | 0.925-1.338 | .26 | 0.979 | 0.807-1.186 | .83 |
| | | 1+2 | T/C | 505 | 1184 | 674 | 542 | 1162 | 591 | .464 | .489 | 1.109 | 1.021-1.204 | .014 | 1.152 | 1.011-1.312 | .034 | 1.142 | 0.994-1.312 | .060 |
| rs9300039 | | 1 | C/A | 929 | 232 | 13 | 992 | 161 | 7 | .890 | .925 | 1.520 | 1.236-1.869 | 6.0 x 10^-5 | 1.797 | 0.702-4.600 | .21 | 1.563 | 1.254-1.948 | 6.2 x 10^-5 |
| | | 2 | C/A | 988 | 227 | 17 | 1007 | 170 | 5 | .894 | .924 | 1.442 | 1.179-1.764 | 3.2 x 10^-4 | 3.445 | 1.247-9.520 | .0094 | 1.427 | 1.150-1.771 | .0012 |
| | | 1+2 | C/A | 1917 | 459 | 30 | 1999 | 331 | 12 | .892 | .924 | 1.478 | 1.280-1.705 | 6.8 x 10^-8 | 2.470 | 1.252-4.874 | .0062 | 1.490 | 1.279-1.737 | 2.7 x 10^-7 |
| rs8050136 | FTO | 1 | A/C | 192 | 562 | 420 | 213 | 538 | 410 | .403 | .415 | 1.034 | 0.920-1.162 | .58 | 0.999 | 0.841-1.186 | .99 | 1.124 | 0.904-1.397 | .29 |
| | | 2 | A/C | 150 | 585 | 492 | 185 | 566 | 427 | .361 | .397 | 1.179 | 1.046-1.329 | .0070 | 1.179 | 0.998-1.394 | .053 | 1.363 | 1.077-1.725 | .0098 |
| | | 1+2 | A/C | 342 | 1147 | 912 | 398 | 1104 | 837 | .381 | .406 | 1.107 | 1.019-1.203 | .017 | 1.091 | 0.969-1.229 | .15 | 1.240 | 1.058-1.453 | .0078 |
| rs1801282 | PPARG | 1 | C/G | 778 | 336 | 45 | 834 | 298 | 19 | .816 | .854 | 1.303 | 1.111-1.529 | .0011 | 2.399 | 1.387-4.151 | .0011 | 1.270 | 1.059-1.523 | .0097 |
| | | 2 | C/G | 840 | 337 | 38 | 838 | 293 | 37 | .830 | .843 | 1.077 | 0.924-1.256 | 0.34 | 0.975 | 0.612-1.555 | .92 | 1.110 | 0.929-1.327 | .25 |
| | | 1+2 | C/G | 1618 | 673 | 83 | 1672 | 591 | 56 | .823 | .848 | 1.195 | 1.071-1.333 | .0014 | 1.494 | 1.056-2.114 | .022 | 1.200 | 1.058-1.362 | .0046 |
| rs4402960 | IGF2BP2 | 1 | T/G | 102 | 471 | 585 | 148 | 495 | 498 | .291 | .347 | 1.276 | 1.126-1.446 | 1.2 x 10^-4 | 1.316 | 1.115-1.555 | .0012 | 1.520 | 1.160-1.992 | .0022 |
| | | 2 | T/G | 142 | 498 | 595 | 122 | 553 | 515 | .317 | .335 | 1.073 | 0.951-1.211 | .25 | 1.197 | 1.018-1.408 | .029 | 0.872 | 0.672-1.131 | .30 |
| | | 1+2 | T/G | 244 | 969 | 1180 | 270 | 1048 | 1013 | .304 | .341 | 1.175 | 1.078-1.281 | 2.4 x 10^-4 | 1.263 | 1.125-1.418 | 7.3 x 10^-5 | 1.155 | 0.960-1.390 | .13 |
| rs7754840 | CDKAL1 | 1 | C/G | 154 | 522 | 439 | 190 | 531 | 400 | .372 | .406 | 1.155 | 1.022-1.304 | .021 | 1.165 | 0.979-1.387 | .084 | 1.288 | 1.019-1.628 | .034 |
| | | 2 | C/G | 141 | 574 | 509 | 153 | 565 | 466 | .350 | .368 | 1.083 | 0.959-1.223 | .20 | 1.093 | 0.926-1.290 | .29 | 1.141 | 0.890-1.463 | .30 |
| | | 1+2 | C/G | 295 | 1096 | 948 | 343 | 1096 | 866 | .360 | .387 | 1.120 | 1.028-1.220 | .0095 | 1.129 | 1.002-1.271 | .046 | 1.220 | 1.030-1.444 | .021 |
| rs13266634 | SLC30A8 | 1 | C/T | 421 | 577 | 176 | 506 | 500 | 155 | .604 | .651 | 1.222 | 1.084-1.379 | .0010 | 1.157 | 0.913-1.466 | .23 | 1.380 | 1.166-1.634 | 1.8 x 10^-4 |
| | | 2 | C/T | 470 | 561 | 192 | 505 | 516 | 160 | .614 | .646 | 1.143 | 1.016-1.286 | .026 | 1.199 | 0.952-1.511 | .12 | 1.190 | 1.008-1.406 | .040 |
| | | 1+2 | C/T | 891 | 1138 | 368 | 1011 | 1016 | 315 | .609 | .649 | 1.184 | 1.089-1.287 | 6.8 x 10^-5 | 1.175 | 0.997-1.385 | .053 | 1.289 | 1.146-1.449 | 2.3 x 10^-5 |
| rs10811661 | CDKN2A/B | 1 | T/C | 809 | 308 | 13 | 850 | 256 | 18 | .852 | .870 | 1.168 | 0.980-1.392 | .082 | 0.763 | 0.369-1.576 | .46 | 1.223 | 1.011-1.480 | .038 |
| | | 2 | T/C | 893 | 309 | 33 | 911 | 256 | 23 | .848 | .873 | 1.223 | 1.039-1.441 | .015 | 1.345 | 0.779-2.322 | .28 | 1.254 | 1.042-1.510 | .017 |
| | | 1+2 | T/C | 1702 | 617 | 46 | 1761 | 512 | 41 | .850 | .872 | 1.204 | 1.069-1.356 | .0022 | 1.112 | 0.724-1.708 | .63 | 1.245 | 1.091-1.421 | .001 |
| rs1111875 | HHEX | 1 | C/T | 333 | 568 | 273 | 372 | 549 | 240 | .526 | .557 | 1.128 | 1.006-1.266 | .039 | 1.164 | 0.954-1.420 | .13 | 1.187 | 0.992-1.420 | .061 |
| | | 2 | C/T | 332 | 596 | 285 | 333 | 581 | 250 | .519 | .536 | 1.058 | 0.943-1.187 | .34 | 1.126 | 0.926-1.369 | .23 | 1.039 | 0.866-1.246 | .68 |
| | | 1+2 | C/T | 665 | 1164 | 558 | 705 | 1130 | 490 | .522 | .546 | 1.097 | 1.012-1.189 | .025 | 1.148 | 0.999-1.318 | .051 | 1.120 | 0.986-1.271 | .081 |
| rs7903146 | TCF7L2 | 1 | T/C | 32 | 356 | 786 | 55 | 422 | 684 | .179 | .229 | 1.388 | 1.197-1.610 | 1.3 x 10^-5 | 1.422 | 1.198-1.688 | 5.3 x 10^-5 | 1.819 | 1.161-2.850 | .0079 |
| | | 2 | T/C | 33 | 383 | 810 | 68 | 393 | 711 | .183 | .226 | 1.295 | 1.122-1.495 | 3.9 x 10^-4 | 1.266 | 1.069-1.498 | .0061 | 2.123 | 1.382-3.262 | 4.1 x 10^-4 |
| | | 1+2 | T/C | 65 | 739 | 1596 | 123 | 815 | 1395 | .181 | .227 | 1.343 | 1.213-1.488 | 1.4 x 10^-8 | 1.344 | 1.192-1.514 | 1.2 x 10^-6 | 1.993 | 1.464-2.712 | 7.1 x 10^-6 |
Table S5. FUSION stage 1, stage 2, and stage 1 + 2 T2D association results for 80 SNPs. SNPs were selected for stage 1 or stage 2 genotyping based on results in the FUSION GWA, combined evidence from FUSION, DGI, and WTCCC GWAs, or previous reports.

| SNP | Chr | Position (bp) | Genes | Risk allele/non-risk allele | Stage 1 control risk allele freq | Stage 1 case risk allele freq | Stage 2 control risk allele freq | Stage 2 case risk allele freq | Stage 1 + 2 control risk allele freq | Stage 1 + 2 case risk allele freq | Stage 1 OR | Stage 1 95% CI | Stage 1 p-value | Stage 2 OR | Stage 2 95% CI | Stage 2 p-value | Stage 1 + 2 OR | Stage 1 + 2 95% CI | Stage 1 + 2 p-value | Reason for follow-up |
| rs640742 | 1 | 20,729,860 | CDA, DDOST, KIF17, PINK1 | A/C | .601 | .663 | .616 | .613 | .609 | .638 | 1.297 | 1.147-1.465 | 2.9 x 10^-5 | 0.992 | 0.884-1.112 | .89 | 1.127 | 1.037-1.225 | .0047 | FUSION GWA |
| rs17356414 | 1 | 59,031,529 | - | C/T | .694 | .736 | .719 | .708 | .707 | .722 | 1.248 | 1.096-1.422 | 8.0 x 10^-4 | 0.953 | 0.841-1.081 | .46 | 1.084 | 0.991-1.186 | .077 | FUSION Imputed |
| rs17025978 | 1 | 110,781,653 | KCNA10 | G/A | .914 | .947 | .934 | .930 | .924 | .939 | 1.705 | 1.347-2.158 | 6.6 x 10^-6 | 0.941 | 0.752-1.178 | .60 | 1.270 | 1.082-1.491 | .0033 | FUSION GWA |
| rs10494217 | 1 | 119,181,230 | TBX15 | G/T | .708 | .735 | .740 | .725 | .724 | .730 | 1.142 | 1.004-1.298 | .044 | 0.929 | 0.816-1.058 | .27 | 1.026 | 0.937-1.124 | .58 | Combined GWA |
| rs7599781 | 2 | 43,590,377 | PLEKHH2, THADA | T/C | .942 | .958 | .954 | .950 | .948 | .954 | 1.478 | 1.119-1.953 | .0056 | 0.895 | 0.683-1.172 | .42 | 1.147 | 0.947-1.390 | .16 | Combined GWA |
| rs6704803 | 2 | 158,175,059 | ACVR1C, PSCDBP | C/T | .928 | .946 | .938 | .942 | .933 | .944 | 1.316 | 1.033-1.675 | .025 | 1.084 | 0.851-1.380 | .52 | 1.198 | 1.011-1.419 | .036 | Combined GWA |
| rs1801282 | 3 | 12,368,125 | PPARG, LOC643925 | C/G | .816 | .854 | .830 | .843 | .823 | .848 | 1.303 | 1.111-1.529 | .0011 | 1.077 | 0.924-1.256 | .34 | 1.195 | 1.071-1.333 | .0014 | Combined GWA |
| rs17081352 | 3 | 30,307,851 | - | C/A | .905 | .940 | .928 | .927 | .917 | .933 | 1.680 | 1.339-2.109 | 5.5 x 10^-6 | 0.978 | 0.780-1.224 | .84 | 1.276 | 1.090-1.494 | .0023 | FUSION Imputed |
| rs13072106 | 3 | 134,425,451 | BFSP2, TMEM108 | T/C | .118 | .155 | .143 | .142 | .130 | .149 | 1.414 | 1.188-1.682 | 8.7 x 10^-5 | 1.000 | 0.852-1.174 | .10 | 1.166 | 1.038-1.311 | .0098 | FUSION GWA |
| rs4687299 | 3 | 186,595,361 | MAP3K13 | A/G | .225 | .276 | .268 | .260 | .247 | .268 | 1.325 | 1.158-1.515 | 3.9 x 10^-5 | 0.959 | 0.841-1.092 | .53 | 1.116 | 1.017-1.225 | .020 | FUSION GWA |
| rs17289925 | 3 | 186,917,362 | C3orf65, IGF2BP2, LOC646600 | C/T | .018 | .022 | .020 | .020 | .019 | .021 | 1.181 | 0.775-1.801 | .44 | 1.077 | 0.719-1.613 | .72 | 1.117 | 0.836-1.492 | .46 | Follow-up |
| rs4402960 | 3 | 186,994,389 | IGF2BP2 | T/G | .291 | .347 | .317 | .335 | .304 | .341 | 1.276 | 1.126-1.446 | 1.2 x 10^-4 | 1.073 | 0.951-1.211 | .25 | 1.175 | 1.078-1.281 | 2.4 x 10^-4 | Combined GWA |
| rs734312 | 4 | 6,421,426 | WFS1 | A/G | .478 | .506 | .482 | .485 | .480 | .496 | 1.101 | 0.980-1.236 | .11 | 1.010 | 0.899-1.134 | .87 | 1.056 | 0.973-1.145 | .19 | Combined GWA |
| rs886374 | 4 | 7,856,440 | SORCS2 | T/C | .211 | .270 | .233 | .221 | .222 | .245 | 1.385 | 1.209-1.587 | 2.4 x 10^-6 | 0.943 | 0.824-1.081 | .40 | 1.140 | 1.036-1.253 | .007 | FUSION GWA |
| rs13139219 | 4 | 42,294,231 | ATP8A1 | C/A | .779 | .827 | .796 | .805 | .788 | .816 | 1.346 | 1.160-1.561 | 7.9 x 10^-5 | 1.052 | 0.911-1.214 | .50 | 1.186 | 1.070-1.314 | .0011 | FUSION GWA |
| rs6834248 | 4 | 95,447,456 | LOC644429, PGDS, SMARCAD1 | T/C | .772 | .786 | .779 | .765 | .775 | .776 | 1.108 | 0.963-1.275 | .15 | 0.919 | 0.800-1.056 | .23 | 1.001 | 0.907-1.104 | .99 | Combined GWA |
| rs2720460 | 4 | 104,412,290 | BDH2, CENPE, DHRS6, LOC133308 | A/G | .571 | .607 | .574 | .579 | .573 | .593 | 1.154 | 1.025-1.299 | .018 | 1.012 | 0.899-1.140 | .84 | 1.084 | 0.998-1.179 | .057 | Combined GWA |
| rs27779 | 5 | 142,239,267 | ARHGAP26 | A/C | .250 | .304 | .259 | .269 | .255 | .286 | 1.326 | 1.162-1.513 | 2.5 x 10^-5 | 1.044 | 0.917-1.190 | .52 | 1.171 | 1.068-1.283 | 7.5 x 10^-4 | FUSION GWA |
| rs3733876 | 5 | 176,315,601 | RAP80 | G/A | .765 | .805 | .791 | .798 | .778 | .801 | 1.277 | 1.109-1.471 | 6.6 x 10^-4 | 1.051 | 0.909-1.215 | .50 | 1.156 | 1.046-1.278 | .0046 | FUSION GWA |
| rs4712523 | 6 | 20,765,543 | CDKAL1 | G/A | .372 | .407 | .349 | .366 | .360 | .387 | 1.164 | 1.032-1.312 | .013 | 1.084 | 0.959-1.224 | .20 | 1.123 | 1.032-1.222 | .0073 | Follow-up |
| rs10946398 | 6 | 20,769,013 | CDKAL1 | C/A | .368 | .404 | .347 | .364 | .357 | .384 | 1.163 | 1.029-1.315 | .016 | 1.081 | 0.956-1.222 | .22 | 1.122 | 1.029-1.223 | .0087 | Combined Imputed |
| rs7754840 | 6 | 20,769,229 | CDKAL1 | C/G | .372 | .406 | .350 | .368 | .360 | .387 | 1.155 | 1.022-1.304 | .021 | 1.083 | 0.959-1.223 | .20 | 1.120 | 1.028-1.220 | .0095 | Follow-up |
| rs2206734 | 6 | 20,802,863 | CDKAL1 | T/C | .174 | .200 | .168 | .174 | .171 | .187 | 1.182 | 1.016-1.375 | .030 | 1.060 | 0.911-1.234 | .45 | 1.116 | 1.003-1.241 | .043 | Combined GWA |
| rs4496780 | 6 | 21,187,627 | CDKAL1 | G/T | .104 | .093 | .092 | .106 | .098 | .100 | 0.890 | 0.730-1.086 | .25 | 1.209 | 0.994-1.471 | .057 | 1.046 | 0.911-1.200 | .53 | Follow-up |
| rs9271366 | 6 | 32,694,832 | HLADQA1, HLADRA, HLADRB1 | A/G | .858 | .862 | .857 | .867 | .858 | .864 | 1.044 | 0.878-1.241 | .63 | 1.104 | 0.936-1.303 | .24 | 1.067 | 0.948-1.202 | .28 | Combined GWA |
| rs11751469 | 6 | 33,912,525 | - | C/T | .563 | .609 | .574 | .585 | .568 | .597 | 1.209 | 1.073-1.362 | .0018 | 1.050 | 0.933-1.182 | .41 | 1.122 | 1.032-1.219 | .007 | Combined GWA |
| rs7750445 | 6 | 37,872,955 | ZFAND3 | G/C | .136 | .180 | .163 | .135 | .150 | .157 | 1.407 | 1.194-1.659 | 4.2 x 10^-5 | 0.814 | 0.694-0.956 | .012 | 1.053 | 0.941-1.179 | .37 | FUSION Imputed |
| rs9472138 | 6 | 43,919,740 | - | T/C | .310 | .314 | .305 | .321 | .308 | .318 | 1.031 | 0.911-1.166 | .63 | 1.071 | 0.946-1.212 | .28 | 1.050 | 0.963-1.145 | .27 | New Assoc |
| rs7450789 | 6 | 111,923,668 | LOC643749, REV3L, TRAF3IP2 | T/G | .903 | .919 | .908 | .912 | .906 | .916 | 1.228 | 1.001-1.506 | .048 | 1.069 | 0.877-1.304 | .51 | 1.141 | 0.990-1.314 | .068 | Combined GWA |
| rs2021966 | 6 | 132,192,132 | ENPP1 | A/G | .576 | .630 | .606 | .621 | .592 | .626 | 1.246 | 1.107-1.403 | 2.6 x 10^-4 | 1.057 | 0.939-1.190 | .36 | 1.148 | 1.056-1.247 | .0012 | FUSION Imputed |
| rs615545 | 7 | 18,165,111 | - | C/T | .694 | .751 | .708 | .733 | .701 | .742 | 1.361 | 1.190-1.556 | 5.9 x 10^-6 | 1.134 | 0.998-1.289 | .053 | 1.236 | 1.127-1.355 | 6.1 x 10^-6 | FUSION GWA |
| rs10281305 | 7 | 54,664,618 | - | G/T | .735 | .772 | .738 | .757 | .737 | .765 | 1.224 | 1.069-1.401 | .0033 | 1.101 | 0.961-1.261 | .16 | 1.153 | 1.048-1.268 | 0.0033 | Combined GWA |
| rs17158686 | 7 | 83,439,407 | SEMA3A | T/G | .951 | .957 | .959 | .958 | .955 | .958 | 1.156 | 0.874-1.528 | .31 | 1.007 | 0.751-1.351 | .96 | 1.077 | 0.881-1.316 | .47 | Combined GWA |
| rs2470984 | 7 | 122,368,680 | SLC13A1 | A/C | .297 | .348 | .316 | .298 | .307 | .323 | 1.279 | 1.130-1.448 | 9.0 x 10^-5 | 0.930 | 0.822-1.054 | .26 | 1.083 | 0.993-1.181 | .073 | FUSION GWA |
| rs10954654 | 7 | 138,816,342 | - | C/T | .725 | .776 | .735 | .749 | .730 | .762 | 1.337 | 1.166-1.533 | 2.8 x 10^-5 | 1.089 | 0.952-1.245 | .21 | 1.201 | 1.092-1.321 | 1.6 x 10^-4 | FUSION GWA |
| rs557962 | 7 | 140,232,924 | LOC642421, MRPS33 | T/C | .047 | .076 | .059 | .058 | .053 | .067 | 1.650 | 1.287-2.115 | 5.9 x 10^-5 | 0.982 | 0.770-1.253 | .89 | 1.275 | 1.075-1.514 | .0052 | FUSION GWA |
| rs13266634 | 8 | 118,253,964 | SLC30A8 | C/T | .604 | .651 | .614 | .646 | .609 | .649 | 1.222 | 1.084-1.379 | .001 | 1.143 | 1.016-1.286 | .026 | 1.184 | 1.089-1.287 | 6.8 x 10^-5 | FUSION GWA |
| rs7839244 | 8 | 142,457,437 | GPR20 | A/G | .066 | .098 | .082 | .080 | .074 | .089 | 1.553 | 1.248-1.932 | 6.8 x 10^-5 | 0.967 | 0.784-1.192 | .75 | 1.212 | 1.044-1.407 | .012 | FUSION GWA |
| rs1063192 | 9 | 21,993,367 | CDKN2A, CDKN2B | A/G | .556 | .582 | .587 | .584 | .572 | .583 | 1.094 | 0.975-1.228 | .13 | 0.989 | 0.879-1.114 | .85 | 1.045 | 0.963-1.134 | .29 | Follow-up |
| rs564398 | 9 | 22,019,547 | CDKN2A, CDKN2B | T/C | .566 | .596 | .596 | .590 | .582 | .593 | 1.118 | 0.994-1.258 | .064 | 0.970 | 0.863-1.091 | .61 | 1.045 | 0.962-1.135 | .30 | Follow-up |
| rs2383208 | 9 | 22,122,076 | - | A/G | .842 | .862 | .836 | .864 | .839 | .863 | 1.184 | 1.002-1.400 | .047 | 1.240 | 1.057-1.456 | .0082 | 1.219 | 1.086-1.367 | 7.2 x 10^-4 | Combined GWA |
| rs10811661 | 9 | 22,124,094 | - | T/C | .852 | .870 | .848 | .873 | .850 | .872 | 1.168 | 0.980-1.392 | .082 | 1.223 | 1.039-1.441 | .015 | 1.204 | 1.069-1.356 | .0022 | Follow-up |
| rs13297268 | 9 | 91,267,696 | NFIL3 | G/A | .924 | .952 | .945 | .949 | .935 | .950 | 1.650 | 1.280-2.128 | 9.0 x 10^-5 | 1.094 | 0.848-1.413 | .49 | 1.353 | 1.132-1.618 | 8.3 x 10^-4 | FUSION Imputed |
| rs2185935 | 9 | 114,581,796 | - | C/T | .667 | .675 | .661 | .662 | .664 | .669 | 1.024 | 0.904-1.160 | .71 | 1.008 | 0.895-1.136 | .89 | 1.018 | 0.935-1.110 | .68 | Combined GWA |
| rs1416904 | 9 | 131,363,871 | KIAA0515, POMT1, UCK1 | T/C | .931 | .952 | .925 | .935 | .928 | .943 | 1.479 | 1.150-1.902 | .0021 | 1.116 | 0.892-1.397 | .34 | 1.269 | 1.074-1.498 | .0049 | Combined GWA |
| rs1270874 | 10 | 29,879,870 | SVIL | C/A | .753 | .799 | .780 | .777 | .767 | .788 | 1.297 | 1.123-1.498 | 3.9 x 10^-4 | 0.976 | 0.849-1.120 | .72 | 1.118 | 1.012-1.234 | .028 | FUSION Imputed |
| rs9422546 | 10 | 43,391,505 | ZNF239, ZNF485 | G/T | .628 | .631 | .640 | .651 | .634 | .641 | 1.009 | 0.894-1.138 | .89 | 1.066 | 0.945-1.203 | .30 | 1.036 | 0.951-1.127 | .42 | Combined GWA |
| rs13088 | 10 | 49,985,899 | C10orf72 | G/A | .369 | .398 | .363 | .384 | .366 | .391 | 1.132 | 1.003-1.277 | .044 | 1.073 | 0.953-1.207 | .24 | 1.102 | 1.013-1.198 | .024 | Combined GWA |
| rs1359624 | 10 | 91,385,408 | FLJ37201, MPHOSPH1, PANK1 | C/T | .247 | .290 | .268 | .265 | .258 | .277 | 1.222 | 1.072-1.394 | .0027 | 0.973 | 0.853-1.110 | .68 | 1.108 | 1.010-1.215 | .030 | FUSION GWA |
| rs1111875 | 10 | 94,452,862 | HHEX | C/T | .526 | .557 | .519 | .536 | .522 | .546 | 1.128 | 1.006-1.266 | .039 | 1.058 | 0.943-1.187 | .35 | 1.097 | 1.012-1.189 | .025 | New Assoc |
| rs7923837 | 10 | 94,471,897 | - | G/A | .603 | .631 | .591 | .613 | .597 | .622 | 1.122 | 0.997-1.263 | .057 | 1.090 | 0.970-1.226 | .15 | 1.107 | 1.019-1.203 | .016 | Combined GWA / New Assoc |
| rs4506565 | 10 | 114,746,031 | TCF7L2 | T/A | .214 | .250 | .217 | .248 | .216 | .249 | 1.257 | 1.089-1.450 | .0017 | 1.187 | 1.037-1.360 | .013 | 1.221 | 1.107-1.346 | 6.4 x 10^-5 | FUSION Imputed / Prev Assoc |
| rs7903146 | 10 | 114,748,339 | TCF7L2 | T/C | .179 | .229 | .183 | .226 | .181 | .227 | 1.388 | 1.197-1.610 | 1.3 x 10^-5 | 1.295 | 1.122-1.495 | 3.9 x 10^-4 | 1.343 | 1.213-1.488 | 1.4 x 10^-8 | FUSION GWA / Prev Assoc |
| rs12255372 | 10 | 114,798,892 | TCF7L2 | T/G | .156 | .203 | .165 | .199 | .161 | .201 | 1.400 | 1.201-1.632 | 1.5 x 10^-5 | 1.244 | 1.070-1.447 | .0044 | 1.318 | 1.184-1.467 | 3.6 x 10^-7 | FUSION GWA / Prev Assoc |
| rs5219 | 11 | 17,366,148 | ABCC8, KCNJ11 | T/C | .445 | .489 | .482 | .490 | .464 | .489 | 1.204 | 1.069-1.357 | .0022 | 1.035 | 0.922-1.162 | .56 | 1.109 | 1.021-1.204 | .014 | Combined Imputed / Prev Assoc |
| rs9300039 | 11 | 41,871,942 | - | C/A | .890 | .925 | .894 | .924 | .892 | .924 | 1.520 | 1.236-1.869 | 6.0 x 10^-5 | 1.442 | 1.179-1.764 | 3.2 x 10^-4 | 1.478 | 1.280-1.705 | 6.8 x 10^-8 | FUSION GWA |
| rs11036627 | 11 | 41,881,290 | - | C/A | .912 | .946 | .924 | .946 | .918 | .946 | 1.665 | 1.313-2.110 | 1.9 x 10^-5 | 1.466 | 1.159-1.856 | .0013 | 1.563 | 1.324-1.846 | 9.2 x 10^-8 | FUSION Imputed |
| rs10837766 | 11 | 41,984,377 | - | T/C | .827 | .869 | .846 | .870 | .836 | .870 | 1.397 | 1.181-1.652 | 8.6 x 10^-5 | 1.252 | 1.058-1.482 | .0088 | 1.313 | 1.166-1.477 | 5.8 x 10^-6 | FUSION Imputed |
| rs7480010 | 11 | 42,203,294 | LOC387761 | G/A | .174 | .174 | .162 | .171 | .168 | .172 | 1.004 | 0.863-1.169 | .96 | 1.078 | 0.925-1.257 | .333 | 1.034 | 0.929-1.151 | .54 | New Assoc |
| rs4379834 | 11 | 44,115,014 | ALX4, EXT2, PHACS | G/A | .316 | .316 | .295 | .306 | .305 | .311 | 0.980 | 0.865-1.111 | .76 | 1.063 | 0.936-1.207 | .35 | 1.027 | 0.940-1.123 | .55 | New Assoc |
| rs11616188 | 12 | 6,373,003 | LTBR, SCNN1A | A/G | .426 | .484 | .445 | .455 | .436 | .470 | 1.270 | 1.131-1.426 | 4.8 x 10^-5 | 1.040 | 0.927-1.167 | .50 | 1.148 | 1.059-1.244 | 8.3 x 10^-4 | FUSION Imputed |
| rs3751262 | 12 | 12,509,957 | DUSP16, LOH12CR1 | G/A | .914 | .932 | .917 | .904 | .916 | .918 | 1.298 | 1.038-1.623 | .022 | 0.853 | 0.698-1.043 | .12 | 1.039 | 0.896-1.205 | .61 | Combined GWA |
| rs1153188 | 12 | 53,385,263 | - | A/T | .699 | .721 | .682 | .702 | .690 | .711 | 1.100 | 0.966-1.251 | .15 | 1.118 | 0.989-1.266 | .075 | 1.109 | 1.015-1.212 | .022 | Combined Imputed |
| rs7132840 | 12 | 69,697,828 | - | T/G | .425 | .442 | .426 | .438 | .425 | .440 | 1.070 | 0.949-1.205 | .27 | 1.065 | 0.951-1.193 | .27 | 1.063 | 0.979-1.153 | .14 | Combined Imputed |
| rs3825253 | 12 | 107,611,747 | CORO1C, DAO, SSH1 | A/G | .973 | .989 | .987 | .986 | .908 | .988 | 2.575 | 1.604-4.134 | 3.6 x 10^-5 | 0.991 | 0.602-1.631 | .97 | 1.678 | 1.204-2.337 | .0019 | FUSION GWA |
| rs2300455 | 12 | 108,086,236 | ACACB | G/A | .815 | .839 | .821 | .820 | .818 | .829 | 1.166 | 0.999-1.361 | .051 | 0.997 | 0.857-1.161 | .97 | 1.075 | 0.965-1.197 | .19 | Combined GWA |
| rs4767658 | 12 | 116,982,161 | FLJ20674, WSB2 | T/C | .577 | .633 | .609 | .613 | .593 | .623 | 1.274 | 1.134-1.430 | 4.1 x 10^-5 | 1.025 | 0.912-1.151 | .68 | 1.134 | 1.045-1.230 | .0025 | FUSION GWA |
| rs1033594 | 14 | 36,281,317 | SLC25A21 | C/T | .479 | .502 | .496 | .507 | .487 | .505 | 1.069 | 0.951-1.202 | .26 | 1.049 | 0.933-1.178 | .42 | 1.067 | 0.982-1.158 | .13 | Combined GWA |
| rs1449725 | 14 | 38,246,572 | - | C/T | .540 | .607 | .584 | .595 | .562 | .600 | 1.315 | 1.163-1.486 | 1.1 x 10^-5 | 1.063 | 0.943-1.197 | .32 | 1.180 | 1.084-1.284 | 1.3 x 10^-4 | FUSION Imputed |
| rs2268974 | 14 | 68,492,917 | ACTN1 | G/A | .231 | .242 | .221 | .221 | .226 | .231 | 1.058 | 0.920-1.216 | .43 | 0.990 | 0.863-1.136 | .89 | 1.020 | 0.926-1.124 | .69 | Combined Imputed |
| rs12910827 | 15 | 56,417,311 | - | T/G | .021 | .045 | .029 | .032 | .025 | .039 | 2.195 | 1.541-3.127 | 6.3 x 10^-6 | 1.109 | 0.800-1.539 | .53 | 1.559 | 1.232-1.972 | 1.8 x 10^-4 | FUSION Imputed |
| rs10521095 | 16 | 13,528,936 | - | A/G | .206 | .256 | .228 | .229 | .217 | .243 | 1.351 | 1.174-1.554 | 2.3 x 10^-5 | 1.008 | 0.882-1.153 | .90 | 1.157 | 1.051-1.274 | .0028 | FUSION GWA |
| rs8050136 | 16 | 52,373,776 | FTO | A/C | .403 | .415 | .361 | .397 | .381 | .406 | 1.034 | 0.920-1.162 | .58 | 1.179 | 1.046-1.329 | .0070 | 1.107 | 1.019-1.203 | .017 | Combined GWA |
| rs1800774 | 16 | 55,573,046 | CETP | C/T | .667 | .726 | .705 | .699 | .687 | .712 | 1.348 | 1.182-1.537 | 7.3 x 10^-6 | 0.967 | 0.851-1.098 | .60 | 1.138 | 1.040-1.246 | .005 | FUSION Imputed |
| rs11646114 | 16 | 85,141,275 | FLJ12998, FOXC2, MTHFSD | T/A | .895 | .921 | .915 | .905 | .905 | .913 | 1.382 | 1.124-1.698 | .002 | 0.892 | 0.728-1.092 | .27 | 1.110 | 0.962-1.281 | .15 | FUSION Imputed |
| rs7222308 | 17 | 25,301,167 | CCDC55, EFCAB5, FLJ46247, SLC6A4, SSH2 | T/C | .532 | .553 | .535 | .552 | .533 | .553 | 1.094 | 0.973-1.229 | .13 | 1.075 | 0.958-1.206 | .22 | 1.086 | 1.001-1.179 | .047 | Combined GWA |
| rs17384005 | 18 | 1,565,020 | - | A/G | .842 | .859 | .858 | .859 | .851 | .859 | 1.147 | 0.974-1.351 | .10 | 1.004 | 0.850-1.186 | .96 | 1.074 | 0.956-1.206 | .23 | FUSION Imputed |
| rs175200 | 22 | 18,543,063 | - | A/G | .490 | .552 | .538 | .553 | .515 | .553 | 1.285 | 1.137-1.452 | 5.5 x 10^-5 | 1.069 | 0.954-1.198 | .25 | 1.165 | 1.072-1.265 | 2.9 x 10^-4 | FUSION Imputed |
| rs565979 | 22 | 19,353,500 | DKFZp434N035, LOC150207, LOC645289, PIK4CA, SERPIND1 | C/T | .679 | .730 | .727 | .709 | .703 | .720 | 1.295 | 1.139-1.472 | 7.0 x 10^-5 | 0.929 | 0.816-1.056 | .26 | 1.090 | 0.996-1.193 | .060 | FUSION GWA |
| rs2267339 | 22 | 35,290,742 | CACNG2 | G/T | .611 | .674 | .630 | .618 | .621 | .646 | 1.341 | 1.182-1.521 | 4.5 x 10^-6 | 0.939 | 0.832-1.060 | .31 | 1.112 | 1.020-1.213 | .016 | FUSION Imputed |
77
31
Table S6: Comparison of T2D association results for SNPs that were imputed with a p-value < .001 and then genotyped in the FUSION stage 1 sample
SNP | Genes | Risk allele frequency in controls, imputed | Risk allele frequency in controls, genotyped | FUSION stage 1 imputed (a): p-value | FUSION stage 1 imputed (a): OR | FUSION stage 1 genotyped: p-value | FUSION stage 1 genotyped: OR | Imputation consistency (c) | Estimated r2 (d) | Observed allelic concordance | Maximum r2 with SNPs used for imputation
rs12910827 | - | .024 | .021 | 2.5 × 10^-6 | 2.57 | 6.3 × 10^-6 | 2.20 | .977 | .720 | .994 | .39
rs1449725 | - | .544 | .540 | 5.3 × 10^-6 | 1.33 | 1.1 × 10^-5 | 1.31 | .989 | .977 | .990 | .90
rs17081352 | - | .909 | .905 | 7.3 × 10^-6 | 1.70 | 5.5 × 10^-6 | 1.68 | .994 | .954 | 1.000 | .87
rs11616188 | SCNN1A/LTBR | .474 | .426 | 1.5 × 10^-5 | 1.40 | 4.8 × 10^-5 | 1.27 | .760 | .585 | .919 | .27
rs10837766 | - | .840 | .827 | 1.5 × 10^-5 | 1.49 | 8.6 × 10^-5 | 1.40 | .975 | .930 | .975 | .46
rs11036627 | - | .903 | .912 | 1.7 × 10^-5 | 1.67 | 1.9 × 10^-5 | 1.66 | .976 | .901 | .987 | .75
rs17384005 | - | .811 | .842 | 1.9 × 10^-5 | 1.84 | .10 | 1.15 | .743 | .309 | .874 | .11
rs7750445 | - | .116 | .136 | 2.0 × 10^-5 | 1.47 | 4.1 × 10^-5 | 1.41 | .986 | .965 | .977 | .50
rs2267339 | CACNG2 | .613 | .611 | 2.8 × 10^-5 | 1.33 | 4.5 × 10^-6 | 1.34 | .939 | .873 | .990 | .72
rs17356414 | - | .551 | .694 | 3.0 × 10^-5 | 1.30 | 8.0 × 10^-4 | 1.25 | .944 | .920 | .878 | .34
rs1800774 | CETP | .642 | .667 | 3.9 × 10^-5 | 1.39 | 7.3 × 10^-6 | 1.35 | .810 | .617 | .972 | .29
rs175200 | - | .493 | .490 | 6.6 × 10^-5 | 1.28 | 5.5 × 10^-5 | 1.28 | .993 | .976 | .997 | .85
rs6103716 | - | .342 | .342 | 7.3 × 10^-5 | 1.28 | 4.8 × 10^-5 | 1.29 | .993 | .978 | .999 | .33
rs13297268 | NFIL3 | .928 | .924 | 7.5 × 10^-5 | 1.72 | 9.0 × 10^-5 | 1.65 | .988 | .916 | .998 | .28
rs11646114 | FOXC2/FLJ12998 | .868 | .895 | 9.1 × 10^-5 | 1.66 | .0020 | 1.38 | .860 | .512 | .956 | .13
rs2021966 | ENPP1 | .584 | .576 | 9.1 × 10^-5 | 1.32 | 2.6 × 10^-4 | 1.25 | .846 | .769 | .937 | .46
rs1270874 | SVIL | .745 | .753 | 1.4 × 10^-4 | 1.33 | 3.9 × 10^-4 | 1.30 | .983 | .954 | .988 | .24
rs4812831 | - | .150 | .116 | 1.6 × 10^-4 | 1.53 | .0055 | 1.28 | .831 | .516 | .944 | .45
rs4402960 | IGF2BP2 | .290 | .291 | 1.7 × 10^-4 | 1.27 | 1.2 × 10^-4 | 1.28 | .997 | 1.026 | .998 | 1.00
rs2466291 | SLC30A8 | .399 | .361 | 6.3 × 10^-4 | 1.26 | .0016 | 1.22 | .874 | .830 | .935 | .47
rs1801282 | PPARG | .816 | .816 | 9.5 × 10^-4 | 1.31 | .0011 | 1.30 | .999 | 1.002 | 1.000 | 1.00
rs3802177 | SLC30A8 | .604 | .605 | 9.9 × 10^-4 | 1.23 | .0012 | 1.22 | .999 | 1.015 | .999 | 1.00
rs4506565 | TCF7L2 | .213 | .214 | .0015 (b) | 1.26 | .0017 | 1.26 | .999 | .965 | 1.000 | .92
a Imputation-based analysis restricted to individuals with successful genotypes for the same SNP; these results may differ from the imputed results in Table S2, which are based on all stage 1 individuals
b Imputed p-value = 7.0 × 10^-4 in stage 1 sample
c Imputation consistency is the proportion of imputation iterations that agreed with the most likely genotype
d The estimated r2 is the ratio of observed variance of dosage scores across samples to the expected variance given the imputed SNP allele frequency
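Footnote d of Table S6 defines the estimated r2 as a variance ratio. The sketch below illustrates that definition in Python. It assumes additive dosages on a 0 to 2 scale and takes the expected variance to be 2p(1 - p), the genotype variance under Hardy-Weinberg proportions; that choice, the function name, and the toy dosage lists are illustrative assumptions and are not taken from the FUSION analysis.

```python
from statistics import pvariance


def estimated_r2(dosages):
    """Observed variance of imputed dosages divided by the expected variance.

    dosages: expected allele counts per individual, each between 0 and 2.
    The expected variance 2p(1 - p) assumes Hardy-Weinberg proportions.
    """
    n = len(dosages)
    p = sum(dosages) / (2 * n)        # allele frequency implied by the dosages
    expected = 2 * p * (1 - p)        # variance of a 0/1/2 genotype count
    if expected == 0:
        return float("nan")           # monomorphic SNP: ratio is undefined
    return pvariance(dosages) / expected


# Confidently imputed genotypes keep the full genotype variance ...
confident = estimated_r2([0, 1, 2, 1])
# ... while uncertain dosages shrink toward the mean and lower the ratio.
uncertain = estimated_r2([0.5, 1.0, 1.5, 1.0])
print(confident, uncertain)
```

Because the observed variance is a sample quantity, the ratio is not bounded by 1, which is consistent with the few estimated r2 values slightly above 1 in Table S6.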
Table S7. SNP annotation weights used in SNP picking for stage 2 genotyping
Annotation | Weight
Maximum of:
  Frameshift | 50
  Stop codon | 50
  Critical splice site | 50
  Poly A signal | 30
  Any change to initial ATG signal | 30
  Non-synonymous coding:
    Identical amino acid seen in more than 75% of mammals | 20
    Similar amino acid seen in more than 75% of mammals | 20
    Non-conservative amino acid change | 6 to 9 (a)
    Other non-synonymous | 5
  SNP in exon, includes 5' and 3' UTRs | 2
Bonus:
  FUSION linkage LOD > 1 | 1 to 3 (b)
  SNP near candidate gene | 1.5
  SNP near gene over-expressed in tissue of interest | 1.5
  Conserved | 1.2
  Near any gene | 1.2
a For non-conservative amino acid changes, the weight is 5 - x, where -4 ≤ x ≤ -1 is the BLOSUM62 score for the amino acid substitution (23)
b For linkage, the weight is the T2D LOD score in the FUSION 1+2 families (2) if that LOD score is > 1
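The "Maximum of" part of Table S7 can be sketched as a small lookup. This is a hedged illustration only: the dictionary keys, the function names, and the default weight of 0 for an unannotated SNP are hypothetical, and the way the bonus terms are combined with the base weight is not stated in the table, so it is deliberately left out. The BLOSUM62 range is treated as inclusive, which is what the listed weight range of 6 to 9 implies.

```python
# Base annotation weights copied from Table S7; key names are illustrative.
BASE_WEIGHTS = {
    "frameshift": 50,
    "stop_codon": 50,
    "critical_splice_site": 50,
    "poly_a_signal": 30,
    "initial_atg_change": 30,
    "nonsyn_identical_in_75pct_mammals": 20,
    "nonsyn_similar_in_75pct_mammals": 20,
    "other_nonsynonymous": 5,
    "exon_including_utrs": 2,
}


def nonconservative_weight(blosum62_score):
    """Weight 5 - x for a non-conservative change with BLOSUM62 score x."""
    if not -4 <= blosum62_score <= -1:
        raise ValueError("expected a BLOSUM62 score between -4 and -1")
    return 5 - blosum62_score


def base_weight(annotations, blosum62_score=None):
    """Largest applicable base weight; 0 if nothing applies (an assumption)."""
    weights = [BASE_WEIGHTS[a] for a in annotations if a in BASE_WEIGHTS]
    if blosum62_score is not None:
        weights.append(nonconservative_weight(blosum62_score))
    return max(weights, default=0)


print(base_weight(["exon_including_utrs", "stop_codon"]))    # stop codon dominates
print(base_weight(["exon_including_utrs"], blosum62_score=-2))
```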
Supplemental Online Material References
1. T. Valle et al., Diabetes Care 21, 949 (1998).
2. K. Silander et al., Diabetes 53, 821 (2004).
3. T. Saaristo et al., Diab Vasc Dis Res 2, 67 (2005).
4. Geneva, World Health Organization (1999).
5. M. Peltonen et al., Suomen Lääkäril (Finnish Med J) 61, 163 (2005).
6. A. Aromaa, S. Koskinen, Publications of the National Public Health Institute, Helsinki, Finland (2004).
7. J. Tuomilehto et al., Int J Epidemiol 20, 1010 (1991).
8. J. Saramies, Acta Univ. Oul., D 812 (2004).
9. M. I. Hawa, T. O. Ola, A. Gigante, J. Teng, R. D. G. Leslie, Diabetologia 49, 182 (2006).
10. K. L. Gunderson et al., Methods Enzymol 410, 359 (2006).
11. M. Barnhart et al., American Society of Human Genetics A242 (2006).
12. M. P. Epstein, W. L. Duren, M. Boehnke, Am J Hum Genet 67, 1219 (2000).
13. J. E. Wigginton, G. R. Abecasis, Bioinformatics 21, 3445 (2005).
14. A. Agresti, Categorical Data Analysis (John Wiley & Sons, ed. 2nd, 2002), pp. 710.
15. International HapMap Consortium, Nature 437, 1299 (2005).
16. S. Purcell, S. S. Cherny, P. C. Sham, Bioinformatics 19, 149 (2003).
17. N. Li, M. Stephens, Genetics 165, 2213 (2003).
18. Y. Li, P. Scheet, J. Ding, G. R. Abecasis, (Submitted for publication; manuscript available from GRA).
19. C. J. Willer et al., Genet Epidemiol 30, 180 (2006).
20. L. V. Hedges, Psychol Bull 92, 490 (1982).
21. K. Roeder, S. A. Bacanu, L. Wasserman, B. Devlin, Am J Hum Genet 78, 243 (2006).
22. Arch Intern Med 161, 397 (2001).
23. S. Henikoff, J. G. Henikoff, Proc Natl Acad Sci U S A 89, 10915 (1992).
Chapter 4
Variations in the G6PC2/ABCB11 genomic region are associated with fasting glucose levels
Journal of Clinical Investigation 2008;118(7):2620-2628
Wei-Min Chen,1,2 Michael R. Erdos,3 Anne U. Jackson,4 Richa Saxena,5 Serena Sanna,4,6
Kristi D. Silver,7 Nicholas J. Timpson,8 Torben Hansen,9 Marco Orrù,6 Maria Grazia Piras,6
Lori L. Bonnycastle,3 Cristen J. Willer,4 Valeriya Lyssenko,10 Haiqing Shen,7 Johanna Kuusisto,11
Shah Ebrahim,12 Natascia Sestu,13 William L. Duren,4 Maria Cristina Spada,6
Heather M. Stringham,4 Laura J. Scott,4 Nazario Olla,6 Amy J. Swift,3 Samer Najjar,13
Braxton D. Mitchell,7 Debbie A. Lawlor,8 George Davey Smith,8 Yoav Ben-Shlomo,14
Gitte Andersen,9 Knut Borch-Johnsen,9,15,16 Torben Jørgensen,15 Jouko Saramies,17 Timo T. Valle,18
Thomas A. Buchanan,19,20 Alan R. Shuldiner,7 Edward Lakatta,13 Richard N. Bergman,20
Manuela Uda,6 Jaakko Tuomilehto,18,21 Oluf Pedersen,9,16 Antonio Cao,6 Leif Groop,10
Karen L. Mohlke,22 Markku Laakso,11 David Schlessinger,13 Francis S. Collins,3 David Altshuler,5
Gonçalo R. Abecasis,4 Michael Boehnke,4 Angelo Scuteri,23,24 and Richard M. Watanabe20,25
1Department of Public Health Sciences and 2Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA. 3Genome Technology Branch, National Human Genome Research Institute, Bethesda, Maryland, USA.
4Center for Statistical Genetics and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA. 5Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
6Istituto di Neurogenetica e Neurofarmacologia, Consiglio Nazionale delle Ricerche, Cagliari, Italy. 7Division of Endocrinology, Diabetes and Nutrition, University of Maryland School of Medicine, Baltimore, Maryland, USA. 8MRC Centre for Causal Analyses in Translational Epidemiology,
Department of Social Medicine, University of Bristol, Bristol, United Kingdom. 9Steno Diabetes Center, Gentofte, Denmark. 10Department of Clinical Sciences, Diabetes and Endocrinology, Lund University, University Hospital Malmö, Malmö, Sweden.
11Department of Medicine, University of Kuopio and Kuopio University Hospital, Kuopio, Finland. 12Department of Epidemiology and Population Health, Non-communicable Disease Epidemiology Unit, London School of Hygiene and Tropical Medicine, University of London, London, United Kingdom. 13Gerontology Research Center, National Institute on Aging, Baltimore, Maryland, USA. 14Social Medicine Department, University of Bristol, Bristol,
United Kingdom. 15Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark. 16Faculty of Health Sciences, University of Aarhus, Aarhus, Denmark. 17Savitaipale Health Center, Savitaipale, Finland. 18Diabetes Unit,
Department of Health Promotion and Chronic Disease Prevention, National Public Health Institute, and Department of Public Health, University of Helsinki, Helsinki, Finland. 19Department of Medicine, Division of Endocrinology, and 20Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, California, USA. 21South Ostrobothnia Central Hospital, Seinäjoki, Finland. 22Department of Genetics,
University of North Carolina, Chapel Hill, North Carolina, USA. 23Laboratory of Cardiovascular Science, National Institute on Aging, NIH, Baltimore, Maryland, USA. 24Unità Operativa Geriatria, Istituto Nazionale Ricovero e Cura Anziani, Rome, Italy.
25Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, USA.
Identifying the genetic variants that regulate fasting glucose concentrations may further our understanding of the pathogenesis of diabetes. We therefore investigated the association of fasting glucose levels with SNPs in 2 genome-wide scans including a total of 5,088 nondiabetic individuals from Finland and Sardinia. We found a significant association between the SNP rs563694 and fasting glucose concentrations (P = 3.5 × 10–7). This association was further investigated in an additional 18,436 nondiabetic individuals of mixed European descent from 7 different studies. The combined P value for association in these follow-up samples was 6.9 × 10–26, and combining results from all studies resulted in an overall P value for association of 6.4 × 10–33. Across these studies, fasting glucose concentrations increased 0.01–0.16 mM with each copy of the major allele, accounting for approximately 1% of the total variation in fasting glucose. The rs563694 SNP is located between the genes glucose-6-phosphatase catalytic subunit 2 (G6PC2) and ATP-binding cassette, subfamily B (MDR/TAP), member 11 (ABCB11). Our results in combination with data reported in the literature suggest that G6PC2, a glucose-6-phosphatase almost exclusively expressed in pancreatic islet cells, may underlie variation in fasting glucose, though it is possible that ABCB11, which is expressed primarily in liver, may also contribute to such variation.
Nonstandard abbreviations used: ABCB11, ATP-binding cassette, subfamily B (MDR/TAP), member 11; BWHHS, British Women’s Heart and Health Study; DGI, Diabetes Genetics Initiative; FUSION, Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics; G6PC2, glucose-6-phosphatase catalytic subunit 2; GWA, genome-wide association; LD, linkage disequilibrium; METSIM, METabolic Syndrome in Men; T2DM, type 2 diabetes mellitus.
Conflict of interest: The authors have declared that no conflict of interest exists.
Citation for this article: J. Clin. Invest. 118:2620–2628 (2008). doi:10.1172/JCI34566.
Glucose is the major source of energy in humans, with levels in vivo determined by a balance of glucose absorption via the gut, production primarily by the liver, and utilization by both insulin-sensitive and insulin-insensitive tissues (1, 2). Homeostatic control of glucose levels involves complex interactions between humoral and neural mechanisms that work in concert to regulate tightly the balance between production and utilization to maintain a normal fasting glucose. Elevations in blood glucose are diagnostic of diabetes. Type 2 diabetes mellitus (T2DM) afflicts more than 171 million worldwide and is a leading cause of kidney failure, blindness, and lower limb amputations (3–5). Even more modest elevations in glucose concentration (so-called prediabetes) are associated with cardiovascular disease and accelerated atherosclerosis (6). In individuals progressing toward future T2DM, the fasting glucose concentration appears to change only modestly over time until the advent of β cell dysfunction, at which point the glucose concentration increases rapidly (7, 8). Many studies have shown that the lowering of glucose levels in individuals with diabetes can prevent or delay diabetes-related complications, providing further evidence for the damaging effects of chronic glucose elevations.
Both genetic and environmental factors contribute to the pathophysiology of T2DM (9–11). The contributions of environmental exposures to T2DM risk are best illustrated by results from the Diabetes Prevention Program (11) and the Finnish Diabetes Prevention Study (12), in which T2DM incidence was significantly reduced by intensive lifestyle modification. However, the contribution of genetic factors to T2DM risk is not as well understood. Recent genome-wide association (GWA) studies have identified 16 novel T2DM susceptibility loci (13–18), generating new insights into the genetic architecture underlying T2DM. In contrast to disease status, even less is known about genetic variation that alters specific T2DM-related quantitative traits such as glucose and insulin concentrations. As seen for T2DM, identification of genetic variants associated with T2DM-related quantitative traits is likely to require large sample sizes due to relatively small gene effect sizes. Fasting glucose concentrations have been shown to be heritable, with narrow-sense heritability estimates ranging from 25% to 40% (19–24). Given the central role of glucose concentration in the pathogenesis and diagnosis of T2DM and its complications, GWA for glucose concentrations provides an excellent opportunity to identify genes underlying variation in glucose concentrations that may also represent additional T2DM susceptibility loci. An example of this comes from the studies by Weedon et al., who showed by metaanalysis and large cohorts that variation in the glucokinase gene was associated with both fasting glucose and birth weight (25).
GWA studies for T2DM and adiposity were completed by the groups undertaking the Finland–United States Investigation of Non–Insulin-Dependent Diabetes Mellitus Genetics (FUSION) (14, 26, 27) and the SardiNIA Study of Aging (24, 28), respectively. Both studies assessed fasting glucose in their respective cohorts, allowing GWAs for fasting glucose in each study and combination of these results in a metaanalysis. The strongest signals from the fasting glucose GWA metaanalysis were from variants near genes for ATP-binding cassette, subfamily B (MDR/TAP), member 11 (ABCB11) and glucose-6-phosphatase catalytic subunit 2 (G6PC2). This association was replicated in a series of 7 studies involving a total of 18,436 individuals (13, 29–35), suggesting for what we believe is the first time that variation in one of these genes may play a role in the regulation of fasting glucose concentrations in humans.
Subject demographics and clinical characteristics for the FUSION and SardiNIA samples are summarized in Table 1. Because treatment for T2DM affects fasting glucose concentrations, all analyses in this report were restricted to nondiabetic subjects. Initial review of association results from both the FUSION stage 1 and SardiNIA GWA scans of a combined total of 5,088 nondiabetic individuals focused on SNPs that were genotyped in the SardiNIA study and imputed in the FUSION study. Among these, rs563694 exhibited the strongest evidence for association in both samples (SardiNIA, P = 7.6 × 10–5; FUSION stage 1, P = 8.0 × 10–4; Table 2), with a metaanalysis P value of 3.5 × 10–7. Given the strength of this initial association, our follow-up efforts focused on rs563694. Additional independent associations from our fasting glucose GWA study are presented in Supplemental Table 1 (supplemental material available online with this article; doi:10.1172/JCI34566DS1).
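One common way to combine two study-level P values such as these is a sample-size-weighted Z-score metaanalysis. The Python sketch below is an illustration under stated assumptions, not a statement of the method the authors used: it assumes two-sided P values, effects in the same direction in both studies, and weights equal to the square root of each sample size. The function names are hypothetical; the P values and sample sizes are the ones quoted above and in Table 2.

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_from_two_sided_p(p, direction=1):
    """Convert a two-sided P value to a signed Z score."""
    return direction * -NormalDist().inv_cdf(p / 2)


def weighted_z_meta(studies):
    """studies: iterable of (two_sided_p, sample_size, direction).

    Returns the combined Z score and its two-sided P value, weighting each
    study by the square root of its sample size.
    """
    numerator = sum(sqrt(n) * z_from_two_sided_p(p, d) for p, n, d in studies)
    denominator = sqrt(sum(n for _, n, _ in studies))
    z = numerator / denominator
    return z, erfc(abs(z) / sqrt(2))


z, p = weighted_z_meta([
    (7.6e-5, 3855, +1),   # SardiNIA
    (8.0e-4, 1233, +1),   # FUSION stage 1
])
print(f"combined z = {z:.2f}, two-sided P = {p:.1e}")
```

Under these assumptions the combined P value comes out close to the reported metaanalysis value of 3.5 × 10–7.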
Analyses were repeated once imputation was completed in both the FUSION stage 1 and SardiNIA samples. SNP rs563694 and other SNPs in strong linkage disequilibrium (LD; defined as r2 > 0.8 in the FUSION samples) constituted the 17 strongest association results in the combined FUSION/SardiNIA GWA for fasting glucose metaanalysis (Figure 1). In fact, 22 SNPs associated with fasting plasma glucose with P ≤ 1 × 10–4 were located within a 63.9-kb region on chromosome 2 (Supplemental Table 2). These SNPs were located in an extended region of LD that spans 2 biologically plausible candidate genes for glucoregulation (Figure 1). The first is G6PC2, also known as islet-specific glucose-6-phosphatase–related protein (IGRP). G6PC2 is part of a larger family of enzymes involved in hydrolysis of glucose-6-phosphate in the gluconeogenic and glycogenolytic pathways (36, 37). The second is ABCB11, a member of the MDR/TAP subfamily of ATP-binding cassette transporters involved in multidrug resistance (38, 39).
Table 1. Subject demographics and clinical characteristics for individuals with rs563694 genotype data
Study | Phenotyped subjects | Geographic origin | Study age (years) | BMI (kg/m2) | Fasting glucose (mM)
FUSION stage 1 | 1,233 | Finland | 63.0 (13.7) | 26.6 (5.0) | 5.36 (0.72)
FUSION stage 2 | 655 | Finland | 61.0 (12.3) | 26.3 (4.9) | 5.48 (0.50)
FUSION additional spouses/offspring | 522 | Finland | 39.1 (12.2) | 26.0 (6.4) | 5.11 (0.78)
SardiNIA | 3,855 | Sardinia, Italy | 41.3 (27.1) | 24.7 (6.3) | 4.72 (0.77)
DGI | 1,411 | Finland and Sweden | 58.7 (15.4) | 26.7 (4.78) | 5.28 (0.70)
Amish | 1,655 | USA | 49.0 (23.7) | 26.7 (6.6) | 4.90 (0.58)
METSIM | 4,386 | Finland | 59.0 (10.0) | 26.4 (4.5) | 5.60 (0.70)
Caerphilly | 1,063 | United Kingdom | 56.7 (4.4) | 26.2 (3.5) | 4.80 (0.86)
BWHHS | 3,532 | United Kingdom | 68.5 (5.9) | 27.6 (5.0) | 5.80 (0.87)
Inter99 | 5,734 | Denmark | 46.1 (7.9) | 26.3 (4.5) | 5.54 (0.80)
Among all genotyped or imputed SNPs in this region, rs560887, which was genotyped in FUSION stage 1, and imputed and followed up by genotyping in SardiNIA, showed the strongest overall evidence for association (SardiNIA, P = 4.4 × 10–8; FUSION stage 1, P = 1.7 × 10–3; Supplemental Table 2), with a metaanalysis P value of 2.8 × 10–10. In addition, rs853789 and rs853787, both located in intron 19 of ABCB11 and in perfect LD with each other (Dʹ = 1.0, r2 = 1.0), showed strong evidence for association with fasting glucose concentrations with metaanalysis P values of 1.4 × 10–9 and 1.0 × 10–9, respectively (Supplemental Table 2). rs853789 is located 38.3 kb from rs560887 and 27.4 kb from rs563694 and is in strong LD with both SNPs (Dʹ = 0.98, r2 = 0.81 with rs560887; and Dʹ = 0.98, r2 = 0.95 with rs563694). rs560887 is 10.9 kb from rs563694, is in high LD with rs563694 (Dʹ = 0.99, r2 = 0.84), and is located in intron 3 of G6PC2. In contrast, rs563694 lies between G6PC2 and ABCB11 and is in extended LD with ABCB11. In both the SardiNIA and FUSION stage 1 samples, each copy of the A allele for rs563694 was associated with small increases in fasting glucose (0.064 mM for SardiNIA and 0.051 mM for FUSION stage 1; Table 2) that are clinically insignificant and accounted for approximately 1% of the variance in fasting glucose. Similar effect sizes were observed for rs560887 (0.089 mM for SardiNIA and 0.052 mM for FUSION stage 1).
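The statement that the SNP accounts for roughly 1% of the variance can be checked with a back-of-the-envelope calculation. The Python sketch below assumes an additive model under Hardy-Weinberg proportions, where the fraction of trait variance explained is 2p(1 - p)β², and it assumes that the standardized effects in Table 2 are per-allele effects in standard deviation units. Both assumptions, and the function name, are illustrative and are not taken from the article's methods.

```python
def variance_explained(allele_freq, beta_standardized):
    """Fraction of trait variance explained by one additive SNP.

    allele_freq: frequency of either allele (the formula is symmetric in p).
    beta_standardized: per-allele effect in trait standard deviation units.
    """
    genotype_variance = 2 * allele_freq * (1 - allele_freq)
    return genotype_variance * beta_standardized ** 2


# C allele frequencies and standardized effects as listed in Table 2.
fusion_stage1 = variance_explained(0.34, 0.143)
sardinia = variance_explained(0.46, 0.118)
print(f"FUSION stage 1: {fusion_stage1:.2%}, SardiNIA: {sardinia:.2%}")
```

Both values fall a little below 1%, in line with the approximate figure quoted in the text.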
We assessed the potential contribution of population stratification by computing the genomic control parameter (40) independently for both studies. The genomic control values were 1.01 for both FUSION and SardiNIA, suggesting that population stratification and/or unmodeled relatedness did not contribute significantly to our observed association. Analyses that included BMI as a covariate did not significantly alter the association between rs563694 and fasting glucose in the FUSION stage 1 (P = 8.1 × 10–4 without BMI versus 9.1 × 10–4 with BMI) and SardiNIA samples (7.6 × 10–5 without BMI versus 3.8 × 10–5 with BMI) independently or jointly (P = 3.5 × 10–7 without BMI versus 1.8 × 10–7 with BMI), suggesting the association was not a consequence of adiposity, which is known to induce insulin resistance and increase glucose concentrations (2). The association between rs563694 and fasting glucose also remained significant after individual adjustment for each of the 10 SNPs shown to be associated with T2DM in our recent GWA studies (Supplemental Table 3) (13–15) or when all 10 SNPs were included jointly in the model (P = 6.8 × 10–4 versus P = 8.0 × 10–4 for FUSION stage 1 samples and 5.1 × 10–5 versus 7.6 × 10–5 for SardiNIA samples). The 10 SNPs shown to be associated with T2DM were themselves not significantly associated with fasting glucose concentrations in the FUSION stage 1 or SardiNIA samples (Supplemental Table 4).
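The genomic control parameter mentioned above is commonly estimated as the median of the genome-wide 1-df chi-square association statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.455). The Python sketch below illustrates that estimator. The function names are hypothetical, and the input is simulated null data used only to show that the estimate sits near 1 when there is no inflation; it is not data from FUSION or SardiNIA.

```python
import random
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median statistic over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def corrected_statistic(chi2, lam):
    """Deflate a test statistic by lambda; values of lambda below 1 are ignored."""
    return chi2 / max(lam, 1.0)


# Simulated null statistics: squares of standard normal draws (illustrative only).
rng = random.Random(0)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
lam = genomic_control_lambda(null_stats)
print(f"lambda = {lam:.3f}")
```

A value such as the 1.01 reported for both studies would leave the test statistics essentially unchanged after correction.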
FUSION investigators genotyped rs563694 in 655 stage 2 samples and 522 additional spouses and offspring of T2DM patients included in stage 1; this SNP continued to show evidence for association with fasting glucose (FUSION stage 2, P = 2.0 × 10–3; FUSION stage 1 families, P = 1.9 × 10–5; see Table 2). The metaanalysis that combined results from the FUSION stage 1 and 2 and SardiNIA studies resulted in a P value of 5.3 × 10–9, surpassing standard thresholds for genome-wide significance.
We also examined the association between rs563694 and fasting glucose in 6 follow-up samples (Table 2). The characteristics of these samples are summarized in Table 1. Association between rs563694 and fasting glucose was confirmed in the Amish study (P = 4.1 × 10–5), the METabolic Syndrome In Men study (METSIM; P = 1.3 × 10–10), the Caerphilly study (P = 2.6 × 10–7), the British Women’s Heart and Health Study (BWHHS; P = 1.2 × 10–3), and Inter99 (P = 8.2 × 10–8; Table 2), with fasting glucose concentrations increasing with each copy of the A allele in all studies. While evidence for association in the Diabetes Genetics Initiative (DGI) study was not statistically significant (P = 0.19; Table 2), the results show a trend in the same direction as observed in the other samples. When the results from all follow-up studies were combined in a metaanalysis, there was strong evidence for association between rs563694 and fasting glucose both in the follow-up samples (n = 18,436, P = 6.9 × 10–26; Table 2) and in all GWA and follow-up samples combined (n = 24,046, P = 6.4 × 10–33; Table 2). In contrast, rs563694 did not show evidence for association with T2DM in the FUSION stage 1 study (P = 0.22), the DGI GWA samples (P = 0.78), or the METSIM study (P = 0.09).
Table 2. Association between rs563694 and fasting glucose in nondiabetic individuals
Study | n | Frequency of C allele | Mean fasting glucose (mM) (SD), CC | Mean fasting glucose (mM) (SD), AC | Mean fasting glucose (mM) (SD), AA | Effect (SE), mM | Effect (SE), standardized | P value
GWA samples
FUSION stage 1 | 1,233 | 0.34 | 5.26 (0.48) | 5.31 (0.48) | 5.33 (0.47) | 0.051 (0.019) | 0.143 (0.043) | 8.0 × 10–4
SardiNIA | 3,855 | 0.46 | 4.88 (0.67) | 4.95 (0.62) | 5.00 (0.59) | 0.064 (0.018) | 0.118 (0.030) | 7.6 × 10–5
 |  |  |  |  |  |  |  | 3.5 × 10–7 (A)
FUSION 1 families (B) | 1,755 | 0.34 | 5.20 (0.49) | 5.28 (0.50) | 5.31 (0.54) | 0.065 (0.018) | 0.155 (0.036) | 1.9 × 10–5
Follow-up samples
FUSION stage 2 | 655 | 0.36 | 5.28 (0.43) | 5.44 (0.35) | 5.46 (0.36) | 0.068 (0.021) | 0.180 (0.058) | 2.0 × 10–3
DGI | 1,411 | 0.34 | 5.24 (0.50) | 5.28 (0.51) | 5.29 (0.49) | 0.022 (0.021) | 0.053 (0.039) | 0.19
Amish | 1,655 | 0.24 | 4.90 (0.47) | 4.89 (0.51) | 5.03 (0.53) | 0.090 (0.022) | 0.175 (0.042) | 4.1 × 10–5
METSIM | 4,386 | 0.32 | 5.55 (0.49) | 5.64 (0.50) | 5.71 (0.49) | 0.074 (0.011) | 0.145 (0.023) | 1.3 × 10–10
Caerphilly | 1,063 | 0.36 | 4.69 (0.91) | 4.87 (0.99) | 5.00 (1.19) | 0.155 (0.047) | 0.214 (0.041) | 2.6 × 10–7
BWHHS | 3,532 | 0.34 | 6.01 (1.69) | 6.09 (1.81) | 6.06 (1.49) | 0.006 (0.042) | 0.079 (0.025) | 1.2 × 10–3
Inter99 | 5,734 | 0.36 | 5.46 (0.85) | 5.52 (0.87) | 5.58 (0.70) | 0.057 (0.015) | 0.135 (0.019) | 8.2 × 10–8
 |  |  |  |  |  |  |  | 6.3 × 10–28 (C)
 |  |  |  |  |  |  |  | 6.1 × 10–35 (D)
Figure 2 shows the results of a metaanalysis based upon the effect size observed in each of the 8 studies. Overall, fasting glucose concentrations increased 0.065 mM (95% CI: 0.053–0.077 mM) with each copy of the major allele.
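A pooled per-allele effect of this kind is typically obtained by fixed-effect, inverse-variance weighting. The Python sketch below applies that technique to the mM effects and standard errors listed in Table 2. Which rows enter the pooled estimate is an assumption made here: FUSION is represented by its stage 1 families row and its stage 2 row, so nine rows are used although the text counts 8 studies. The function and variable names are illustrative.

```python
from math import sqrt

# (study, per-allele effect in mM, standard error), as listed in Table 2.
EFFECTS_MM = [
    ("FUSION stage 1 families", 0.065, 0.018),
    ("SardiNIA", 0.064, 0.018),
    ("FUSION stage 2", 0.068, 0.021),
    ("DGI", 0.022, 0.021),
    ("Amish", 0.090, 0.022),
    ("METSIM", 0.074, 0.011),
    ("Caerphilly", 0.155, 0.047),
    ("BWHHS", 0.006, 0.042),
    ("Inter99", 0.057, 0.015),
]


def fixed_effect_meta(rows):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se ** 2 for _, _, se in rows]
    pooled = sum(w * beta for w, (_, beta, _) in zip(weights, rows)) / sum(weights)
    pooled_se = 1.0 / sqrt(sum(weights))
    return pooled, pooled_se


pooled, pooled_se = fixed_effect_meta(EFFECTS_MM)
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f} mM (95% CI: {ci_low:.3f} to {ci_high:.3f})")
```

With this choice of rows the pooled estimate and interval land close to the reported 0.065 mM (95% CI: 0.053–0.077 mM).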
We took advantage of GWA studies originally performed to identify susceptibility genes for T2DM (FUSION) and aging-related traits (SardiNIA) to also identify genes underlying variation in fasting glucose concentration. Both FUSION and SardiNIA initially identified rs563694 as being associated with fasting glucose levels. Given that both studies were performed in relatively homogeneous populations of mixed European descent, it is unlikely that population stratification accounted for the initial association. The estimated genomic control (40) values for FUSION stage 1 and SardiNIA were both 1.01, providing further evidence against the contribution of population stratification to the observed association.
In the SardiNIA sample, we genotyped SNPs rs560887 and rs853789 to validate the results based on imputation. The discrepancy rate per allele between the imputed and typed genotypes at these 2 SNPs was 1.4% and 2.4%, respectively, and the association result with the actual genotypes was stronger than with the imputed genotypes: P = 9.0 × 10–10 and 2.6 × 10–8, respectively.
Adiposity may induce insulin resistance and thus alter glucose concentrations (2) independent of the effects of the SNP on glucose concentrations per se. However, the association remained significant even when we included BMI as a covariate in the analysis, suggesting adiposity is not a major contributor to the observed association. Similarly, in the follow-up studies, the results did not change whether BMI was included or excluded as a covariate. Some known sex-specific effects, such as differences in fat distribution, could also confound our results. We found no sex-specific effect modification in the FUSION and SardiNIA samples. Also, it should be noted that we observed evidence for association between rs563694 and fasting glucose in the METSIM and Caerphilly samples, which included only men, and in the BWHHS, which comprised only women. Thus, the lack of a sex-specific effect in FUSION and SardiNIA is supported by the independent associations observed in these samples. Subsequent analyses of the GWA data revealed rs560887 as having the strongest evidence for association with fasting glucose in this region and suggested, based on the SNP location, that G6PC2 plays a role in glucoregulation. However, 2 additional SNPs in strong LD with rs560887 located in the adjacent ABCB11 also showed similar evidence for association with fasting glucose.
In the 7 follow-up studies, rs563694 continued to show association with fasting glucose, although marginal evidence for heterogeneity among studies was noted (Q = 14.6; P = 0.02; I2 = 59.0%; 95% CI: 5.6–82.2%) (37). For example, the DGI samples did not exhibit a significant association and the BWHHS samples, despite being among the largest follow-up samples, showed only modest evidence for association (Table 2). These 2 studies yielded similar effect size estimates (0.053 for DGI and 0.079 for BWHHS) that were smaller than in the other studies (Table 2). Differences in both populations and sample ascertainment could be contributing to the observed heterogeneity. When these 2 studies are not considered, the heterogeneity estimate is reduced (Q = 3.7; P = 0.45; I2 = 0%; 95% CI: 0–77.6%). However, despite the variability in effect size, the direction of the effect was the same in all studies.
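Heterogeneity statistics of this kind can be illustrated with Cochran's Q and the I2 index. The Python sketch below computes both from the standardized effects and standard errors of the 7 follow-up rows in Table 2; the use of the standardized column is an assumption, suggested by the 0.053 and 0.079 values quoted in the text. The chi-square tail probability uses a closed form that holds only for an even number of degrees of freedom, which covers both cases here (6 and 4). All names are illustrative.

```python
from math import exp, factorial

# (study, standardized per-allele effect, standard error), from Table 2.
FOLLOW_UP = [
    ("FUSION stage 2", 0.180, 0.058),
    ("DGI", 0.053, 0.039),
    ("Amish", 0.175, 0.042),
    ("METSIM", 0.145, 0.023),
    ("Caerphilly", 0.214, 0.041),
    ("BWHHS", 0.079, 0.025),
    ("Inter99", 0.135, 0.019),
]


def cochran_q(rows):
    """Weighted sum of squared deviations from the fixed-effect mean."""
    weights = [1.0 / se ** 2 for _, _, se in rows]
    pooled = sum(w * b for w, (_, b, _) in zip(weights, rows)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, (_, b, _) in zip(weights, rows))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


def i_squared(q, df):
    """Share of total variation attributed to heterogeneity, floored at 0."""
    return max(0.0, (q - df) / q) if q > 0 else 0.0


q_all = cochran_q(FOLLOW_UP)
df_all = len(FOLLOW_UP) - 1
subset = [row for row in FOLLOW_UP if row[0] not in ("DGI", "BWHHS")]
q_sub = cochran_q(subset)
df_sub = len(subset) - 1
print(f"all 7: Q = {q_all:.1f}, P = {chi2_sf_even_df(q_all, df_all):.2f}, "
      f"I2 = {i_squared(q_all, df_all):.0%}")
print(f"without DGI and BWHHS: Q = {q_sub:.1f}, "
      f"P = {chi2_sf_even_df(q_sub, df_sub):.2f}, I2 = {i_squared(q_sub, df_sub):.0%}")
```

Under these assumptions the Q, P, and I2 values come out close to those reported for both the full set and the reduced set of studies.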
There are 2 biologically plausible candidate genes in the region identified by our association analyses that may affect glucose levels. Although rs560887, which is located in intron 3 of G6PC2 just 26 bp proximal to exon 4, showed the strongest evidence for association in the GWA studies, SNPs in LD with rs560887 and rs563694 that show similar levels of association with fasting glucose concentrations were located in intron 19 of ABCB11. ABCB11 is involved in ATP-dependent secretion of bile salts and is almost exclusively expressed in the liver. Mutations in ABCB11 have been shown to be associated with intrahepatic cholestasis (OMIM 603201) (38) and drug-induced hepatotoxicity (39, 41). In antilipid drug trials, bile acid sequestrants have been shown to lower glucose concentrations and improve insulin sensitivity, presumably through reduction of triglyceride levels (42). Based upon these observations, if ABCB11 were contributing significantly to variation in fasting glucose, one might expect to also see associations with lipids or insulin sensitivity. However, rs560887, rs563694, rs853789, and rs853787 were not associated with lipid measurements in a metaanalysis of FUSION stage 1 and SardiNIA samples (P > 0.16). Also, none of these SNPs were associated with minimal model-derived insulin sensitivity in FUSION samples (P > 0.30). Thus, our data do not support a role for ABCB11 in glucoregulation, and other evidence directly linking ABCB11 to regulation of glucose concentrations is scarce.
In contrast, G6PC2, the β cell–specific isoform of glucose-6-phosphatase, is a highly relevant candidate gene for glucoregulation. The mouse homolog G6pc2 has been previously implicated as an autoantigen in the NOD mouse model of type 1 diabetes (43). Wang et al. recently generated G6pc2-null mice and noted that at 16 weeks of age, fasting glucose concentrations had decreased approximately 13% in both male and female G6pc2-null mice when compared with wild-type mice (44). This modest decrease in glucose concentration was observed despite the absence of any differences in body weight, fasting insulin, or fasting glucagon concentrations. The characteristics of these G6pc2-null mice closely paralleled our observations that rs560887 and rs563694 were associated with modest changes in fasting glucose but not in BMI or fasting insulin, which is consistent with the hypothesis that presence of a C allele results in lower G6PC2 expression and therefore lower glucose concentrations. Interestingly, G6pc2 mRNA levels appear to increase with increasing glucose concentration in isolated mouse islets (36).
Molecular cloning of G6pc2 identified 2 splice forms that differ by the presence or absence of exon 4 in BALB/C and ob/ob mice and in insulinoma tissue (45). The longer cDNA including exon 4 has approximately 50% homology with glucose-6-phosphatase catalytic subunit (G6pc) across a variety of species including humans and is membrane bound in the endoplasmic reticulum (46). The corresponding G6PC2 splice forms have been observed in human pancreas (47). rs560887 is located in intron 3, just 26 bp proximal to exon 4, raising the possibility that this variant may play a role in whether the full-length transcript is formed.
G6PC hydrolyzes glucose-6-phosphate to form glucose and release a phosphate group. Despite its similarity to G6PC, G6PC2 is reported to have little to no hydrolase activity in humans (36, 37, 45, 46). In normal and genetically obese mice, the splice form lacking exon 4 appears to be the predominant form observed in islets (45) and lacks sequences that may be critical for hydrolytic activity (45, 48), suggesting the full-length form of G6pc2 may have implications for activity of G6pc2 and its potential role in glucoregulation. Greater hydrolase activity has been reported in cell lines overexpressing the full-length form of G6PC2 (36). Also, in islets from streptozotocin-treated mice, glucose cycling, an indicator of G6pc2 activity, was approximately 3-fold higher than in islets from untreated mice (49), and even greater increases were observed in islets from ob/ob mice (50, 51). The conversion of glucose to glucose-6-phosphate is the critical step in stimulus-secretion coupling for insulin secretion. Variation in G6PC2 may increase glucose cycling in β cells, resulting in altered generation of ATP, which would have implications for insulin secretion. In addition, G6PC2-induced alterations in β cell glucose metabolism would also have downstream effects on phosphoinositide 3-kinase activity, which regulates pancreas duodenum homeobox-1 (PDX1) binding to the insulin gene and subsequent insulin gene transcription (52).
The possible role for G6PC2 in altering glucose concentrations raises the question of whether this gene also confers susceptibility to T2DM. We observed no association between fasting glucose and either rs563694 or rs560887 in individuals with T2DM from the FUSION, DGI, and METSIM studies (P > 0.50). However, the analysis of fasting glucose concentration in individuals with T2DM is confounded by diabetes pathology, treatment, and differential response to therapy. Therefore, the lack of association with fasting glucose in individuals with T2DM does not preclude a contribution of G6PC2 to T2DM susceptibility. Similarly, when we tested these SNPs for association with T2DM in the FUSION, DGI, and METSIM samples, we observed no evidence for association (P > 0.08). Further, the modest effect on glucose concentrations observed in our analysis of nondiabetic individuals suggests we may lack sufficient power to detect association with T2DM. Whereas the cumulative evidence would suggest that G6PC2 may regulate fasting glucose concentrations and does not contribute significantly to susceptibility to T2DM, larger studies may be required to elucidate the role of this gene in T2DM susceptibility.
Variation in the promoter region of glucokinase (GCK, rs1799884) has been shown to be associated with fasting glucose and impaired insulin secretion (53–55) and may play a role in altering birth weight (56). These initial findings were confirmed in a comprehensive meta-analysis performed by Weedon et al., demonstrating that rs1799884 was associated with fasting glucose (meta P = 1.0 × 10⁻⁹) and that the presence of a maternal A allele for rs1799884 was associated with increased birth weight of the child (P = 0.02) (25). GCK, an enzyme that works counter to G6PC2, converts glucose to glucose-6-phosphate, forming the critical step in stimulus-secretion coupling in pancreatic β cells. In addition, the recent GWA study from the DGI identified variation in glucokinase regulatory protein (GCKR) (rs780094) to be associated with triglyceride levels (13). GCKR is an allosteric regulator of GCK in both liver and pancreatic islets whose inhibitory effect is enhanced by fructose-6-phosphate and suppressed by fructose-1-phosphate (57). We found modest evidence for association between fasting glucose and rs1799884 (FUSION stage 1, P = 1.6 × 10⁻²; SardiNIA, P = 2.0 × 10⁻³; meta P = 1.1 × 10⁻⁴) and no evidence for association between fasting glucose and rs780094 (FUSION stage 1, P = 0.44; SardiNIA, P = 0.11; meta P = 0.077). While these results provide evidence for association between variation in GCK and fasting glucose but not between GCKR and fasting glucose in our studies, we cannot exclude the possibility that a complex interaction among GCK, GCKR, and G6PC2 may regulate fasting glucose levels. This will require further study.
In conclusion, we used GWA to identify variants in both ABCB11 and G6PC2, genes that potentially contribute to variation in fasting glucose concentrations in nondiabetic subjects of mixed European descent. There is more literature with data supporting a role for G6PC2, but in the absence of functional data, we cannot discount the possibility that ABCB11 may also contribute significantly to variation in fasting glucose concentration. Heritability for fasting glucose has been estimated to be 25%–40% (19–24), yet the variants we identified account for approximately 1% of the variance in fasting glucose, indicating that the majority of the variability in fasting glucose remains unexplained. The remaining variability is likely due to the effects of additional common genetic variants of modest effect, less common genetic variants of moderate effect, and a variety of gene-gene and gene-environmental interaction effects. It should also be noted that the magnitude of the effect observed in our study is consistent with other reports of quantitative trait associations (58–60).
Additional studies, likely with larger sample sizes, will be required to identify additional genetic variants contributing to variation in fasting glucose. The variants identified in our study are unlikely to be functional themselves but are likely in LD with the functional variant(s). Additional fine mapping, sequencing, and functional studies will be required to define the molecular mechanisms underlying our observed association.
The FUSION and SardiNIA study samples and GWA genotyping have been described in detail (14, 24, 26–28). Here, we briefly review the study cohorts and genotyping methods. We also describe briefly each of the 7 follow-up samples. Subject demographics and basic clinical characteristics for individuals genotyped for rs563694 for each sample are described below and summarized in Table 1. All protocols were approved by the institutional review boards or research ethics committees at the respective institutions, and informed consent was obtained from all subjects.
FUSION GWA study. The goal of the FUSION study is to identify genetic variants that predispose to T2DM or that determine the variability in T2DM-related quantitative traits. The study began as an affected sibling-pair family study (26, 27), later augmented by large numbers of cases and controls for association analysis (14). The FUSION GWA study was performed using a 2-stage case-control design (14). Cases and controls were approximately frequency matched on 5-year age category, sex, and birth province. All stage 1 DNA samples were genotyped using the Illumina HumanHap300 BeadChip version 1.0, resulting in data on 315,635 SNPs that passed quality control filters (14). Genotype data for an additional 2.09 million SNPs were estimated using an imputation procedure (61). The genotype imputation method uses stretches of chromosome shared between individuals genotyped at relatively low density in our studies and individuals genotyped in greater density by the International HapMap Consortium (61) to estimate the missing genotypes. Comparison of imputed and measured genotypes yielded estimated error rates of 1.46% (Illumina) to 2.14% (Affymetrix) per allele with an average concordance of 98.5%, consistent with expectations from HapMap data (61). SNPs showing promising association with fasting plasma glucose in the stage 1 samples were genotyped in the stage 2 DNA samples by homogeneous MassEXTEND reaction using the MassARRAY System (Sequenom) (14). Because treatment for T2DM affects fasting glucose concentrations, all analyses in this report were restricted to nondiabetic subjects. Diabetes status was confirmed by WHO criteria (62) or confirmation of treatment for diabetes by medical record review. Fasting plasma glucose concentrations were available for 1,233 stage 1 and 655 stage 2 samples. Additional FUSION samples included nondiabetic spouses or offspring from FUSION stage 1 families; fasting plasma glucose data were available for 578 individuals. These 578 samples were genotyped using the Applied Biosystems TaqMan allelic discrimination assays (63) and yielded 522 samples with both genotype and fasting glucose data. These samples were integrated into the FUSION stage 1 samples and independently analyzed to assess whether the additional family members improved the evidence for association. We have denoted this analysis FUSION 1 families.
SardiNIA GWA study. The SardiNIA study is a longitudinal study of aging-related quantitative traits and comprises a cohort of 6,148 individuals 14 years or older recruited from 4 towns in the Lanusei Valley in Sardinia. Data from 4,350 individuals with fasting serum glucose measurements from this cohort were used for the GWA study; 3,331 were genotyped using the Affymetrix 10K SNP Mapping Array, and an additional 1,412 were genotyped using the Affymetrix 500K SNP Mapping Array (28). A total of 356,359 SNPs passed quality control and were tested for association with fasting serum glucose. We first used the genotyped SNPs in the 1,412 individuals to estimate genotypes for all the polymorphic SNPs genotyped by the HapMap Consortium. Taking advantage of the relatedness among individuals in the SardiNIA sample, we then conducted a second round of computational analysis to impute genotypes for analysis in the 2,938 individuals not genotyped with the 500K SNP Array. In this second round, we identified large stretches of chromosome shared within each family and probabilistically “filled-in” genotypes within each stretch whenever 1 or more of its carriers was genotyped with the 500K Array Set (64, 65). For these analyses, 37 non-Sardinians and 281 of their family members (n = 318) and 177 individuals with known diabetes were excluded from the analysis, resulting in a final sample size of 3,855.
Follow-up samples. The initial association identified in the metaanalysis of the FUSION and SardiNIA GWA studies was also tested in a series of follow-up samples (Table 1), 1 from FUSION described above and 6 others, which are described briefly below.
DGI. The DGI case-control GWA sample consists of 1,464 cases with T2DM and 1,467 normoglycemic controls from Finland and Sweden and has been previously described in detail (13). Fasting glucose measurements were available for 1,455 nondiabetic control subjects (1,305 unrelated subjects and 150 siblings). Among these, fasting plasma glucose was measured in 537 subjects and fasting whole blood glucose was measured in 918 subjects. Whole-blood glucose concentrations were converted to equivalent plasma values using a conversion factor of 1.13 (66). All samples were genotyped using the Affymetrix GeneChip Human Mapping 500K Array set; results of GWA of 389,878 SNPs with fasting glucose levels (including SNP rs563694) are publicly available at www.broad.mit.edu/diabetes/scandinavs/index.html. Both rs563694 genotype and fasting glucose data were available for 1,411 individuals.
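The harmonization of whole-blood and plasma measurements described above can be sketched in a few lines of Python. This is only an illustration: it assumes the factor of 1.13 is applied multiplicatively to whole-blood values, and the function name and specimen labels are hypothetical rather than taken from the DGI analysis.

```python
WHOLE_BLOOD_TO_PLASMA = 1.13  # conversion factor cited in the text (66)


def to_plasma_equivalent(glucose, specimen):
    """Return a plasma-equivalent fasting glucose value.

    specimen is "plasma" or "whole_blood" (hypothetical labels).
    Assumes the factor multiplies the whole-blood concentration.
    """
    if specimen == "plasma":
        return glucose
    if specimen == "whole_blood":
        return glucose * WHOLE_BLOOD_TO_PLASMA
    raise ValueError(f"unknown specimen type: {specimen!r}")


# Illustrative values only, in arbitrary concentration units.
measurements = [(5.2, "plasma"), (5.0, "whole_blood")]
harmonized = [to_plasma_equivalent(g, s) for g, s in measurements]
```

Plasma values pass through unchanged, so both measurement types end up on one scale before association testing.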
Old Order Amish subjects. The Old Order Amish study participants reported here were 1,655 nondiabetic subjects from Lancaster, Pennsylvania, USA, for whom fasting plasma glucose measurements were available. These subjects were enrolled in ongoing family studies of complex diseases and traits (29–31). Genotyping for rs563694 was performed using the TaqMan allelic discrimination assay (63).
METSIM study. Subjects were selected from the ongoing METSIM study, which includes 7,000 men, aged 50 to 70 years, randomly selected from the population of the town of Kuopio, Eastern Finland, Finland (population 95,000). The present analysis is based on the first 4,386 nondiabetic subjects examined for METSIM with available fasting plasma glucose values. Genotyping was performed using the TaqMan allelic discrimination assay (63).
Caerphilly study. The Caerphilly study is a cohort study of white, European men (n = 1,069; 97.4% born in the United Kingdom), aged 45–59 years at entry in 1979–1983 (32), recruited from the town of Caerphilly, United Kingdom, and 5 adjacent villages. Men were selected using the electoral roll and general practitioner records. DNA and fasting plasma glucose measurements used in this study relate to the first phase of data collection.
BWHHS. The BWHHS consists of female participants, aged 60 to 79 years and recruited between April 1999 and March 2001. Initially, 4,286 women were randomly selected from 23 British towns and were interviewed and clinically examined. They also completed medical questionnaires (33).
Genotyping for the Caerphilly study and BWHHS was performed by KBioscience using their fluorescence-based competitive allele-specific PCR (KASPar) technology.
The Inter99 Study. rs563694 was genotyped in 5,734 Danes for whom fasting plasma glucose values were available. This sample comprises part of the population-based Inter99 sample of middle-aged people sampled at Research Centre for Prevention and Health (Glostrup, Denmark; refs. 34, 35). Genotyping was performed using TaqMan allelic discrimination (KBioscience).
Statistics. Association between fasting glucose and genotypes in the FUSION and SardiNIA studies was tested using a regression framework in which regression coefficients were estimated in the context of a variance component model to account for relatedness among individuals (65). For FUSION samples, plasma glucose concentration was adjusted for sex, age, age², birth province, and study group. Analyses were carried out in nondiabetic individuals excluding those known to be taking medications that directly affect glucose concentration. Similarly, SardiNIA serum glucose values were adjusted for sex, age, and age². Because diabetes-based exclusions were based only on medical records and SardiNIA only measured fasting serum glucose, a small number of undiagnosed new-onset diabetes cases may have been included in the analysis. For both studies, analyses were repeated including BMI as an additional covariate to assess whether adiposity significantly contributed to the evidence for association. Covariate-adjusted trait values were transformed to approximate univariate normality by applying an inverse normal scores transformation; the adjusted values were ranked, ranks were transformed into quantiles, and quantiles were converted to normal deviates.
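The inverse normal scores transformation described above (rank, then quantile, then normal deviate) can be illustrated with a minimal Python sketch. The text does not state the quantile formula or how ties were handled, so the (rank − 0.5)/n quantile and the average-rank treatment of ties used here are assumptions, as are the function name and the example residuals.

```python
from statistics import NormalDist


def inverse_normal_scores(values, offset=0.5):
    """Rank values, map ranks to quantiles, convert quantiles to normal deviates.

    Assumptions (not given in the text): quantile = (rank - offset) / n,
    and tied values share the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical covariate-adjusted residuals, for illustration only.
residuals = [0.3, -1.2, 2.5, 0.0, 0.3]
scores = inverse_normal_scores(residuals)
```

The transformed scores keep the ordering of the adjusted trait values but follow an approximately standard normal distribution, which limits the influence of outlying glucose values on the regression.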
A weighted z score–based fixed effects metaanalysis method was used to combine results from the FUSION and SardiNIA studies. In brief, for each SNP, a reference allele was identified and a z statistic summarizing the magnitude of the P value for association and direction of effect was generated for each study. An overall z statistic was then computed as a weighted average of the individual statistics, and a corresponding P value for that statistic was computed. The weights were proportional to the square root of the number of individuals in each study and scaled such that the squared weights summed to 1. For the metaanalysis of the effect size, the inverse variance was used as weights for each study. For the FUSION 1 families (FUSION stage 1 plus additional FUSION spouses and offspring), a regression-based analysis under a variance components framework was used to appropriately account for relationships among individuals (65). Because we did not have birth province information for the additional spouses and offspring, these analyses were carried out adjusting for age, age², sex, and study group only.
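The weighted z score metaanalysis described above can be sketched as follows. This is a minimal illustration rather than the software used in the study: the function names are hypothetical, P values are assumed to be two-sided, and the direction of effect is passed in as +1 or −1 relative to the reference allele.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def signed_z(p_value, effect_direction):
    """Turn a two-sided P value and an effect direction (+1 or -1,
    relative to the reference allele) into a signed z statistic."""
    z = _STD_NORMAL.inv_cdf(1 - p_value / 2)
    return z if effect_direction >= 0 else -z


def weighted_z_meta(studies):
    """studies: list of (p_value, effect_direction, n_individuals).
    Returns the overall z statistic and its two-sided P value."""
    raw = [sqrt(n) for _, _, n in studies]
    scale = sqrt(sum(w * w for w in raw))
    weights = [w / scale for w in raw]  # squared weights sum to 1
    z_meta = sum(w * signed_z(p, d) for w, (p, d, _) in zip(weights, studies))
    p_meta = 2 * (1 - _STD_NORMAL.cdf(abs(z_meta)))
    return z_meta, p_meta


def inverse_variance_effect(estimates):
    """estimates: list of (beta, standard_error).
    Returns the inverse variance-weighted effect size and its standard error."""
    inv_var = [1 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(inv_var, estimates)) / sum(inv_var)
    return beta, sqrt(1 / sum(inv_var))


# Illustration with the rs1799884 P values quoted in the Discussion, assuming
# the same direction of effect in both studies and the sample sizes given in
# the Methods (1,233 FUSION stage 1; 3,855 SardiNIA).
z, p = weighted_z_meta([(1.6e-2, 1, 1233), (2.0e-3, 1, 3855)])
```

Under those assumptions the combined P value is on the order of 10⁻⁴, the same order of magnitude as the metaanalysis P value reported for rs1799884; the per-SNP sample sizes actually used may have differed.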
Given the different sampling schemes, statistical analyses for the follow-up samples varied by study. The Old Order Amish samples consisted of large Amish pedigrees, so the evidence for association between genotype and fasting plasma glucose was evaluated using variance components analysis implemented in SOLAR to adjust for the relatedness of study subjects (67, 68). Plasma glucose levels were natural logarithm transformed for analysis, and covariates included sex, age, and age². For the DGI study, glucose values were converted to z scores separately by sex, and tests for association were carried out using a regression framework with age and log(BMI) included as covariates; genomic control was applied to account for relatedness (13). For the METSIM study, analyses were carried out identically as in FUSION, with the exception that birth province was not included as a covariate. For the Caerphilly and BWHHS studies, association was assessed using a regression framework with age, age², and BMI as covariates. For the Inter99 study, association was assessed using a regression framework with age and sex as covariates. Individuals with known diabetes at the time of examination were excluded from the analyses. Results from all follow-up studies were combined in a metaanalysis as described above. Finally, a metaanalysis that combined results from all GWA and follow-up studies was performed as described above.
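As an illustration of the sex-specific standardization described for the DGI sample, the following Python sketch converts values to z scores within each group. The function name, group labels, and example values are hypothetical, and the sample standard deviation is an assumed choice; the text does not specify these details.

```python
from statistics import mean, stdev


def z_scores_by_group(values, groups):
    """Standardize values to mean 0 and SD 1 within each group (e.g. sex).

    Uses the sample standard deviation, so each group needs at least
    2 observations with nonzero spread.
    """
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    stats = {g: (mean(vs), stdev(vs)) for g, vs in by_group.items()}
    return [(v - stats[g][0]) / stats[g][1] for v, g in zip(values, groups)]


# Hypothetical fasting glucose values, for illustration only.
glucose = [5.0, 5.4, 5.8, 4.8, 5.0, 5.2]
sex = ["M", "M", "M", "F", "F", "F"]
z_glucose = z_scores_by_group(glucose, sex)
```

Standardizing within sex removes mean and variance differences between men and women before the regression on genotype, age, and log(BMI).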
We would like to thank the many research volunteers who generously participated in the various studies represented in this report. For the FUSION study, we also thank Peter S. Chines, Narisu Narisu, Andrew G. Sprau, and Li Qin for informatics and genotyping support and the Center for Inherited Disease Research for the FUSION GWA genotyping. For the SardiNIA study, we thank the mayors of Lanusei, Ilbono, Arzana, and Elini, the head of local Public Health Unit ASL4, and the residents of the towns for their volunteerism and cooperation. In addition, we are grateful to the mayor and the administration in Lanusei for providing and furnishing the clinic site. We thank the team of physicians (Maria Grazia Pilia, Danilo Fois, Liana Ferreli, Marcello Argiolas, Francesco Loi, and Pietro Figus) and the nurses Paola Loi, Monica Lai, and Anna Cau, who carried out the physical examinations and made the observations.
We thank the former Medical Research Council (MRC) Epidemiology Unit (South Wales) who undertook the Caerphilly study. The Department of Social Medicine, University of Bristol, now acts as custodian for the Caerphilly database. We are grateful to all of the men who participated in this study. For the BWHHS, we thank all of the general practitioners and their staff who supported data collection and the women who participated in the study.
For the Amish studies, we thank members of the Amish community for the generous donation of time to participate in these studies and our field nurses, Amish liaisons, and clinic staff for their extraordinary efforts. We also acknowledge Sandy Ott and John Shelton for genotyping of Amish DNA samples.
Support for this study was provided by the following: American Diabetes Association (ADA) (1-05-RA-140 to R.M. Watanabe; 7-04-RA-111 to A.R. Shuldiner; and postdoctoral fellowships to C.J. Willer and H.M. Stringham); and NIH grants (DK069922 and U54 DA021519 to R.M. Watanabe; DK062370 to M. Boehnke; DK072193 to K.L. Mohlke; DK062418 to W-M. Chen; R01 DK54361, U01 HL72515, and R01 AG18728 to A.R. Shuldiner; R01 HL69313 to B.D. Mitchell; and R01 DK068495 to K.D. Silver). D.A. Lawlor is funded by a UK Department of Health career scientist award, and N. Timpson is funded by a studentship from the MRC of the United Kingdom.
The Inter99 Study was supported by the European Union (EUGENE2, LSHM-CT-2004-512013); the Lundbeck Foundation Centre of Applied Medical Genomics in Personalized Disease Prediction, Prevention and Care; the FOOD Study Group/the Danish Ministry of Food, Agriculture and Fisheries and Ministry of Family and Consumer Affairs (2101-05-0044); and the Danish Medical Research Council.
This research was supported in part by the intramural Research Program of the NIH, National Institute on Aging, and the NIDDK. Additional support came from contract N01-AG-1-2109 from the NIA intramural research program for the SardiNIA (ProgeNIA) team; National Human Genome Research Institute intramural project number 1 Z01 HG000024 (to F.S. Collins); University of Maryland General Clinical Research Center (M01 RR 16500); Johns Hopkins University General Clinical Research Center (M01 RR 000052); the NIDDK Clinical Nutrition Research Unit of Maryland (P30 DK072488); and the Department of Veterans Affairs and Veterans Affairs Medical Center Baltimore Geriatric Research, Education and Clinical Center (GRECC). The BWHHS receives core funding from the United Kingdom Department of Health policy research program. The DNA extraction and genotyping for BWHHS were funded by the British Heart Foundation. The Caerphilly study was funded by the MRC of the United Kingdom. Funding for the Caerphilly DNA Bank was from an MRC grant (G9824960). The United Kingdom MRC supports work undertaken in the Centre for Causal Analyses in Translational Epidemiology.
The views expressed in this paper are those of the authors and not necessarily those of any funding body or others whose support is acknowledged. Those providing funding had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Received for publication November 26, 2007, and accepted in revised form April 23, 2008.
Address correspondence to: Angelo Scuteri, Unità Operativa Geriatria, Istituto Nazionale Ricovero E Cura Anziari, Rome, Italy. Phone: 39-3334564136; Fax: 39-06-30362896; E-mail: angeloelefante@interfree.it. Or to: Richard M. Watanabe, Keck School of Medicine of USC, Department of Preventive Medicine, 1540 Alcazar St., CHP-220, Los Angeles, California 90089-9011, USA. Phone: (323) 442-2053; Fax: (323) 442-2349; E-mail: [email protected].
Wei-Min Chen and Michael R. Erdos are co–first authors.
1. Reaven, G.M. 1988. Role of insulin resistance in human disease. Diabetes. 37:1595–1607.
2. DeFronzo, R.A. 1987. The triumvirate: B-cell, muscle, liver. A collusion responsible for NIDDM. Diabetes. 37:667–687.
3. National Diabetes Data Group. 1979. Classification and diagnosis of diabetes mellitus and other categories of glucose intolerance. Diabetes. 28:1039–1057.
4. [No authors listed]. 1985. Diabetes Mellitus: Report of a WHO Study Group. World Health Organ. Tech. Rep. Ser. 727:1–113.
5. The Expert Committee on the Diagnosis and Classification of Diabetes Mellitus. 1997. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care. 20:1183–1197.
6. DeFronzo, R.A., and Ferrannini, E. 1991. Insulin resistance: A multifaceted syndrome responsible for NIDDM, obesity, hypertension, dyslipidemia, and atherosclerotic cardiovascular disease. Diabetes Care. 14:173–194.
7. Xiang, A.H., et al. 2006. Coordinate changes in plasma glucose and pancreatic β-cell function in Latino women at high risk for type 2 diabetes. Diabetes. 55:1074–1079.
8. Mason, C.C., Hanson, R.L., and Knowler, W.C. 2007. Progression to type 2 diabetes characterized by moderate then rapid glucose increases. Diabetes. 56:2054–2061.
9. Rich, S.S. 1990. Mapping genes in diabetes. Diabetes. 39:1315–1319.
10. Ghosh, S., and Schork, N.J. 1996. Genetic analysis of NIDDM: the study of quantitative traits. Diabetes. 45:1–14.
11. Diabetes Prevention Program Research Group. 2002. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 346:393–403.
12. Tuomilehto, J., et al. 2001. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N. Engl. J. Med. 344:1343–1350.
13. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, et al. 2007. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 316:1331–1336.
14. Scott, L.J., et al. 2007. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 316:1341–1345.
15. Zeggini, E., et al. 2007. Replication of genome-wide association signals in U.K. samples reveals risk loci for type 2 diabetes. Science. 316:1336–1341.
16. Sladek, R., et al. 2007. A genome-wide association study identified novel risk loci for type 2 diabetes. Nature. 445:881–885.
17. Steinthorsdottir, V., et al. 2007. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat. Genet. 39:770–775.
18. Zeggini, E., et al. 2008. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat. Genet. 40:638–645.
19. Beaty, T.H., and Fajans, S.S. 1982. Estimating genetic and non-genetic components of variance for fasting glucose levels in pedigrees ascertained through non-insulin dependent diabetes. Ann. Hum. Genet. 46:355–362.
20. Boehnke, M., Moll, P.P., Kottke, B.A., and Weidman, W.H. 1987. Partitioning the variability of fasting plasma glucose levels in pedigrees. Am. J. Epidemiol. 125:679–689.
21. Sakul, H., et al. 1997. Familiality of physical and metabolic characteristics that predict the development of non-insulin-dependent diabetes mellitus in Pima Indians. Am. J. Hum. Genet. 60:651–656.
22. Watanabe, R.M., et al. 1999. Familiality of quantitative metabolic traits in Finnish families with non-insulin-dependent diabetes mellitus. Hum. Hered. 49:159–168.
23. Henkin, L., et al. 2003. Genetic epidemiology of insulin resistance and visceral adiposity. The IRAS family study design and methods. Ann. Epidemiol. 13:211–217.
24. Pilia, G., et al. 2006. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet. 2:e132.
25. Weedon, M.N., et al. 2006. A common haplotype of the glucokinase gene alters fasting glucose and birth weight: Association in six studies and population-genetics analyses. Am. J. Hum. Genet. 79:991–1001.
26. Valle, T., et al. 1998. Mapping genes for NIDDM. Design of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Diabetes Care. 21:949–958.
27. Silander, K., et al. 2004. A large set of Finnish affected sibling pair families with type 2 diabetes suggests susceptibility loci on chromosomes 6, 11, and 14. Diabetes. 53:821–829.
28. Scuteri, A., et al. 2007. Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet. 3:1200–1210.
29. Hsueh, W.-C., et al. 2001. Genome-wide scan of obesity in the Old Order Amish. J. Clin. Endocrinol. Metab. 86:1199–1205.
30. Sorkin, J., et al. 2005. Exploring the genetics of longevity in the Old Order Amish. Mech. Ageing Dev. 126:347–350.
31. Post, W., et al. 2007. Associations between genetic variants in the NOS1AP (CAPON) gene and cardiac repolarization in the Old Order Amish. Hum. Hered. 64:214–219.
32. The Caerphilly and Speedwell Collaborative Group. 1984. Caerphilly and Speedwell collaborative heart disease studies. J. Epidemiol. Community Health. 38:259–262.
33. Lawlor, D., Bedford, C., Taylor, M., and Ebrahim, S. 2003. Geographical variation in cardiovascular disease, risk factors, and their control in older women: British Women’s Heart and Health Study. J. Epidemiol. Community Health. 57:134–140.
34. Jørgensen, M.E., et al. 2003. Obesity and central fat pattern among Greenland Inuit and a general population of Denmark (Inter99): relationship to metabolic risk factors. Int. J. Obes. Relat. Metab. Disord. 27:1507–1515.
35. Glümer, C., Jørgensen, T., Borch-Johnsen, K., and Inter99 study. 2003. Prevalences of diabetes and impaired glucose regulation in a Danish population: the Inter99 study. Diabetes Care. 26:2335–2340.
36. Petrolonis, A.J., et al. 2004. Enzymatic characterization of the pancreatic islet-specific glucose-6-phosphatase-related protein (IGRP). J. Biol. Chem. 279:13976–13983.
37. Shieh, J.-J., Pan, C.-J., Mansfield, B.C., and Chou, J.Y. 2005. In islet-specific glucose-6-phosphatase-related protein, the beta cell antigenic sequence that is targeted in diabetes is not responsible for the loss of phosphohydrolase activity. Diabetologia. 48:1851–1859.
38. van Mil, S.W.C., et al. 2004. Benign recurrent intrahepatic cholestasis type 2 is caused by mutations in ABCB11. Gastroenterology. 127:379–384.
39. Lang, C., et al. 2007. Mutations and polymorphisms in the bile salt export pump and the multidrug resistance protein 3 associated with drug-induced liver injury. Pharmacogenet. Genomics. 17:47–60.
40. Devlin, B., and Roeder, K. 1999. Genomic control for association studies. Biometrics. 55:997–1004.
41. Funk, C., Ponelle, C., Scheuermann, G., and Pantze, M. 2001. Cholestatic potential of troglitazone as a possible factor contributing to troglitazone-induced hepatotoxicity: in vivo and in vitro interaction at the canalicular bile salt export pump (Bsep) in the rat. Mol. Pharmacol. 59:627–635.
42. Staels, B., and Kuipers, F. 2007. Bile acid sequestrants and the treatment of type 2 diabetes mellitus. Drugs. 67:1383–1392.
43. Mukherjee, R., Wagar, D., Stephens, T.A., Lee-Chan, E., and Singh, B. 2005. Identification of CD4+ T cell-specific epitopes of islet-specific glucose-6-phosphatase catalytic subunit-related protein: a novel beta cell autoantigen in type 1 diabetes. J. Immunol. 174:5306–5315.
44. Wang, Y., et al. 2007. Deletion of the gene encoding the islet-specific glucose-6-phosphatase catalytic subunit-related protein autoantigen results in a mild metabolic phenotype. Diabetologia. 50:774–778.
45. Arden, S.D., et al. 1999. Molecular cloning of a pancreatic islet-specific glucose-6-phosphatase catalytic subunit-related protein. Diabetes. 48:531–542.
46. Shieh, J.-J., Pan, C.-J., Mansfield, B.C., and Chou, J.Y. 2004. The islet-specific glucose-6-phosphatase-related protein, implicated in diabetes, is a glycoprotein embedded in the endoplasmic reticulum membrane. FEBS Lett. 562:160–164.
47. Dogra, R.S., et al. 2006. Alternative splicing of G6PC2, the gene coding for the islet-specific glucose-6-phosphatase catalytic subunit-related protein (IGRP), results in differential expression in human thymus and spleen compared with pancreas. Diabetologia. 49:953–957.
48. Pan, C.-J., Lei, K.-J., Annabi, B., Hemrika, W., and Chou, J.Y. 1998. Transmembrane topology of glucose-6-phosphatase. J. Biol. Chem. 273:6144–6148.
49. Khan, A., et al. 1990. Glucose cycling in islets from healthy and diabetic rats. Diabetes. 39:456–459.
50. Khan, A., et al. 1989. Evidence for the presence of glucose cycling in pancreatic islets of the ob/ob mouse. J. Biol. Chem. 264:9732–9733.
51. Khan, A., et al. 1990. Glucose cycling is markedly enhanced in pancreatic islets of obese hyperglycemic mice. Endocrinology. 126:2413–2416.
52. Vaulont, S., Vasseur-Cognet, M., and Kahn, A. 2000. Glucose regulation of gene transcription. J. Biol. Chem. 275:31555–31558.
53. Stone, L.M., Kahn, S.E., Deeb, S.S., Fujimoto, W.Y., and Porte, D., Jr. 1994. Glucokinase gene variations in Japanese-Americans with a family history of NIDDM. Diabetes Care. 17:1480–1483.
54. Stone, L.M., Kahn, S.E., Fujimoto, W.Y., Deeb, S.S., and Porte, D., Jr. 1996. A variation at position -30 of the β-cell glucokinase gene promoter is associated with reduced β-cell function in middle-aged Japanese-American men. Diabetes. 45:422–428.
55. Rose, C.S., et al. 2005. A -30G>A polymorphism of the beta-cell-specific glucokinase promoter associates with hyperglycemia in the general population of whites. Diabetes. 54:3026–3031.
56. Weedon, M.N., et al. 2005. Genetic regulation of birth weight and fasting glucose by a common polymorphism in the islet promoter of the glucokinase gene. Diabetes. 54:576–581.
57. Malaisse, W.J., Malaisse-Lagae, F., Davies, D.R., Vandercammen, A., and Van Schaftingen, E. 1990. Regulation of glucokinase by a fructose-1-phosphate-sensitive protein in pancreatic islets. Eur. J. Biochem. 190:539–545.
58. Frayling, T.M., et al. 2007. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 316:889–894.
59. Sanna, S., et al. 2008. Common variants in the GDF5-UQCC region are associated with variation in human height. Nat. Genet. 40:198–203.
60. Willer, C.J., et al. 2008. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat. Genet. 40:161–169.
61. Li, Y., Willer, C.J., Ding, J., Scheet, P., and Abecasis, G.R. 2008. Markov model for rapid haplotyping and genotype imputation in genome wide studies. Nat. Genet. In press.
62. [Anonymous]. 1999. Definition, diagnosis and classification of diabetes mellitus and its complications. Report of a WHO Consultation. WHO. Geneva, Switzerland. www.diabetes.com.au/pdf/who_report.pdf.
63. Livak, K.J. 1999. Allelic discrimination using fluorogenic probes and the 5ʹ nuclease assay. Genet. Anal. 14:143–149.
64. Burdick, J.T., Chen, W.M., Abecasis, G.R., and Cheung, V.G. 2006. In silico method for inferring genotypes in pedigrees. Nat. Genet. 38:1002–1004.
65. Chen, W.-M., and Abecasis, G.R. 2007. Family based association tests for genome wide association scans. Am. J. Hum. Genet. 81:913–926.
66. D’Orazio, P., et al. 2005. Approved IFCC recommendations on reporting results for blood glucose (abbreviated). Clin. Chem. 51:1573–1576.
67. Blangero, J., and Almasy, L. 1997. Multipoint oligogenic linkage analysis of quantitative traits. Genet. Epidemiol. 14:959–964.
68. Almasy, L., and Blangero, J. 1998. Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62:1198–1211.
SUPPLEMENTARY MATERIAL
Supplemental Table I. Top 10 Independent* Genome-wide Associations with Fasting Glucose.
SNP         Chromosome  Position   FUSION Stage 1 p-value  SardiNIA p-value  GWA Meta-analysis p-value†
rs560887    2           169588655  1.7×10⁻³                4.4×10⁻⁸          2.8×10⁻¹⁰
rs9981885   21          19252508   2.7×10⁻²                1.2×10⁻⁶          3.2×10⁻⁷
rs1387153   11          92313476   8.5×10⁻³                3.6×10⁻⁵          1.0×10⁻⁶
rs2027281   10          29424184   3.0×10⁻²                2.1×10⁻⁵          1.8×10⁻⁶
rs420510    21          19226244   3.2×10⁻²                2.5×10⁻⁵          2.3×10⁻⁶
rs693793    6           124830968  8.3×10⁻³                1.1×10⁻⁴          3.1×10⁻⁶
rs2214108   7           11428183   4.9×10⁻²                2.8×10⁻⁵          3.9×10⁻⁶
rs7251204   19          20558498   3.9×10⁻²                3.7×10⁻⁵          4.0×10⁻⁶
rs11122355  1           228319928  0.78                    3.9×10⁻⁷          5.3×10⁻⁶
rs6462079   7           28189067   8.6×10⁻⁶                6.8×10⁻³          5.5×10⁻⁶
* Defined as a pair-wise D'<0.8
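The independence criterion in the footnote (pair-wise D' < 0.8) can be made concrete with a short calculation. The sketch below computes the absolute value of Lewontin's D' for two biallelic SNPs from one haplotype frequency and two allele frequencies. The function names and the numeric inputs are hypothetical illustrations and are not taken from the FUSION or SardiNIA data.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    All inputs are hypothetical; real values come from phased genotypes.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


def independent(p_ab: float, p_a: float, p_b: float, threshold: float = 0.8) -> bool:
    """Apply the table's rule: two SNPs count as independent when D' < 0.8."""
    return d_prime(p_ab, p_a, p_b) < threshold


# Illustrative frequencies only.
print(d_prime(0.25, 0.30, 0.40))
print(independent(0.25, 0.30, 0.40))
```

With these made-up frequencies D is 0.13 and its maximum is 0.18, so the pair would fall just under the 0.8 cut-off and be reported as two separate signals.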
Supplemental Table II. GWA for Fasting Glucose in Non-diabetic Individuals With Meta-analysis p-value < 1.0×10⁻⁴ on Chromosome 2.
                        FUSION Stage 1                  SardiNIA                        GWA Meta-analysis
SNP         Position**  Minor Allele  MAF     p-value   Minor Allele  MAF     p-value   p-value†
rs477224    169575990   C             0.184*  8.9×10⁻²  C             0.361   3.4×10⁻⁴  7.7×10⁻⁵
rs13431652  169578922   C             0.301   6.9×10⁻³  C             0.412   2.4×10⁻⁵  5.6×10⁻⁷
rs573225    169583048   G             0.307   2.9×10⁻³  G             0.436   4.9×10⁻⁵  5.7×10⁻⁷
rs560887    169588655   T             0.305*  1.7×10⁻³  T             0.372*  4.4×10⁻⁸  2.8×10⁻¹⁰
rs563694    169599578   C             0.339*  8.1×10⁻⁴  C             0.455*  7.6×10⁻⁵  3.5×10⁻⁷
rs537183    169600153   C             0.340   8.6×10⁻⁴  C             0.455   6.5×10⁻⁵  3.1×10⁻⁷
rs502570    169600466   A             0.340   9.3×10⁻⁴  A             0.455   6.5×10⁻⁵  3.3×10⁻⁷
rs475612    169602253   T             0.343   1.1×10⁻³  T             0.451   5.7×10⁻⁵  3.3×10⁻⁷
rs557462    169603102   T             0.343   1.1×10⁻³  C             0.455   6.4×10⁻⁵  3.6×10⁻⁷
rs486981    169607656   A             0.343   1.6×10⁻³  A             0.469   1.0×10⁻⁴  7.9×10⁻⁷
rs484066    169607988   A             0.379   1.9×10⁻²  A             0.370   9.0×10⁻⁵  9.0×10⁻⁷
rs569805    169608387   A             0.342   1.6×10⁻³  A             0.469   1.0×10⁻⁴  7.9×10⁻⁷
rs579060    169608546   G             0.342   1.6×10⁻³  G             0.469   1.0×10⁻⁴  7.9×10⁻⁷
rs508506    169610462   A             0.342   1.6×10⁻³  A             0.469*  1.0×10⁻⁴  7.8×10⁻⁷
rs494874    169614813   T             0.342   1.6×10⁻³  T             0.470*  1.1×10⁻⁴  8.7×10⁻⁷
rs552976    169616945   A             0.342   1.6×10⁻³  A             0.472*  3.5×10⁻⁵  2.5×10⁻⁷
rs567074    169619938   T             0.442   8.0×10⁻²  C             0.446*  4.1×10⁻⁴  8.2×10⁻⁵
rs853789    169626995   A             0.350   1.4×10⁻³  A             0.412*  2.5×10⁻⁷  1.4×10⁻⁹
rs853787    169627759   G             0.352   1.1×10⁻³  G             0.412   2.3×10⁻⁷  1.0×10⁻⁹
rs862662    169627836   C             0.438   4.5×10⁻²  A             0.449   2.4×10⁻⁴  2.9×10⁻⁵
rs853781    169631828   A             0.459   2.9×10⁻²  G             0.449   2.4×10⁻⁴  1.9×10⁻⁵
rs853773    169639854   A             0.482   2.7×10⁻²  G             0.480   3.8×10⁻⁶  3.2×10⁻⁷
* Based on genotyped data
** Based on NCBI build 35
† Meta-analysis for FUSION stage 1 and SardiNIA
Supplemental Table III. Association Between rs563694 and Fasting Glucose in Non-diabetic Individuals Adjusting for SNPs Associated with Type 2 Diabetes from Previous GWA Studies*.
Chr  Covariate SNP  Gene                FUSION p-value  SardiNIA p-value  Meta p-value
     None                               8.0×10⁻⁴        7.6×10⁻⁵          3.5×10⁻⁷
3    rs4402960      IGF2BP2             6.7×10⁻⁴        8.1×10⁻⁵          3.3×10⁻⁷
3    rs1801282      PPARG               8.2×10⁻⁴        7.5×10⁻⁵          3.5×10⁻⁷
6    rs7754840      CDKAL1              8.1×10⁻⁴        7.8×10⁻⁵          3.6×10⁻⁷
8    rs13266634     SLC30A8             1.1×10⁻³        6.3×10⁻⁵          3.6×10⁻⁷
9    rs10811661     CDKN2A/2B           1.1×10⁻³        6.4×10⁻⁵          3.7×10⁻⁷
10   rs1111875      HHEX                1.1×10⁻³        7.1×10⁻⁵          4.1×10⁻⁷
10   rs7903146      TCF7L2              1.1×10⁻³        6.8×10⁻⁵          3.9×10⁻⁷
11   rs9300039      Chr 11 intragenic   6.1×10⁻⁴        7.5×10⁻⁵          2.8×10⁻⁷
11   rs5215         KCNJ11              7.1×10⁻⁴        7.1×10⁻⁵          3.0×10⁻⁷
16   rs8050136      FTO                 1.1×10⁻³        7.3×10⁻⁵          4.2×10⁻⁷
* The direction of the effect of rs563694 on fasting glucose was not altered by the addition of these SNPs as covariates.
Supplemental Table IV. Association Between Fasting Glucose in Non-diabetic Individuals and SNPs Associated with Type 2 Diabetes from Previous GWA Studies.
Chr  SNP         Gene                FUSION p-value  SardiNIA p-value  Meta p-value
3    rs4402960   IGF2BP2             0.453           0.690             0.751
3    rs1801282   PPARG               0.713           0.983             0.160
6    rs7754840   CDKAL1              0.780           0.666             0.805
8    rs13266634  SLC30A8             0.728           0.738             0.029
9    rs10811661  CDKN2A/2B           0.180           0.794             0.060
10   rs1111875   HHEX                0.360           0.538             0.582
10   rs7903146   TCF7L2              0.816           0.530             0.597
11   rs9300039   Chr 11 intragenic   0.298           0.901             0.851
11   rs5215      KCNJ11              0.187           0.596             0.148
16   rs8050136   FTO                 0.810           0.541             0.582
Chapter 5
Common variant in MTNR1B associated with increased risk of type 2 diabetes and impaired early insulin secretion
Nature Genetics 2009;41(1):82-8
Common variant in MTNR1B associated with increased risk of type 2 diabetes and impaired early insulin secretion
Valeriya Lyssenko¹, Cecilia L F Nagorny², Michael R Erdos³, Nils Wierup⁴, Anna Jonsson¹, Peter Spegel², Marco Bugliani⁵, Richa Saxena⁶,⁷, Malin Fex⁸, Nicolo Pulizzi⁵, Bo Isomaa⁹, Tiinamaija Tuomi⁹,¹⁰, Peter Nilsson¹¹, Johanna Kuusisto¹², Jaakko Tuomilehto¹³⁻¹⁵, Michael Boehnke¹⁶, David Altshuler⁶,⁷, Frank Sundler⁴, Johan G Eriksson¹⁷,¹⁸, Anne U Jackson¹⁶, Markku Laakso¹², Piero Marchetti⁵, Richard M Watanabe¹⁹,²⁰, Hindrik Mulder² & Leif Groop¹,¹⁰
Genome-wide association studies have shown that variation in MTNR1B (melatonin receptor 1B) is associated with insulin and glucose concentrations. Here we show that the risk genotype of this SNP predicts future type 2 diabetes (T2D) in two large prospective studies. Specifically, the risk genotype was associated with impairment of early insulin response to both oral and intravenous glucose and with faster deterioration of insulin secretion over time. We also show that the MTNR1B mRNA is expressed in human islets, and immunocytochemistry confirms that it is primarily localized in β cells in islets. Nondiabetic individuals carrying the risk allele and individuals with T2D showed increased expression of the receptor in islets. Insulin release from clonal β cells in response to glucose was inhibited in the presence of melatonin. These data suggest that the circulating hormone melatonin, which is predominantly released from the pineal gland in the brain, is involved in the pathogenesis of T2D. Given the increased expression of MTNR1B in individuals at risk of T2D, the pathogenic effects are likely exerted via a direct inhibitory effect on β cells. In view of these results, blocking the melatonin ligand-receptor system could be a therapeutic avenue in T2D.
Received 11 July; accepted 27 October; published online 7 December 2008; doi:10.1038/ng.288
¹Unit of Diabetes and Endocrinology, Department of Clinical Sciences in Malmoe, Lund University Diabetes Centre, University Hospital, Malmoe 20520, Sweden. ²Unit of Molecular Metabolism, Department of Clinical Sciences in Malmoe, Lund University Diabetes Centre, Malmoe 20502, Sweden. ³Genome Technology Branch, National Human Genome Research Institute, Bethesda, Maryland 20892, USA. ⁴Unit of Neuroendocrine Cell Biology, Department of Experimental Medical Science, Lund University, Lund 22184, Sweden. ⁵Department of Endocrinology and Metabolism, University of Pisa, Pisa 56124, Italy. ⁶Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA. ⁷Massachusetts General Hospital, Boston, Massachusetts 02114, USA. ⁸Unit for Diabetes and Celiac Disease, Department of Clinical Sciences in Malmoe, Lund University Diabetes Centre, Malmoe 20502, Sweden. ⁹Folkhalsan Research Centre, Helsinki 00251, Finland. ¹⁰Department of Medicine, Helsinki University Central Hospital, and Research Program of Molecular Medicine, University of Helsinki, Helsinki 00140, Finland. ¹¹Department of Clinical Sciences, Medicine, Lund University, Malmoe 20502, Sweden. ¹²Department of Medicine, University of Kuopio and Kuopio University Hospital, Kuopio 70210, Finland. ¹³Diabetes Unit, Department of Health Promotion and Chronic Disease Prevention, National Public Health Institute, Helsinki 00300, Finland. ¹⁴Department of Public Health, University of Helsinki, Helsinki 00014, Finland. ¹⁵South Ostrobothnia Central Hospital, Senajoki 60220, Finland. ¹⁶Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA. ¹⁷National Public Health Institute, Helsinki 00300, Finland. ¹⁸Department of General Practice and Primary Health Care, University of Helsinki, Helsinki 00014, Finland. ¹⁹Department of Preventive Medicine and ²⁰Department of Physiology and Biophysics, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA. Correspondence should be addressed to L.G.
T2D incidence and prevalence are increasing at an alarming rate worldwide. It is well established that T2D is multifactorial and that multiple genes and environmental and behavioral factors combine to cause the disease. Recent genome-wide association studies (GWAS) have provided new insights into the nature of these genetic factors¹⁻⁵. Many of the T2D-associated variants identified in these studies seem to influence the capacity of β cells to cope with increased insulin demands imposed by insulin resistance. One of the GWAS (Diabetes Genetics Initiative; DGI) also provided information on association with 18 quantitative traits¹, including measures of insulin secretion and action. One of the strongest signals for glucose-stimulated insulin secretion in the DGI scan emanated from a SNP (rs10830963) in MTNR1B on chromosome 11 (P = 7 × 10⁻⁴, rank order 595). Given that the melatonin pathway had previously been suggested to be involved in pathogenesis of T2D, the MTNR1B gene was a prime candidate gene for T2D. This SNP was also strongly associated (P = 3.2 × 10⁻⁵⁰) with elevated fasting glucose concentrations in a meta-analysis of the recent GWAS of T2D⁶.
Melatonin is a circulating hormone predominantly secreted from the pineal gland, although other endocrine cell systems may also synthesize and release this hormone⁷, which then could exert hitherto unknown autocrine and paracrine effects⁸. Melatonin is an indoleamine formed from tryptophan via acetylation and subsequent methylation of the neurotransmitter serotonin. It has primarily been implicated in the regulation of circadian rhythms, and circulating levels of the hormone are high during night and drop during daylight⁷. In fact, it has been proposed that melatonin could be involved in a circadian lowering of nocturnal insulin levels⁹. Effects of melatonin are mediated by two distinct receptors, MTNR1A and MTNR1B¹⁰, which are members of the G protein–coupled receptor family and couple specifically to inhibitory G proteins (Gi). Both receptors have been found to be expressed in human and rodent islets¹¹, with MTNR1A predominating, especially in glucagon-producing α cells¹². There is some evidence that melatonin may exert an effect on insulin secretion, in that acute effects exerted by cAMP-elevating agents are inhibited by melatonin, whereas prolonged effects of the hormone may be stimulatory⁷. Here we provide new evidence that the common variant rs10830963 in the MTNR1B gene, or a variant(s) in linkage disequilibrium with it, increases risk of future T2D by causing impaired early insulin secretion. Further, we present functional data that suggest a potential role of the melatonin system, in particular the MTNR1B receptor, in regulation of glucose homeostasis in man.
First, we studied whether the MTNR1B rs10830963 SNP predicts
future T2D in 16,061 Swedish (from the Malmoe Preventive Project, MPP) and 2,770 Finnish (from the Botnia study) subjects, 2,201 (2,063 + 138) of whom developed diabetes during a median follow-up period of 23.5 years (Table 1). The frequency of the risk G allele of SNP rs10830963 was higher in individuals from the MPP study who converted to T2D compared to nonconverters (30.2% versus 28.0%, P = 0.002). This yielded a modestly increased risk of 1.12 (95% CI = 1.04–1.20, P = 0.002). There was no significant difference between converters and nonconverters in the Botnia study, but here only 138 individuals developed T2D during a 7-year follow-up period (31.0% versus 29.3%; OR = 1.09, 95% CI = 0.82–1.43, P = 0.56). In the combined analysis of the two cohorts, the risk allele was associated with a 1.11-fold increased risk of future T2D (95% CI = 1.03–1.18, P = 0.004). This relatively modest risk for future T2D probably explains why this SNP was not identified as being associated with T2D in previous GWAS (OR = 1.12 (95% CI = 1.04–1.20), P = 0.003 in DIAGRAM). However, the effect on glucose levels seems much stronger; in nondiabetic individuals from the MPP study, rs10830963[G] carriers had a higher fasting plasma glucose concentration at baseline (CC: 5.38 ± 0.54 mmol/l, CG: 5.44 ± 0.55 mmol/l, GG: 5.50 ± 0.55 mmol/l, P = 3 × 10⁻¹⁹), which remained elevated throughout the 25-year follow-up period (CC: 5.41 ± 0.54 mmol/l, CG: 5.49 ± 0.54 mmol/l, GG: 5.55 ± 0.54 mmol/l, P = 2 × 10⁻³¹) (Fig. 1a).
Next, we examined insulin secretion in 3,300 nondiabetic participants from the population-based Botnia PPP study. We observed a dose-dependent decrease (corrected early insulin response to glucose (CIR): beta = –0.170 ± 0.021, P = 5 × 10⁻¹⁶; disposition index (DI): beta = –0.241 ± 0.022, P = 1 × 10⁻²⁶) with increasing number of G alleles of rs10830963 (Table 2 and Fig. 1b,c). These findings were replicated in the METabolic Syndrome In Men (METSIM) study, where both CIR (beta = –0.143 ± 0.022, P = 1 × 10⁻¹⁰) and DI (beta = –0.128 ± 0.022, P = 9 × 10⁻⁹) were associated with rs10830963 in 4,257 subjects.
In the Botnia prospective study, 2,328 nondiabetic carriers of rs10830963[G] showed lower insulin secretion at baseline (CIR: beta = –0.160 ± 0.026, P = 6 × 10⁻¹⁰; DI: beta = –0.171 ± 0.026, P = 9 × 10⁻¹¹), which remained lower throughout the 7-year follow-up period (CIR: beta = –0.188 ± 0.026, P = 1 × 10⁻¹²; DI: beta = –0.179 ± 0.029, P = 8 × 10⁻¹⁰) (Fig. 1d). Further, rs10830963[G] was also associated with impaired insulin secretion during an intravenous glucose tolerance test in 505 nondiabetic individuals from the Botnia study (FPIR: beta = –0.065 ± 0.023, P = 0.004; Fig. 1e). rs10830963[G] was also associated with reduced acute insulin response to glucose (AIR: P = 2.2 × 10⁻⁶; DI: P = 5.0 × 10⁻³) in 522 nondiabetic individuals from the FUSION study¹³ (Table 2).
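As a rough consistency check on the risk estimates reported for the two prospective cohorts, an unadjusted allelic odds ratio can be computed directly from the risk-allele frequencies in converters and nonconverters (30.2% versus 28.0% in MPP, 31.0% versus 29.3% in Botnia). This is only a sketch: the function name is hypothetical, and the published estimates may come from a model with covariates that this calculation ignores.

```python
def allelic_odds_ratio(freq_converters: float, freq_nonconverters: float) -> float:
    """Unadjusted odds ratio for carrying the risk allele on a chromosome.

    Inputs are risk-allele frequencies (0-1) in the two outcome groups.
    """
    odds_converters = freq_converters / (1.0 - freq_converters)
    odds_nonconverters = freq_nonconverters / (1.0 - freq_nonconverters)
    return odds_converters / odds_nonconverters


# Allele frequencies as quoted in the text.
mpp_or = allelic_odds_ratio(0.302, 0.280)
botnia_or = allelic_odds_ratio(0.310, 0.293)
print(f"MPP: {mpp_or:.2f}, Botnia: {botnia_or:.2f}")
```

The crude values land near 1.11 and 1.08, close to the reported 1.12 and 1.09, which is what one would expect if the reported figures were adjusted only modestly.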
[Figure 1, panels a–f. Axes recovered from the plots: (a) fasting glucose (mmol/l) at baseline and follow-up, P < 0.0001; (b) CIR (mU × l/mmol²) by genotype, P < 0.0001; (c) DI (mU³/l³) by genotype, P < 0.0001; (d) DI (mU³/l³) at baseline and follow-up, P < 0.0001; (e) FPIR (mU/l) by genotype, P = 0.004; (f) intact proinsulin/insulin by genotype, P = 0.005. Genotypes: CC, CG, GG.]
Table 1 Samples used in this study
Study                             N (with diabetes)  Geographic origin  Age (y)      BMI (kg/m²)
Malmoe Preventive Project (MPP)   16,061 (2,063)     Sweden             45.5 ± 6.9   24.3 ± 3.3
Botnia PPP                        3,300              Finland            48.5 ± 15.9  26.1 ± 4.2
Botnia prospective cohort         2,770 (138)        Finland            44.9 ± 14.2  25.6 ± 4.1
Helsinki Birth Cohort             1,600              Finland            61.6 ± 3.0   27.1 ± 4.3
FUSION                            522                Finland            39.1 ± 12.2  26.0 ± 6.4
METSIM                            4,369              Finland            59.3 ± 2.8   26.9 ± 3.8
Data are shown as mean ± s.d.
Figure 1 Insulin secretion according to different MTNR1B rs10830963 genotypes. (a) Change in fasting plasma glucose concentrations during 24-year follow-up in nondiabetic subjects (Malmoe study, N = 13,674). (b) Corrected early insulin response to glucose (CIR) during OGTT (Botnia PPP cohort; N = 3,300). (c) Disposition index (DI) represents early insulin response to glucose corrected for insulin sensitivity by the Matsuda index (CIR × ISI, Botnia PPP cohort; N = 3,300). (d) Change in insulin secretion (disposition index) over time in nondiabetic subjects (Botnia prospective cohort, N = 2,328). (e) Insulin secretion measured as first-phase insulin response during an IVGTT (Botnia cohort; N = 505). (f) Intact proinsulin-to-insulin ratio in the fasting state (Helsinki Birth Cohort, N = 1,600). Bars represent mean ± s.e.m. Blue lines represent nonrisk and red lines risk genotype carriers of rs10830963 in MTNR1B.
Finally, we examined whether the SNP would influence proinsulin processing as reflected in the ratio between proinsulin and insulin in 1,600 nondiabetic participants of the Helsinki Birth Cohort Study¹⁴. Also here, carriers of the MTNR1B risk genotype had impaired early insulin response to oral glucose (CIR: beta = –0.109 ± 0.027, P = 5 × 10⁻⁵; DI: beta = –0.122 ± 0.027, P = 8 × 10⁻⁶; Table 2). In addition, risk allele carriers had an elevated intact proinsulin-to-insulin ratio (P = 0.005; Table 2 and Fig. 1f). However, an increased proinsulin-to-insulin ratio does not a priori imply a specific defect in proinsulin processing, as proinsulin concentrations rise under most conditions of stressed β cells.
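The beta values quoted for CIR and DI come from an additive genetic model, in which each genotype is coded by its number of risk alleles (CC = 0, CG = 1, GG = 2) and the trait is regressed on that count. The sketch below shows the idea with ordinary least squares in plain Python. The trait values are made up for illustration, and any covariate adjustment or trait transformation used by the authors is not reproduced here because it is not described in this section.

```python
from statistics import mean

# Additive coding: number of rs10830963[G] risk alleles per genotype.
RISK_ALLELE_COUNT = {"CC": 0, "CG": 1, "GG": 2}


def additive_beta(genotypes, trait):
    """Slope and intercept of trait ~ number of G alleles (simple OLS)."""
    x = [RISK_ALLELE_COUNT[g] for g in genotypes]
    x_bar, y_bar = mean(x), mean(trait)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, trait))
    beta = sxy / sxx
    return beta, y_bar - beta * x_bar


# Hypothetical standardized insulin-response values, not study data.
genotypes = ["CC", "CC", "CG", "CG", "GG", "GG"]
trait = [0.3, 0.1, 0.0, -0.2, -0.1, -0.5]
beta, intercept = additive_beta(genotypes, trait)
print(f"beta per G allele = {beta:.3f}")
```

A negative beta, as in the paper's CIR and DI results, means the trait falls by that amount for each additional G allele.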
The melatonin 1B receptor (MTNR1B) is expressed in human islets and in β cells. Using quantitative RT-PCR (Taqman), we observed that both MTNR1A and MTNR1B were expressed in human islets as well as in clonal β cells. In contrast to previous findings¹¹,¹², both receptors were expressed at near equal level in human islets. Moreover, islet expression of MTNR1B was confirmed by immunocytochemistry (Fig. 2). Again, in contrast to a previous report, where single-cell PCR identified MTNR1A mRNA primarily in α cells¹², we observed expression of MTNR1B predominantly in β cells in both human and rodent islets (Fig. 2). MTNR1A was also observed in islets; its expression was less abundant and seemed to be restricted to a population of peripherally located β cells in human, mouse and rat islets.
Table 2 Effect of the MTNR1B rs10830963 on insulin secretion in the studied cohorts
                            Genotypes                                                   Additive model
Phenotype                   CC                CG                GG                RA    BETA    s.e.m.  P value
DGI WGAS (OGTT n = 1,020)
Age (y)                     59 ± 10           59 ± 10           58 ± 10                 –               0.74
BMI (kg/m²)                 26.5 ± 3.6        26.7 ± 4.0        27.3 ± 3.7              –               0.14
Fasting P-glucose (mmol/l)  5.28 ± 0.53       5.32 ± 0.52       5.38 ± 0.60       0.31  0.045   0.022   0.039
CIR (mU × l/mmol²)          180 ± 360         165 ± 1,912       144 ± 163               –0.166  0.048   7 × 10⁻⁴
DI (mU³/l³)                 24,036 ± 29,445   20,285 ± 27,763   16,555 ± 22,974         –0.173  0.046   2 × 10⁻⁴
Botnia PPP (OGTT n = 3,300)
Age (y)                     48.3 ± 16.0       48.5 ± 15.9       49.6 ± 15.9             –               0.38
BMI (kg/m²)                 26.12 ± 4.21      26.22 ± 4.22      26.19 ± 3.82            –               0.79
Fasting P-glucose (mmol/l)  5.06 ± 0.54       5.25 ± 0.55       5.28 ± 0.55       0.30  0.134   0.014   2 × 10⁻²²
CIR (mU × l/mmol²)          271 ± 415         205 ± 245         175 ± 134               –0.170  0.021   5 × 10⁻¹⁶
DI (mU³/l³)                 44,631 ± 87,537   30,499 ± 49,947   24,316 ± 21,582         –0.241  0.022   1 × 10⁻²⁶
Botnia prospective (OGTT n = 2,328), baseline
Age (y)                     45.8 ± 13.2       45.1 ± 13.8       45.6 ± 14.2             –               0.52
BMI (kg/m²)                 25.5 ± 4.1        25.7 ± 3.7        25.7 ± 3.8              –               0.48
Fasting P-glucose (mmol/l)  5.47 ± 0.57       5.55 ± 0.57       5.64 ± 0.54       0.29  0.081   0.019   1 × 10⁻⁵
CIR (mU × l/mmol²)          176 ± 183         150 ± 164         129 ± 137               –0.160  0.026   6 × 10⁻¹⁰
DI (mU³/l³)                 26,958 ± 34,304   22,340 ± 31,320   18,375 ± 17,416         –0.171  0.026   9 × 10⁻¹¹
Botnia prospective (OGTT n = 2,328), follow-up
Age (y)                     53.8 ± 13.8       52.7 ± 14.3       53.3 ± 14.9             –               0.25
BMI (kg/m²)                 26.5 ± 4.1        26.7 ± 4.2        26.7 ± 4.2              –               0.41
Fasting P-glucose (mmol/l)  5.25 ± 0.56       5.34 ± 0.56       5.41 ± 0.61             0.086   0.019   5 × 10⁻⁶
CIR (mU × l/mmol²)          234 ± 238         188 ± 192         145 ± 125               –0.188  0.026   1 × 10⁻¹²
DI (mU³/l³)                 27,508 ± 40,934   20,888 ± 27,012   16,502 ± 16,261         –0.179  0.029   8 × 10⁻¹⁰
Helsinki Birth Cohort (OGTT n = 1,600)
Age (y)                     61.6 ± 3.0        61.5 ± 3.0        61.6 ± 3.1              –       –       0.96
BMI (kg/m²)                 27.0 ± 4.2        27.2 ± 4.4        27.1 ± 4.2              –       –       0.53
Fasting P-glucose (mmol/l)  5.41 ± 0.55       5.55 ± 0.56       5.59 ± 0.53       0.34  0.096   0.019   3 × 10⁻⁷
CIR (mU × l/mmol²)          209 ± 196         175 ± 150         177 ± 188               –0.109  0.027   5 × 10⁻⁵
DI (mU³/l³)                 19,646 ± 21,504   15,552 ± 15,063   15,699 ± 17,881         –0.122  0.027   8 × 10⁻⁶
Intact proinsulin/insulin   0.51 ± 0.26       0.52 ± 0.26       0.55 ± 0.24             0.024   0.009   0.005
METSIM (n = 4,257)
Age (y)                     59.3 ± 5.8        59.4 ± 5.8        59.1 ± 5.7        0.36  –       –       –
BMI (kg/m²)                 26.9 ± 3.9        26.9 ± 3.7        26.5 ± 3.7              –0.058  0.020   4.3 × 10⁻³
Fasting P-glucose (mmol/l)  5.6 ± 0.5         5.7 ± 0.5         5.8 ± 0.5               0.165   0.022   9.4 × 10⁻¹⁴
CIR (mU × l/mmol²)          196 ± 212         168 ± 165         152 ± 143               –0.143  0.022   1.3 × 10⁻¹⁰
DI (mU³/l³)                 21,554 ± 28,426   17,878 ± 18,235   16,798 ± 16,461         –0.128  0.022   9.8 × 10⁻⁹
Botnia (IVGTT n = 505)
FPIR                        297 ± 195         259 ± 194         237 ± 139         0.27  –0.065  0.023   0.004
FUSION (FSIGT n = 522)
AIR (pM × 8 min)            2,632 ± 1,731     2,064 ± 1,468     1,554 ± 1,092     0.35  –0.316  0.067   2 × 10⁻⁶
Data are shown as means ± s.d. CIR, corrected early insulin response to glucose during OGTT; DI, disposition index; FPIR, first-phase insulin response during IVGTT; AIR, acute insulin response during frequently sampled intravenous glucose tolerance test (FSIGT); RA, risk allele.
Next, we analyzed whether islet expression of MTNR1B, which we now had established in β cells, correlated with presence of rs10830963[G] in the MTNR1B gene as well as with T2D. To this end, we used both quantitative RT-PCR and microarray. Using RT-PCR, we found that individuals carrying the G allele showed higher expression of MTNR1B as compared with carriers of the C allele (age-adjusted P = 0.01, Fig. 3a). Notably, this effect was almost exclusively seen in individuals older than 45 years (P = 0.001, Fig. 3a insert). The microarray experiments (Affymetrix HU 133) were done on islets isolated from four nondiabetic and four T2D islet donors¹⁵. There was a trend toward higher expression of MTNR1B in T2D than in nondiabetic islets (P = 0.20, Supplementary Fig. 1a online), and expression correlated inversely with glucose-stimulated insulin secretion (Supplementary Fig. 1b).
To determine the effects of melatonin on insulin secretion, we acutely incubated clonal β cells (832/13) at low and high glucose concentrations in the presence of 0.1 μM melatonin. Addition of melatonin exerted a clear inhibitory effect on insulin secretion provoked by glucose (Fig. 3b).
The present findings provide strong support for a role of melatonin and its receptor MTNR1B in the pathogenesis of T2D. A common variant in the MTNR1B receptor was associated with an increase in fasting glucose over time and predicted future T2D, most likely through impairment of pancreatic β-cell insulin secretion⁷. Notably, this effect became more pronounced with increasing age, most likely as a consequence of the increased demands imposed by increased age-related insulin resistance. This effect can be understood in light of what is known about the function of melatonin in islets based on previous studies as well as our present results. The MTNR1B is coupled to an inhibitory G protein¹⁰. Activation of MTNR1B by melatonin would therefore block activation of adenylate cyclase, which is the predominant mode of action for incretin hormones, such as GLP-1 and glucose-dependent insulinotropic polypeptide (GIP), both of which raise intracellular cAMP. There is also evidence supporting that glucose stimulation of the β cell by itself leads to a rise in intracellular cAMP. Indeed, it has previously been observed that addition of melatonin blocks cAMP formation in β cells¹⁶. Here, we confirmed previous observations, although discrepant results have been reported¹², that melatonin acutely blocks glucose-induced insulin secretion⁷. Thus, in a situation where expression of MTNR1B is increased, it could be anticipated that cellular cAMP levels will be lower. Hence, the potentiating effect that this nucleotide exerts on insulin secretion, via mechanisms both dependent on and independent of protein kinase A, would be diminished, leading to impaired insulin secretion. This potential pathogenic situation would be further aggravated if melatonin levels are elevated. In fact, this seems to be the case: studies have reported that the circadian rhythm in melatonin secretion is perturbed in T2D¹⁷. It has been suggested that secretion of the hormone is elevated during the day, when it normally should be low, which could lead to reduced insulin secretion.
There are therapeutic implications of our findings. First, if melatonin has a negative role in the development of T2D, antagonists of the receptors targeted to β cells could be of utility. Second, individuals with the risk profile conferred by the MTNR1B rs10830963 SNP may be less responsive to treatment with GLP-1 analogs as well as inhibitors of GLP-1 degradation (DPP-IV inhibitors). Identifying these individuals may allow tailoring of a more precise therapy in T2D.
Our findings lend support to earlier reports of a role of the melatonin system for islet function and also provide new insights into the mechanisms by which the system may play a role in the pathogenesis of T2D. Interfering with its action may be a new therapeutic avenue in T2D.
[Figure 2 panels: MTNR1B, insulin and merged images for mouse, rat and human islets.]
Figure 2 Colocalization of MTNR1B and insulin protein in mouse, rat and human pancreatic islets. Scale bar, 50 μm.
[Figure 3 panels: (a) MTNR1B expression by genotype (CC, CG, GG), with an insert for age > 45 yrs (P = 0.001); (b) insulin (ng/mg protein/h) at 2.8 and 16.7 mM glucose and at 16.7 mM glucose + 0.1 μM melatonin.]
Figure 3 Expression of MTNR1B in human pancreatic islets. (a) The MTNR1B mRNA levels were higher in risk GG genotype carriers (total n = 51, CC = 21, CG = 25, GG = 5; nonadjusted P = 0.25, age-adjusted P = 0.01). The insert graph shows expression of the MTNR1B mRNA levels in the individuals above mean age of 45 years (total n = 25, CC = 10, CG = 13, GG = 2; P = 0.001): the MTNR1B mRNA levels were higher in risk GG genotype carriers. (b) Insulin secretion in INS-1 832/13 clonal β cells in response to stimulation with 2.8 mM (gray bar) and 16.7 mM glucose (white bar) in the presence and absence of 0.1 μM melatonin (black bar). Individual experiments were done in triplicate (n = 7, *P < 0.037). Bars represent mean ± s.e.m.
METHODS
Study populations. In the Malmoe Preventive Project (MPP), 33,346 Swedish subjects (22,444 men and 10,902 women; mean age 49 years, 24.5% with
impaired fasting glucose (IFG) and/or impaired glucose tolerance (IGT)) from the city of Malmoe in southern Sweden participated in a health screening during 1974–1992 (ref. 18). All individuals underwent a physical examination and blood was drawn for measurements of fasting blood glucose and lipid concentrations. In addition, 18,900 consecutively enrolled persons also had an oral glucose tolerance test (OGTT). Information on lifestyle factors and medical history was obtained by questionnaire. Of the individuals participating in the initial screening, 4,931 are deceased and 551 are lost to follow-up. Of the eligible individuals, 25,000 were invited to a rescreening visit during 2002–2006, which included a physical examination and fasting blood samples for measurements of plasma glucose and lipids. Of the invited subjects, 17,284 persons participated in the rescreening. Of these, 1,223 were excluded because of missing information or DNA (or T2D at baseline)¹⁹. Thus, 16,061 nondiabetic subjects, 2,063 of whom developed T2D, were included in the current analyses. Diagnosis of diabetes was confirmed from subject records or on the basis of a fasting plasma glucose concentration greater than 7.0 mmol/l.
The Botnia study started in 1990 on the west coast of Finland, aiming at identification of genes increasing susceptibility to T2D in members from families with T2D. The prospective part included 2,770 nondiabetic family members and/or their spouses (1,263 men and 1,507 women, mean age 45 years), 138 of whom developed T2D during a 7.7-year (median) follow-up period¹⁹⁻²¹. All subjects were given information about exercise and healthy diet and underwent a new OGTT at 2- to 3-year intervals.
The Prevalence, Prediction and Prevention of T2D (PPP Botnia) study is a population-based study in the Botnia region which included approximately 10% of the population aged 18–74 years (mean age 51 ± 17 years). Diagnosis of diabetes was confirmed from subject records or on the basis of a fasting plasma glucose concentration greater than 7.0 mmol/l and/or 2 h glucose greater than 11.1 mmol/l. Of the nondiabetic individuals, 2,328 also had serum insulin concentrations measured at baseline and during follow-up.
The Finland–United States Investigation of Non-insulin-dependent Diabetes Mellitus Genetics (FUSION) study has been described in detail²,¹³. For this study, 578 nondiabetic spouses or offspring were included in the study of insulin response to intravenous glucose using tolbutamide-modified frequently sampled intravenous glucose tolerance tests (FSIGTs)²²,²³ and analyzed by the Minimal Model method²⁴ to derive quantitative measures of insulin sensitivity (SI) and glucose effectiveness (SG). Insulin secretion was assessed as the acute insulin response to glucose (AIR) as described by Ward et al., and beta-cell function was assessed using the disposition index (DI = SI × AIR) (ref. 25).
The Helsinki Birth Cohort Study (HBCS) has been previously described. In the present study, 1,600 nondiabetic subjects (698 men and 902 women, mean age 62 ± 3 years) were included¹⁴. In 2001–2004 all subjects participated in a clinical examination, including a standard 75 g OGTT. Intact proinsulin concentration was measured at 0 min and the fasting proinsulin/insulin ratio (PI/I) was calculated.
The METabolic Syndrome In Men (METSIM) study includes men aged 45–70 years, randomly selected from the population of the town of Kuopio, Eastern Finland (population 95,000). The present analysis is based on the first 4,386 nondiabetic subjects examined for METSIM with available OGTT data. Samples for the OGTT were obtained at fasting and at 30 and 120 min postload. The CIR and ISI were calculated from OGTT glucose and insulin data as described below.
All participants from the different studies gave informed consent and the
local ethics committees approved the protocols.
Measurements. Weight, height and waist and hip circumferences were measured as previously reported¹⁸,¹⁹. In the MPP cohort at baseline, blood samples were drawn at 0, 40 and 120 min of the 75 g OGTT for measurements of blood glucose and serum insulin concentrations, and fasting samples were drawn at the follow-up visit for measurement of plasma glucose and lipid concentrations using standard techniques. In the Botnia study, blood samples were drawn at –10, 0, 30, 60 and 120 min of the OGTT. Insulin sensitivity index (ISI) from the OGTT was calculated as 10,000/√((fasting plasma glucose × fasting plasma insulin)(mean OGTT glucose × mean OGTT insulin)) (ref. 26). The basal insulin resistance index (HOMA) was calculated from fasting insulin and glucose concentrations (see URLs section below). β-cell function was assessed as corrected incremental insulin response during OGTT (CIR = (100 × insulin at 30 min or 40 min in MPP)/((glucose at 30 min or 40 min in MPP) × (glucose at 30 min or 40 min in MPP – 3.89))) (ref. 27) or as disposition index, that is, insulin secretion adjusted for insulin sensitivity (CIR × ISI).
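The three OGTT-derived indices defined in this paragraph translate directly into code. The following sketch implements them as written in the text. The function names and the example values are hypothetical, and the inputs are assumed to be in whatever units each study used, since the text does not spell out unit conversions. The guard against glucose values at or below 3.89 is an addition made here to avoid division by zero or negative results; it is not part of the source definition.

```python
from math import sqrt


def isi(fasting_glucose, fasting_insulin, mean_ogtt_glucose, mean_ogtt_insulin):
    """Insulin sensitivity index: 10,000 / sqrt((FPG x FPI)(mean G x mean I))."""
    return 10_000 / sqrt(
        (fasting_glucose * fasting_insulin) * (mean_ogtt_glucose * mean_ogtt_insulin)
    )


def cir(insulin_30, glucose_30):
    """Corrected insulin response: (100 x I30) / (G30 x (G30 - 3.89)).

    For MPP the 40 min samples take the place of the 30 min samples.
    """
    if glucose_30 <= 3.89:
        # Guard added for this sketch only.
        raise ValueError("glucose must exceed 3.89 for CIR to be meaningful")
    return (100 * insulin_30) / (glucose_30 * (glucose_30 - 3.89))


def disposition_index(cir_value, isi_value):
    """Insulin secretion adjusted for insulin sensitivity (CIR x ISI)."""
    return cir_value * isi_value


# Hypothetical single-subject values, for illustration only.
example_isi = isi(5.0, 8.0, 7.0, 50.0)
example_cir = cir(60.0, 8.0)
example_di = disposition_index(example_cir, example_isi)
print(example_isi, example_cir, example_di)
```

Because DI is a product, a subject with low insulin sensitivity needs a proportionally higher CIR to reach the same DI, which is why the index is used to compare secretion across people with different degrees of insulin resistance.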
Plasma glucose was measured by hexokinase (MPP, FUSION) or glucose oxidase (Botnia, FUSION, METSIM) methods. Plasma insulin concentrations were measured by an ELISA assay (Dako, Cambridgeshire; Botnia study), by a local radioimmunoassay (MPP), by radioimmunoassay using dextran-charcoal separation (FUSION) or by a commercial double-antibody solid-phase radioimmunoassay (METSIM).
Genotyping. In the DGI and FUSION GWAS, genotyping was done using
Affymetrix 500K chip array1 and Illumina HumanHap300 BeadChip Version
1.0 (ref. 2). In the FUSION and METSIM studies, SNP rs10830963 was
genotyped by Sequenom iPlex gold SBE (Sequenom); in all other replication
studies rs10830963 was genotyped by an allelic discrimination assay-by-design
method on ABI 7900 (Applied Biosystems). Genotypes were in Hardy-
Weinberg equilibrium. In MPP and Botnia, we obtained an average genotyping
success rate of >95% and the concordance rate was 98.7%, using two different
methods (allelic discrimination on ABI7900 and Affymetrix). Replication
genotyping for FUSION and METSIM studies was done using Sequenom iPlex
gold SBE (Sequenom).
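The text does not state which test was used to check Hardy-Weinberg equilibrium. As an illustration only, the sketch below applies a common choice, a chi-square goodness-of-fit test with one degree of freedom, to the three genotype counts of a biallelic SNP. The function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic SNP.

    Returns (chi2, p_value). Assumes an autosomal marker and unrelated samples.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic marker: the test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For small samples or rare alleles an exact test is usually preferred over this approximation.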
Immunocytochemistry. For histochemical analysis pancreatic specimens were
dissected, fixed overnight in Stefanini’s solution (2% paraformaldehyde and
0.2% picric acid in 0.1 M phosphate buffered saline, pH 7.2), rinsed thoroughly
in Tyrode solution containing 10% sucrose and frozen on dry ice. Sections
(10 µm thickness) were cut and thaw-mounted on slides. Antibodies were
diluted in PBS (pH 7.2) containing 0.25% BSA and 0.25% Triton X-100.
Sections were incubated with primary antibodies (goat antibody to melatonin
receptor 1B (code sc-13177; dilution 1:400; Santa Cruz Biotechnology), goat
antibody to melatonin receptor 1A (code sc-13186; dilution 1:400; Santa Cruz
Biotechnology) and guinea pig antibody to proinsulin (code 9003; dilution
1:2,560; EuroDiagnostica)) overnight at 4 °C in moisturizing chambers. The
sections were rinsed in PBS with Triton X-100 for 2 × 10 min. Thereafter
secondary antibodies with specificity for goat or guinea pig IgG, and coupled
to either fluorescein isothiocyanate (FITC) or Texas-Red (Jackson), were
applied on the sections. Incubation was for 1 h at room temperature in
moisturizing chambers. The sections were again rinsed in PBS with Triton
X-100 for 2 × 10 min and then mounted in PBS:glycerol, 1:1. The specificity
of immunostaining was tested using primary antisera pre-absorbed with
homologous antigen (100 µg of peptide per ml antiserum at working
dilution). Immunofluorescence was examined in an epifluorescence micro-
scope (Olympus, BX60). By changing filters the location of the different
secondary antibodies in double staining was determined. Images were captured
with a digital camera (Nikon DS-2Mv)28.
Gene expression using real-time PCR. Total RNA was isolated with the
AllPrep DNA/RNA Mini Kit (Qiagen) at the Human Tissue Facility of Lund
University Diabetes Center (LUDC); or by RNeasy protect mini kit (Qiagen) as
previously described15 at the Joslin Islet Cell Resource Center (Joslin); or by
Trizol (Invitrogen) and further purification using RNeasy mini kit (Qiagen) at
the National Human Genome Research Institute (NHGR). RNA quantity was
determined by evaluating the absorbance at 260 and 280 nm in a Perkin-Elmer
spectrophotometer (Waltham), and quality was assessed by running samples on
Agilent 2100 Bioanalyzer (Agilent Technologies) at Joslin. cDNA was
synthesized from 0.4 µg total RNA using RevertAid First Strand cDNA Synthesis Kit
(Fermentas Life Sciences) (at LUDC); 0.5 µg total RNA using the High Capacity
RNA-to-cDNA Kit (Applied Biosystems) (at NHGR); and 1 µg total RNA using
iScript cDNA synthesis kit (Biorad) (at Joslin). TaqMan gene expression
assays were purchased from Applied Biosystems for the various target genes:
Hs00173794_m1 directed against human MTNR1B and HPRT (hypoxanthine-
guanine phosphoribosyl transferase) (at LUDC and NHGR) and PPIA (cyclo-
philin) (at Joslin), which served as endogenous control gene. Q-PCR reactions
were done on the ABI 7900HT (Applied Biosystems) at LUDC and NHGR by
mixing 2× TaqMan Universal Master Mix, 20× TaqMan Gene Expression
Assays, nuclease-free water and cDNA for a final reaction volume of 10 µl (at LUDC), as described earlier29 (at Joslin). The relative quantity of MTNR1B
mRNA was calculated using the comparative threshold method (Ct-method)
(at LUDC and NHGR). All experiments were performed in triplicate.
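The comparative Ct calculation is commonly written as 2^(−ΔΔCt). The sketch below shows that arithmetic for triplicate wells; it assumes roughly 100% amplification efficiency for both assays, and the function and argument names are hypothetical rather than taken from the study.

```python
from statistics import mean


def relative_quantity(target_ct, reference_ct,
                      calibrator_target_ct, calibrator_reference_ct):
    """Relative expression by the comparative Ct method, 2 ** -(ddCt).

    Each argument is a list of replicate Ct values (e.g. triplicates).
    target: gene of interest (e.g. MTNR1B); reference: endogenous control
    (e.g. HPRT or PPIA); calibrator: the sample used as baseline.
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)
```

A sample whose target Ct is one cycle lower than the calibrator's, at equal reference Ct, therefore has a relative quantity of 2.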
For microarray experiments, 100 ng total RNA was subjected to two rounds
of amplification (GeneChip Two-Cycle Kit, Affymetrix), and biotinylated RNA
was generated using GeneChip IVT Labeling Kit (Affymetrix). RNA products
were fragmented and hybridized to GeneChip Human HG U 133A Array
(Affymetrix). The array data were normalized and analyzed using DNA-Chip
Analyzer (dChip) software (see URLs section below, last accessed in January
2008) that assesses the standard errors for the expression indexes and calculates
confidence intervals for fold changes (Joslin, NHGR).
Effect of melatonin on insulin secretion. To determine the effects of
melatonin on insulin secretion, we incubated the clonal β cells from the line
832/13 with 0.1 mM melatonin for 1 h. Then, the amount of released insulin
into the buffer was determined by radioimmunoassay.
Statistical analyses. Differences in expression levels were tested by analysis of
variance or nonparametric Mann-Whitney tests. The odds ratios for risk of
developing T2D were calculated using logistic regression analyses adjusted for
age at participation and time to last follow-up, body mass index and sex.
Multivariate linear regression analyses were used to test genotype–phenotype
correlations adjusted for age, sex, body mass index (apart from body mass index
phenotype) and for within-family dependence. Non-normally distributed vari-
ables were log-transformed before analysis. Analysis of FUSION FSIGT and
METSIM OGTT data was carried out using a regression framework in which
regression coefficients were estimated in the context of a variance component
model to account for relatedness among individuals30. Trait values for both
studies were adjusted for age and age squared. For FUSION data sex was
included as an additional covariate. Analyses were carried out in nondiabetic
individuals excluding those known to be taking medications that directly affect
glucose or insulin concentrations. Covariate-adjusted trait values were trans-
formed to approximate univariate normality by applying an inverse normal
scores transformation; the scores were ranked, ranks were transformed into
quantiles and quantiles were converted to normal deviates.
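The rank, quantile and normal-deviate steps described above can be sketched as follows. The quantile offset, (rank − 0.5)/n, and the use of average ranks for ties are assumptions of this sketch; the text does not specify either.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to normal deviates via ranks and quantiles.

    Ties receive the average of their ranks. Quantiles are computed as
    (rank - 0.5) / n, which is one common convention (an assumption here).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((rank - 0.5) / n) for rank in ranks]
```

The output preserves the input order, so the transformed values can replace the covariate-adjusted trait column directly.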
All statistical analyses were performed using SPSS version 14.0, PLINK, Stata
(StataCorp) or MERLIN30.
URLs. Diabetes Trial Unit, http://www.dtu.ox.ac.uk/, dChip software, http://
biosun1.harvard.edu/complab/dchip/; PLINK, http://pngu.mgh.harvard.edu/
~purcell/plink/.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
The DGI study was supported by a grant from Novartis. Studies in Malmoe were supported by grants from the Swedish Research Council, including a Linne grant (No. 31475113580), the Diabetes Programme at Lund University, the Pahlsson Foundation, the Heart and Lung Foundation, the Wallenberg Foundation, the Swedish Diabetes Research Society, the Crafoord Foundation, Swedish Medical Society, Swedish Royal Physiographic Society, a Nordic Centre of Excellence Grant in Disease Genetics, the Finnish Diabetes Research Society, the Sigrid Juselius Foundation, Folkhalsan Research Foundation, Novo Nordisk Foundation, the European Network of Genomic and Genetic Epidemiology (ENGAGE), the Wallenberg Foundation, the European Foundation for the Study of Diabetes (EFSD) and the Human Tissue facility at the Lund University Diabetes Center. Studies in human islets were supported in part by the Italian Ministry of University and Research (PRIN 2007-2008) and the European Community (LSHM-CT-2006-518153).
Pancreatic islets at US National Institutes of Health were obtained through the ICR Basic Science Islet Distribution Program (City of Hope Hospital, Joslin Diabetes Center, Northwestern University, Southern California Islet Consortium, University of Alabama Birmingham, University of Illinois, University of Miami, University of Minnesota, University of Pennsylvania, University of Wisconsin and Washington University), the Juvenile Diabetes Research Foundation Islet Resources (Washington University) and the National Disease Resource Interchange (NDRI).
The FUSION study would like to thank the many research volunteers who generously participated in the various studies represented in FUSION. We also thank A.J. Swift, M. Morken, P.S. Chines and N. Narisu for genotyping and informatics support. Support for FUSION was provided by the following: NIH grant DK062370 (M. Boehnke), American Diabetes Association research grant 1-05-RA-140 (R.M.W.), DK072193 (K.L. Mohlke) and National Human Genome Research Institute intramural project number 1 Z01 HG000024 (F.S. Collins). The METSIM study was supported by Academy of Finland grant 124243 (M.L.).
AUTHOR CONTRIBUTIONS
V.L.: DGI GWAS, data analysis and draft of the report. C.L.F.N., M.R.E.: in vitro expression experiments and analysis, and draft of the report. N.W.: immunocytochemistry. A.J.: genotyping and data analysis. P.S.: in vitro expression experiments. M. Bugliani: microarray and human islets experiments. R.S.: DGI GWAS analysis. M.F.: in vitro physiology. N.P.: genotyping. B.I., T.T.: phenotyping in the Botnia study. P.N.: phenotyping in the Malmoe study. J.K.: data analysis in METSIM study. J.T.: phenotyping in the FUSION study. M. Boehnke: PI of the FUSION study. D.A.: PI of the DGI study. F.S.: immunocytochemistry. J.G.E.: phenotyping in the Helsinki Birth Cohort Study. A.U.J.: FUSION GWAS and data analysis. M.L.: PI of the METSIM study. P.M.: microarray and human islets experiments. R.M.W.: FUSION GWAS analysis. H.M.: design and supervision of in vitro study experiments and draft of the report. L.G. designed and supervised all parts of the study and drafted the report. All researchers took part in the revision of the report and approved the final version.
Published online at http://www.nature.com/naturegenetics/
Reprints and permissions information is available online at http://npg.nature.com/
reprintsandpermissions/
1. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University & Novartis Institutes of BioMedical Research, et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316, 1331–1336 (2007).
2. Scott, L.J. et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316, 1341–1345 (2007).
3. Sladek, R. et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445, 881–885 (2007).
4. Zeggini, E. et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat. Genet. 40, 638–645 (2008).
5. Zeggini, E. et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316, 1336–1341 (2007).
6. Prokopenko, I. et al. Variants in MTNR1B influence fasting glucose levels and risk of type 2 diabetes. Nat. Genet. advance online publication, doi:10.1038/ng.290 (7 December 2008).
7. Peschke, E. Melatonin, endocrine pancreas and diabetes. J. Pineal Res. 44, 26–40 (2008).
8. Kvetnoy, I.M. Extrapineal melatonin: location and role within diffuse neuroendocrine system. Histochem. J. 31, 1–12 (1999).
9. Boden, G., Ruiz, J., Urbain, J.L. & Chen, X. Evidence for a circadian rhythm of insulin secretion. Am. J. Physiol. 271, E246–E252 (1996).
10. Pandi-Perumal, S.R. et al. Physiological effects of melatonin: role of melatonin receptors and signal transduction pathways. Prog. Neurobiol. 85, 335–353 (2008).
11. Muhlbauer, E. & Peschke, E. Evidence for the expression of both the MT1- and in addition, the MT2-melatonin receptor, in the rat pancreas, islet and beta-cell. J. Pineal Res. 42, 105–106 (2007).
12. Ramracheya, R.D. et al. Function and expression of melatonin receptors on human pancreatic islets. J. Pineal Res. 44, 273–279 (2008).
13. Valle, T. et al. Mapping genes for NIDDM. Design of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Diabetes Care 21, 949–958 (1998).
14. Eriksson, J.G., Osmond, C., Kajantie, E., Forsen, T.J. & Barker, D.J. Patterns of growth among children who later develop type 2 diabetes or its risk factors. Diabetologia 49, 2853–2858 (2006).
15. Marselli, L. et al. Gene expression of purified beta-cell tissue obtained from human pancreas with laser capture microdissection. J. Clin. Endocrinol. Metab. 93, 1046–1053 (2008).
16. Peschke, E., Bach, A.G. & Muhlbauer, E. Parallel signaling pathways of melatonin in the pancreatic beta-cell. J. Pineal Res. 40, 184–191 (2006).
17. Peschke, E. et al. Melatonin and type 2 diabetes - a possible link? J. Pineal Res. 42, 350–358 (2007).
18. Berglund, G. et al. Long-term outcome of the Malmo preventive project: mortality and cardiovascular morbidity. J. Intern. Med. 247, 19–29 (2000).
19. Lyssenko, V. et al. Clinical risk factors, DNA variants, and the development of type 2 diabetes. N. Engl. J. Med. 359, 2220–2232 (2008).
20. Lyssenko, V. et al. Genetic prediction of future type 2 diabetes. PLoS Med. 2, e345 (2005).
21. Lyssenko, V. et al. Predictors of and longitudinal changes in insulin sensitivity and secretion preceding onset of type 2 diabetes. Diabetes 54, 166–174 (2005).
22. Steil, G.M., Volund, A., Kahn, S.E. & Bergman, R.N. Reduced sample number for calculation of insulin sensitivity and glucose effectiveness from the minimal model. Suitability for use in population studies. Diabetes 42, 250–256 (1993).
23. Yang, Y.J., Youn, J.H. & Bergman, R.N. Modified protocols improve insulin sensitivity estimation using the minimal model. Am. J. Physiol. 253, E595–E602 (1987).
24. Bergman, R.N., Ider, Y.Z., Bowden, C.R. & Cobelli, C. Quantitative estimation of insulin sensitivity. Am. J. Physiol. 236, E667–E677 (1979).
25. Ward, W.K., Bolgiano, D.C., McKnight, B., Halter, J.B. & Porte, D. Jr. Diminished B cell secretory capacity in patients with noninsulin-dependent diabetes mellitus. J. Clin. Invest. 74, 1318–1328 (1984).
26. Matsuda, M. & DeFronzo, R.A. Insulin sensitivity indices obtained from oral glucose tolerance testing: comparison with the euglycemic insulin clamp. Diabetes Care 22, 1462–1470 (1999).
27. Hanson, R.L. et al. Evaluation of simple indices of insulin sensitivity and insulin secretion for use in epidemiologic studies. Am. J. Epidemiol. 151, 190–198 (2000).
28. Wierup, N., Bjorkqvist, M., Kuhar, M.J., Mulder, H. & Sundler, F. CART regulates islet hormone secretion and is expressed in the beta-cells of type 2 diabetic rats. Diabetes 55, 305–311 (2006).
29. Del Guerra, S. et al. Functional and molecular defects of pancreatic islets in human type 2 diabetes. Diabetes 54, 727–735 (2005).
30. Chen, W.M. & Abecasis, G.R. Family-based association tests for genomewide association scans. Am. J. Hum. Genet. 81, 913–926 (2007).
Supplementary Information
A common variant in the melatonin receptor gene (MTNR1B) is associated with increased risk of future type 2 diabetes and impaired early insulin secretion
Valeriya Lyssenko, Cecilia L.F. Nagorny, Michael R. Erdos, Nils Wierup, Anna Jonsson, Peter Spégel, Marco Bugliani, Richa Saxena, Malin Fex, Nicolo Pulizzi, Bo Isomaa, Tiinamaija Tuomi, Peter Nilsson, Johanna Kuusisto, Jaakko Tuomilehto, Michael Boehnke, David Altshuler, Frank Sundler, Johan G. Eriksson, Anne U. Jackson, Markku Laakso, Piero Marchetti, Richard M. Watanabe, Hindrik Mulder and Leif Groop
Fig. S1 Expression of MTNR1B in human pancreatic islets. (A) The MTNR1B mRNA levels in human pancreatic islets were 50% higher in T2D (n=4, black bar) compared to controls in the microarray studies16 (n=4, white bar). Bars represent mean ±SE. (B) Correlation between the MTNR1B mRNA levels and insulin release at 16.7 mM glucose15.
Chapter 6
Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci
Cell Metabolism 2010;12(5):443-55
Global Epigenomic Analysis of Primary Human Pancreatic Islets Provides Insights into Type 2 Diabetes Susceptibility Loci
Michael L. Stitzel,1,6 Praveen Sethupathy,1,6 Daniel S. Pearson,1 Peter S. Chines,1 Lingyun Song,3 Michael R. Erdos,1 Ryan Welch,5 Stephen C.J. Parker,1 Alan P. Boyle,3 Laura J. Scott,5 NISC Comparative Sequencing Program,1,2 Elliott H. Margulies,1 Michael Boehnke,5 Terrence S. Furey,3 Gregory E. Crawford,3,4 and Francis S. Collins1,*
1Genome Technology Branch
2NIH Intramural Sequencing Center (NISC)
National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
3Institute for Genome Sciences & Policy
4Department of Pediatrics, Division of Medical Genetics
Duke University, Durham, NC 27708, USA
5Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
6These authors contributed equally to this work
*Correspondence: [email protected]
DOI 10.1016/j.cmet.2010.09.012
SUMMARY
Identifying cis-regulatory elements is important to understanding how human pancreatic islets modulate gene expression in physiologic or pathophysiologic (e.g., diabetic) conditions. We conducted genome-wide analysis of DNase I hypersensitive sites, histone H3 lysine methylation modifications (K4me1, K4me3, K79me2), and CCCTC factor (CTCF) binding in human islets. This identified ~18,000 putative promoters (several hundred unannotated and islet-active). Surprisingly, active promoter modifications were absent at genes encoding islet-specific hormones, suggesting a distinct regulatory mechanism. Of 34,039 distal (nonpromoter) regulatory elements, 47% are islet unique and 22% are CTCF bound. In the 18 type 2 diabetes (T2D)-associated loci, we identified 118 putative regulatory elements and confirmed enhancer activity for 12 of 33 tested. Among six regulatory elements harboring T2D-associated variants, two exhibit significant allele-specific differences in activity. These findings present a global snapshot of the human islet epigenome and should provide functional context for noncoding variants emerging from genetic studies of T2D and other islet disorders.
INTRODUCTION
Type 2 diabetes (T2D) is a complex metabolic disorder that
accounts for 85%–95% of all cases of diabetes and afflicts
hundreds of millions of people worldwide (http://www.
diabetesatlas.org/content/diabetes). It is a leading cause of
substantial morbidity and is characterized by defects in insulin
sensitivity and secretion resulting from the progressive dysfunction
and loss of β cells in the pancreatic islets of Langerhans
(Butler et al., 2007; Muoio and Newgard, 2008). Both genetic
predisposition and environmental factors contribute to these islet
defects. Islets constitute 1%–2% of human pancreatic mass
(Joslin and Kahn, 2005) and are composed of five endocrine
cell types that secrete different hormones: α cells (glucagon),
β cells (insulin), δ cells (somatostatin), PP cells (pancreatic
polypeptide Y), and ε cells (ghrelin). These cells sense changes in
blood glucose concentration and respond by modulating the
activity of multiple pathways, including insulin and glucagon
secretion, to maintain glucose homeostasis (Joslin and Kahn,
2005). Several key transcription factors (TFs) that regulate these
responses are known (Oliver-Krasinski and Stoffers, 2008).
However, efforts to identify cis-regulatory elements upon which
these and other factors act have been restricted primarily to
promoter regions at specific loci (e.g., INS, PDX1) (Brink, 2003;
Ohneda et al., 2000).
Results from genome-wide association studies (GWAS) of
type 1 diabetes (T1D) (Barrett et al., 2009), T2D (reviewed in
Prokopenko et al., 2008), and related metabolic traits (Dupuis
et al., 2010; Ingelsson et al., 2010; Prokopenko et al., 2009)
suggest that genetic variation in cis-regulatory elements may
play an important role in β cell (dys)function and diabetes
susceptibility (De Silva and Frayling, 2010). Of the 18 most
strongly associated single-nucleotide polymorphisms (SNPs) in
each of the T2D-associated loci, only 3 are missense variants;
the remaining are noncoding (Prokopenko et al., 2008). Further-
more, there is evidence for allele-specific effects of two T2D-
associated SNPs on the islet expression level of nearby genes
(TCF7L2 [Lyssenko et al., 2007] and MTNR1B [Lyssenko et al.,
2009]). However, the dearth of annotation of functional regula-
tory elements has limited the capacity to investigate the role of
regulatory variation in complex diseases such as T2D.
Recent characterization of histone modifications and DNase
hypersensitivity in cultured cells has identified chromatin signa-
tures predictive of regulatory elements and actively transcribed
regions (Boyle et al., 2008; Guenther et al., 2007; Heintzman
et al., 2007). The data generated so far suggest that regulatory
[Figure 1, panels A–F. Axis labels: fraction of DHS peaks; average peak length (nt); average peak intensity (−log[p value]); fraction of peaks unique to islet; fraction overlap with islet DHS peaks; density versus log10 [distance to nearest d-DHS peak (nt)].]
Figure 1. Analysis of DNase I Hypersensitive Sites in the Islet Genome
(A) Distribution of DNase I-hypersensitive (DHS) peaks across five genomic annotation sets. ‘‘Promoter’’ denotes proximal regions 5 kb upstream of RefSeq tran-
scription start sites (TSSs) that do not overlap the TSS. ‘‘Exonic’’ represents regions that overlap at least one base with an exon.
element location and usage vary substantially among cell types
(Heintzman et al., 2009; Xi et al., 2007). Also, extensive chromatin
profiling has been conducted in very few human primary
tissues to date (Bhandare et al., 2010). In this study, we describe
a comprehensive genome-wide epigenomic map of unstimu-
lated human pancreatic islets. Using DNase- and ChIP-seq
approaches, we identified DNase I-hypersensitive sites that
mark regions of open chromatin, loci enriched for active histone
H3 lysine methylation modifications (H3K4me1, H3K4me3,
and H3K79me2), and binding sites for the insulator CCCTC-
binding factor (CTCF). These profiles provide a detailed
chromatin snapshot of regulatory elements and actively tran-
scribed units in the islet. Moreover, they identify regulatory
elements harboring T2D-associated variants in 6/18 loci. These
data provide a valuable resource for understanding and investi-
gating cis-regulation in the human islet and for discovering regu-
latory elements that may play an important role in diabetes
susceptibility.
RESULTS
Genome-wide Characterization of Open Chromatin in the Human Pancreatic Islet
Active regulatory elements reside in open chromatin regions
hypersensitive to DNase I digestion (ENCODE Project Consor-
tium, 2007; Boyle et al., 2008; Crawford et al., 2004; Hesselberth
et al., 2009; Sabo et al., 2004). To identify all DNase-hypersensi-
tive sites (DHS) in the human pancreatic islet, we performed
DNase-seq (Boyle et al., 2008) and identified regions of the
genome with significant enrichment of sequence reads using
the MACS algorithm (Zhang et al., 2008) (Experimental Proce-
dures). This approach identified 101,326 human islet DHS peaks
(Table S1) covering ~27 million bases (~1% of the human
genome). Consistent with observations in CD4+ T cells (Boyle
et al., 2008), a substantive fraction of islet DHS peaks (23%,
n = 23,408) span annotated RefSeq transcription start sites
(TSS) or are within regions 5 kb upstream (Promoter), but the
majority reside within currently unannotated genomic regions
that may harbor functional distal regulatory elements
(Figure 1A). Peaks at TSSs are significantly longer and more
intense than those at all other loci (Figure 1B). This observation
supports the view that regions around TSSs are generally more
susceptible to DNase I digestion than putative non-TSS regula-
tory elements (Boyle et al., 2008).
Approximately 48% (n = 48,777) of all DHS peaks overlap
phastCons vertebrate conserved elements (Siepel et al., 2005)
(Figure 1C). Notably, ~87% (10,348/11,829) of peaks at TSSs
overlap phastCons elements, compared to ~43% (38,429/
89,497) at non-TSS loci (Figure 1C). This difference remains
even after accounting for the longer peaks at TSSs (data not
shown), supporting the model that TSS-proximal regions evolve
under stronger sequence constraint than distal regulatory
elements (Boyle et al., 2008). A recent study developed an algorithm
(Chai) for topography-informed conservation analysis,
which identified ~2-fold more bases in the human genome under
evolutionary constraint compared to sequence-based methods
(Parker et al., 2009). Accordingly, ~1.5 times as many (~76%)
islet DHS peaks overlap these structurally constrained regions
(Figure 1C).
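The overlap fractions quoted above reduce to counting peaks that share at least one base with a conserved element. The sketch below shows one way to do that for a single chromosome using half-open (start, end) intervals; the function name and the interval convention are assumptions, not the authors' pipeline. The last lines recompute the percentages from the counts given in the text.

```python
from bisect import bisect_left
from itertools import accumulate


def fraction_overlapping(peaks, elements):
    """Fraction of peaks overlapping at least one element by one base or more.

    peaks, elements: lists of half-open (start, end) intervals on one chromosome.
    """
    elements = sorted(elements)
    starts = [start for start, _ in elements]
    # Running maximum of element ends, in order of element start.
    max_end_so_far = list(accumulate((end for _, end in elements), max))
    hits = 0
    for peak_start, peak_end in peaks:
        k = bisect_left(starts, peak_end)  # elements starting before the peak ends
        if k > 0 and max_end_so_far[k - 1] > peak_start:
            hits += 1
    return hits / len(peaks)


# Counts reported in the text (phastCons overlap).
tss_hits, tss_total = 10_348, 11_829
non_tss_hits, non_tss_total = 38_429, 89_497
tss_pct = 100 * tss_hits / tss_total
non_tss_pct = 100 * non_tss_hits / non_tss_total
all_pct = 100 * (tss_hits + non_tss_hits) / (tss_total + non_tss_total)
```

The TSS and non-TSS counts sum to the totals reported for all peaks (48,777 of 101,326), so the three percentages are mutually consistent.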
To determine the extent of cell-type specificity of our islet DHS
peaks, we obtained DNase-seq data generated for four different
human cell lines: GM12878, K562, HeLa-S3, and HepG2 (Duke
DNase, ENCODE Project Consortium, 2007). We identified
DHS peaks for these cell lines (Experimental Procedures) and
found that roughly half the islet peaks are shared with each individual
nonislet cell type. Notably, ~35% (n = 34,273) are
completely unique to the islet (Figure 1D). Almost all (~99%) of
these islet-unique peaks do not overlap RefSeq TSSs, which is
consistent with the model that tissue-specific gene expression
patterns are governed largely by distal cis-regulatory elements
(Heintzman et al., 2009).
An independent method to map open chromatin is formalde-
hyde-assisted isolation of regulatory elements (FAIRE) (Giresi
et al., 2007). Recently, this approach was used for human islets
to identify three sets of candidate peaks, including ‘‘stringent’’
(n = 9887) and ‘‘liberal’’ (n = 100,715) peaks (Gaulton et al.,
2010). Approximately 75% of the ‘‘stringent’’ islet FAIRE peaks
overlap DHS peaks. However, this corresponds to only 7360
peaks, which is far fewer than the predicted number of functional
regulatory elements genome-wide (ENCODE Project Consor-
tium, 2007). The overlap is significantly greater at TSSs com-
pared to non-TSSs (97% versus 65%) (Figure 1E). Comparing
DHS peaks to the set of ‘‘liberal’’ islet FAIRE peaks, the overlap
drops to ~29%. Therefore, the two approaches seem to identify
distinct sets of non-TSS regulatory elements. Because it is diffi-
cult to assess the extent to which the dissimilarity between DHS
and FAIRE data is explained by differences in islet sample purity,
preparation methods, false positive signals, or population
(B) Average length (teal) and intensity (yellow) of DHS peaks across five genomic annotation sets. Peaks at RefSeq transcription start sites (TSSs) are significantly longer and more intense than those elsewhere (**, two-tailed paired Student’s t test, p value < 10^−100). Error bars represent SD (SD measurements were often greater than the sample average due to highly skewed distributions, but error bars were cut off at zero for visualization).
(C) Sequence and structure constraint at DHS. DHS peaks at RefSeq TSSs are under substantially greater sequence constraint (assessed by phastCons vertebrate conservation scores) than intronic and intergenic DHS peaks. A large majority of DHS peaks within all genomic annotation sets are under strong structural constraint (assessed by the Chai algorithm) (Parker et al., 2009).
(D) Comparison of islet DHS peaks with peaks from four different human cell lines. Each data point represents the fraction of total peaks (n = 101,326) unique to the human islet relative to each of the other four human cell types or all of them combined (Union of all 4). Roughly 35% are unique to the islet, and 99% of these are not located at RefSeq TSSs. Varying levels of similarity across cell types may be at least partially explained by differences in the stage of cellular differentiation and/or sequencing depth.
(E) Overlap between DHS peaks and formaldehyde-assisted isolation of regulatory elements (FAIRE) peaks. The overlap is significantly greater at RefSeq TSSs than elsewhere (**, Fisher’s exact test, p < 10^−100).
(F) Logarithm-based distribution of the distance to the nearest distal DHS (d-DHS) peak among all d-DHS peaks. The blue box indicates an increased representation of peaks in the ~100–1000 bp range (clustered) relative to Gaussian expectation (red curve). This range is significantly enriched for islet-unique peaks (Fisher’s exact test, p = 2.7 × 10^−9). Comparison of d-DHS, FAIRE, and GLITR locations is found in Figure S1.
diversity (McDaniell et al., 2010), more controlled comparisons of
these techniques will be necessary to elucidate inherent prefer-
ences of each for specific classes of open chromatin.
Though many of the mechanistic details are not clear, it is
widely accepted that distal and promoter regulatory elements
can exert coordinated control of gene transcription via physical
interactions (Dekker, 2003; Miele and Dekker, 2008). Therefore,
it has been hypothesized that distal cis-regulatory elements
may cluster together to form functional modules (Blanchette
et al., 2006). To assess the clustering of putative islet-active
distal cis-regulatory elements, we filtered from the islet DHS
peaks (n = 101,326) the regions that may represent promoters
to identify a set of high-confidence distal peaks (d-DHS, n =
34,039) (Table S2 and Figure S1 and Experimental Procedures).
For each d-DHS peak, we computed the distance to the nearest
d-DHS peak and observed an increased representation in the
~100–1000 bp range (n = 7652) relative to the expectation
from a normal distribution (Figure 1F). Furthermore, this set is
significantly enriched for islet-unique peaks (p = 2.7 × 10^−9).
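The nearest-neighbor computation described in this paragraph can be illustrated as follows. This sketch assumes non-overlapping peaks on a single chromosome and measures distance edge to edge; the text does not say whether edges or midpoints were used, so that choice and the function names are assumptions.

```python
import math


def nearest_neighbor_gaps(peaks):
    """Edge-to-edge distance from each peak to its nearest neighboring peak.

    peaks: at least two non-overlapping half-open (start, end) intervals on
    one chromosome. Gaps are returned in order of peak start.
    """
    if len(peaks) < 2:
        raise ValueError("need at least two peaks")
    peaks = sorted(peaks)
    gaps = []
    for i, (start, end) in enumerate(peaks):
        candidates = []
        if i > 0:
            candidates.append(max(0, start - peaks[i - 1][1]))
        if i + 1 < len(peaks):
            candidates.append(max(0, peaks[i + 1][0] - end))
        gaps.append(min(candidates))
    return gaps


def count_clustered(gaps, low=100, high=1000):
    """Number of peaks whose nearest neighbor lies within [low, high] bp."""
    return sum(1 for gap in gaps if low <= gap <= high)


def log10_gaps(gaps):
    """log10 of each positive gap, as plotted on the x axis of a density plot."""
    return [math.log10(gap) for gap in gaps if gap > 0]
```

Comparing the histogram of `log10_gaps` with a fitted normal curve is what reveals the excess in the 100 to 1000 bp range.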
Genome-wide Characterization of TSSs in the Islet Genome via H3K4me3 ChIP-Seq
To characterize human islet TSSs, we conducted ChIP-seq analysis
of histone 3 lysine 4 trimethylation (H3K4me3) in four
different human islet samples. H3K4me3 is enriched at CpG
islands (Bernstein et al., 2007), TSSs (Li et al., 2007), and sites
of active transcription (Kouzarides, 2007). Enriched regions
present in all four islet samples, but absent from three mock-IP
(anti-GFP) experiments, were designated as ‘‘H3K4me3 peaks.’’
This method identified 18,163 human islet H3K4me3 peaks
(Table S3) covering ~1% of the genome.
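The selection rule described above (enriched in every islet sample, absent from every mock-IP control) can be sketched as an interval intersection followed by a filter. The exact criteria used in the study are given in its Experimental Procedures and are not reproduced here; the functions below are hypothetical and operate on sorted, non-overlapping half-open intervals from one chromosome.

```python
from functools import reduce


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def consensus_peaks(samples, controls):
    """Regions enriched in all samples and overlapping no control region."""
    shared = reduce(intersect, samples)

    def hits_control(region):
        return any(start < region[1] and end > region[0]
                   for control in controls
                   for start, end in control)

    return [region for region in shared if not hits_control(region)]
```

Requiring agreement across all four samples trades sensitivity for specificity, which matches the later argument that a consistent artifact across samples is unlikely.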
As expected, approximately two-thirds (n = 11,973) of
H3K4me3 peaks overlap RefSeq TSSs (Figure 2A). Greater
than 70% of the remaining, unannotated peaks (n = 6190) over-
lap computationally predicted TSSs and/or CpG islands.
However, the significantly lower average length and intensity of
unannotated H3K4me3 peaks compared to those at RefSeq
TSSs (Figure 2B) suggests that at least some of these peaks
may indicate weakly active TSSs, inactive but poised TSSs
(Barski et al., 2007; Guenther et al., 2007; Mikkelsen et al.,
2007), remnants of transcriptional activity from the develop-
mental past or prior environmental stimulation (Barski et al.,
2009), or chromatin looping with distal regulatory regions. While
a subset of peaks could be false-positive signals, this is unlikely,
as it would require a technical artifact that is consistent across all
four islet samples.
Previous genome-wide profiling studies have reported a posi-
tive correlation between the intensity of H3K4me3 signal and
gene expression level (Barski et al., 2007; Guenther et al.,
2007). To test this observation in islets, we downloaded human
islet gene expression data from http://T1Dbase.org (Kutlu
et al., 2009), partitioned gene expression into quintiles, and
computed the average H3K4me3 signal length and intensity at
the TSSs of genes within each bin. Although the average
H3K4me3 peak length and intensity monotonically increases
with gene expression, there is great variability within each
expression bin (Figure 2C). Surprisingly, of the 245 most highly
islet-expressed genes in this data set, 18% (n = 45) have either
no or extremely low associated H3K4me3 signal. Notably, 71%
(32/45) also lacked a DHS peak (data not shown). Gene ontology
(GO) analysis revealed that these 45 genes are most significantly
enriched for the molecular function of hormone activity (p = 0.029
after Bonferroni correction for multiple testing) (Experimental
Procedures). These genes include insulin (INS), glucagon
(GCG), islet amyloid polypeptide (IAPP), pancreatic polypeptide
preprotein (PPY), somatostatin (SST), and transthyretin (TTR).
We confirmed by quantitative RT-PCR that INS, GCG, and SST
are robustly expressed (Figure S2), so it is unlikely that low
H3K4me3 at these TSSs is due to technical artifacts or adverse
effects of the islet shipment or handling process. Because these
genes are <10 kb in length, we considered the possibility that
weak H3K4me3 signal is simply associated with short genes.
However, the proportion of short genes (<10 kb in length) within
the set of “most highly expressed with no/low H3K4me3 signal”
(66.7%, 30/45) is not statistically different from the proportion of
short genes within the entire set of most highly expressed
(69.8%, 171/245). This result suggests that the transcriptional
regulation of islet hormones and other related, highly islet-
expressed genes occurs through a distinct mechanism as
compared to most other genes.
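The comparison of short-gene proportions (30/45 versus 171/245) can be checked with a small calculation. The text does not name the test that was used, so the sketch below is only one reasonable choice: it treats the 45 genes as a draw from the 245 and asks, with a hypergeometric model, how unusual 30 short genes would be. Function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing `draws` items without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def two_sided_p(k_obs, population, successes, draws):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    p_obs = hypergeom_pmf(k_obs, population, successes, draws)
    lo = max(0, draws - (population - successes))
    hi = min(draws, successes)
    return sum(p for p in (hypergeom_pmf(k, population, successes, draws)
                           for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# 245 highly expressed genes, 171 of them short; 45 lack H3K4me3, 30 of those short.
p_value = two_sided_p(30, 245, 171, 45)
expected_short = 45 * 171 / 245  # about 31.4 short genes expected by chance
```

Under this model the observed count of 30 sits close to the expected count of roughly 31, which agrees with the statement that the two proportions are not statistically different.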
H3K4me3 ChIP-chip (human embryonic stem cells, hepato-
cytes, REH cells [Guenther et al., 2007]) or ChIP-seq (human
CD4+ T cells [Barski et al., 2007] and GM12878, HUVEC,
NHEK, K562, and HeLa cell lines [Broad Institute ChIP-seq,
Bernstein lab, ENCODE Project Consortium, 2007]) data are
available for nine different human cell types. Comparisons
between islet and each other cell type indicated that, on average,
10%–30% of the islet peaks are unique (Figure 2D). Not surprisingly,
this value drops to ~1.5% (n = 256) when compared with all
nine cell types together. Only 34 of the 256 islet-unique peaks
correspond to TSSs of annotated RefSeq genes, and these are
enriched for known pancreatic β cell functions such as secretion
(p = 9.3 × 10⁻³) and Ca2+-dependent exocytosis (p = 6.6 × 10⁻³)
(Table 1). Furthermore, several of the genes (SLC30A8, GCK)
harbor genetic variants that confer significant risk for T2D and
elevated plasma fasting glucose levels (Dupuis et al., 2010;
Ingelsson et al., 2010; Prokopenko et al., 2008, 2009). The
remaining 222 islet-unique peaks may represent alternative
TSSs of genes with function in developing and/or mature islets
or TSSs of unannotated coding or noncoding transcription units.
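The drop from 10%–30% uniqueness against single cell types to ~1.5% against all nine follows from how the union is built: a peak must be absent from every other cell type to count as islet unique. A minimal sketch, assuming simple overlap as the matching criterion and using invented coordinates and hypothetical function names:

```python
def overlaps(peak, others):
    """True if a (chrom, start, end) peak overlaps any peak in `others`."""
    chrom, start, end = peak
    return any(c == chrom and start < e and s < end for c, s, e in others)

def unique_fractions(islet, other_sets):
    """Fraction of islet peaks unique versus each cell type and versus their union.

    Naive O(n*m) scan, adequate for a toy example only.
    """
    per_type = {
        name: sum(not overlaps(p, peaks) for p in islet) / len(islet)
        for name, peaks in other_sets.items()
    }
    union = sum(
        all(not overlaps(p, peaks) for peaks in other_sets.values())
        for p in islet
    ) / len(islet)
    return per_type, union

# Toy peak sets (invented); cell-type names are used only as labels.
islet = [("chr1", 100, 200), ("chr1", 500, 600),
         ("chr2", 100, 300), ("chr7", 50, 150)]
others = {"HeLa": [("chr1", 150, 250)],
          "K562": [("chr1", 550, 700), ("chr2", 250, 400)]}
per_type, union = unique_fractions(islet, others)
```

The union fraction can never exceed the smallest per-cell-type fraction, which is why it falls as more cell types are added.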
Identification of Unannotated Islet-Active TSSs
H3K4me3 peaks in unannotated genomic space (n = 6190) are
TSS candidates. Because H3K4me3 may also be enriched at
inactive TSSs (Guenther et al., 2007), we adopted a two-step
approach to identify the subset of these 6190 peaks that are
likely to be active in the human islet (Figure S3A). First, we devel-
oped an algorithm that uses DHS peaks to assign directionality
to H3K4me3 peaks (Experimental Procedures). DHS peaks
tend to be sharply focused around the TSS, while H3K4me3
peaks are broader and extend well into the body of the transcrip-
tion unit. We hypothesized that the location of the DHS peak
relative to the H3K4me3 peak could predict the directionality
of the underlying gene. Using the strongest DHS peak within
an H3K4me3 peak, this simple algorithm performed at ~90%
accuracy on annotated RefSeq genes known to be expressed
in the human islet (Experimental Procedures). Interestingly, the
majority (~80%) of the incorrectly assigned TSSs (based on
current annotation) harbored multiple DHS peaks, positioned on
either end of the H3K4me3 peak. These H3K4me3 peaks are
slightly (~200 nt) longer than those for which the orientation
was correctly assigned, increasing the likelihood of overlapping
non-TSS-related DHS peaks, which can confound the prediction
algorithm. Many of these non-TSS DHS peaks may correspond
to CTCF-binding sites that are located on the opposite side of
the DHS with respect to the TSS (Boyle et al., 2008) and RNA
polymerase (Pol) III-bound loci found in chromatin domains
occupied by Pol II and associated with enhancer-binding factors
(Oler et al., 2010). We observe examples of each case in our data
set (Figure S4).
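The directionality idea can be sketched as follows. This is an interpretation of the description above rather than the published algorithm: it assumes the strongest DHS summit inside an H3K4me3 peak marks the TSS end, so a summit left of the peak midpoint implies transcription toward the right ("+"). The function name, the midpoint rule, and the toy values are all assumptions.

```python
def predict_strand(h3k4me3_peak, dhs_peaks):
    """Guess transcript orientation from the strongest DHS summit in an H3K4me3 peak.

    h3k4me3_peak: (start, end); dhs_peaks: list of (summit_position, score).
    Returns "+", "-", or None when no DHS summit falls inside the peak
    or the summit sits exactly at the midpoint.
    """
    start, end = h3k4me3_peak
    inside = [(pos, score) for pos, score in dhs_peaks if start <= pos < end]
    if not inside:
        return None
    summit, _ = max(inside, key=lambda d: d[1])
    midpoint = (start + end) / 2
    if summit < midpoint:
        return "+"   # TSS at the left edge, H3K4me3 extends rightward into the gene body
    if summit > midpoint:
        return "-"   # TSS at the right edge, gene body extends leftward
    return None

# Toy cases with invented coordinates and scores.
forward = predict_strand((1000, 3000), [(1200, 50.0), (2800, 10.0)])
reverse = predict_strand((1000, 3000), [(2700, 80.0), (1100, 20.0)])
no_call = predict_strand((1000, 3000), [(5000, 99.0)])
```

The first two toy cases each contain DHS summits at both ends of the H3K4me3 peak, the configuration the text identifies as the main source of misassignment; here the call simply follows the higher-scoring summit.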
Second, we performed ChIP-seq to profile genome-wide
histone 3 lysine 79 dimethylation (H3K79me2), which is enriched
in actively transcribed regions (Guenther et al., 2007). If the rela-
tive density of H3K79me2 reads on either side of an H3K4me3
peak was consistent with its predicted directionality (as deter-
mined from the pattern of the DHS and H3K4me3 signal), then
[Figure 2 graphic. (A) Pie charts of H3K4me3 peaks (n = 18,163) by genomic annotation (RefSeq TSS, promoter, intergenic, exonic, intronic), with non-RefSeq peaks divided into computationally predicted TSSs and/or CpG islands (74%) and potential novel TSSs (26%). (B, C) Average peak length (nt) and average peak intensity (-log[p value]) by annotation set and by percentile bins of expression in the human pancreatic islet. (D) Fraction of peaks unique to the islet relative to GM12878, hepatocytes, ESC, CD4+ T, REH, K562, HUVEC, NHEK, HeLa, and the union of all 9 (n = 256).]
Figure 2. Analysis of Histone 3 Lysine 4 Trimethylation Loci in the Islet Genome
(A) Distribution of H3K4me3 peaks across five genomic annotation sets as described in Figure 1A. Two-thirds of the peaks span RefSeq transcription start sites
(TSSs, left pie chart). Non-RefSeq H3K4me3 peaks are enriched for computationally predicted TSS and/or CpG islands (right pie chart). Additional information is
provided in Figure S3.
(B) Average length (purple) and intensity (blue) of H3K4me3 peaks across five genomic annotation sets as described in Figure 1B. The average length and intensity
of peaks is significantly higher at TSSs (**, two-tailed paired Student’s t test, p value < 10⁻¹⁰⁰). Error bars represent SD.
(C) Relationship between average H3K4me3 peak length (yellow)/intensity (purple) and average gene expression level. Error bars represent SD.
(D) Comparison of islet H3K4me3 peaks with peaks from nine different human cell types. Each data point represents the fraction of total peaks (n = 18,163) unique
to the human islet relative to each of the other nine human cell types or all of them combined (Union of all 9). ~1.5% of the peaks are unique to the islet. Varying
levels of similarity across cell types may be at least partially explained by differences in the stage of cellular differentiation and/or sequencing depth.
the underlying TSS was classified as islet active. Intragenic TSSs
are difficult to assess using this method, because the H3K79me2
signal may be due to transcription from an upstream TSS.
Restricting the analysis to intergenic space, we identified 263
candidates for unannotated, islet-active TSSs (Table S4), of
which 75% (n = 196) overlap CpG islands and/or computationally
predicted TSSs (Figure S3A). These candidates include islet-
active TSSs for noncoding RNAs such as the let-7a-1 cluster
of microRNAs (Figure 3A) and the miR-1179/miR-7-2 cluster
(Figure S3B). We also identified putative alternative TSSs for
genes with important islet function such as pancreatic peptidylglycine
α-amidating mono-oxygenase (PAM), which encodes
an islet secretory granule membrane protein (Figure 3B). Finally,
we identified an active promoter locus that is contained within
a recently reported T1D-associated region on chromosome 12
(index SNP rs1701704). This promoter could underlie an unanno-
tated transcript or could be an alternative promoter for the down-
stream gene Ikaros family zinc finger 4 (IKZF4) (Figure S3C),
which is considered a strong functional candidate for T1D (Hako-
narson et al., 2008).
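The second step, checking that H3K79me2 signal agrees with the predicted orientation, can be pictured as a comparison of read counts on either side of the H3K4me3 peak. The window size and ratio cutoff below are hypothetical placeholders chosen for illustration; the text does not state the thresholds that were used.

```python
from bisect import bisect_left

def reads_in(sorted_positions, lo, hi):
    """Number of read positions in the half-open window [lo, hi)."""
    return bisect_left(sorted_positions, hi) - bisect_left(sorted_positions, lo)

def is_islet_active(peak, strand, h3k79me2_reads, window=5000, min_ratio=2.0):
    """True if H3K79me2 reads are denser on the predicted downstream side.

    peak: (start, end) of the H3K4me3 peak; strand: "+" or "-";
    h3k79me2_reads: sorted read positions on the same chromosome.
    `window` and `min_ratio` are illustrative assumptions.
    """
    start, end = peak
    left = reads_in(h3k79me2_reads, start - window, start)
    right = reads_in(h3k79me2_reads, end, end + window)
    downstream, upstream = (right, left) if strand == "+" else (left, right)
    return downstream >= min_ratio * max(upstream, 1)

# Toy read positions (invented): most signal lies to the right of the peak.
toy_reads = [900, 3100, 3200, 3500, 4000, 6000]
plus_call = is_islet_active((1000, 3000), "+", toy_reads)
minus_call = is_islet_active((1000, 3000), "-", toy_reads)
```

As the text notes, a check of this kind is hard to interpret for intragenic candidates, because reads on the downstream side may come from a transcript that started further upstream.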
Identification of Distal cis-Regulatory Elements
Sites bound by CTCF are an important class of cis-regulatory
elements that can mediate insulator or other regulatory activities
(Phillips and Corces, 2009). To generate a genome-wide CTCF-
binding site profile in the human islet, we performed ChIP-seq
and designated enriched regions as “CTCF peaks” (n =
21,304) (Table S5 and Experimental Procedures). We assessed
the genomic distribution of peaks (Figure 4A), computed the
average peak intensity/length across various genomic cate-
gories (Figure 4B), and identified the most significantly overrep-
resented motif within the peaks using MEME (Figure 4C and
Supplemental Experimental Procedures). The results corrobo-
rate those from previously described studies in other cell types
(Kim et al., 2007; Jothi et al., 2008; Cuddapah et al., 2009).
Further, only 0.6% (n = 123) of CTCF peaks were islet unique
(Figure 4D). Finally, we observed that among the 77% of CTCF
peaks that overlap 22% of DHS peaks, the CTCF peaks are
positioned near the center of the DHS peak with a slight 5′ shift (Figure 4E).
Previous studies have observed depletion of monomethylated
histone 3 lysine 4 (H3K4me1) at TSSs and enrichment at putative
enhancers such as distal STAT1 and EP300 sites (ENCODE
Project Consortium, 2007; Heintzman et al., 2007, 2009; Robert-
son et al., 2008) and nonpromoter DHS (Barski et al., 2007;
Robertson et al., 2008; Wang et al., 2008). To profile H3K4me1
across the human islet genome, we repeated the ChIP-seq
strategy described above for three islet samples. We computed
the average ratio of the density of extended H3K4me1 sequence
reads in DHS peaks at RefSeq TSSs (t-DHS, n = 11,829) and
d-DHS peaks (n = 34,039) (Experimental Procedures) to the
density in flanking control regions that do not harbor DHS signal
(Experimental Procedures). t-DHS peaks are significantly
depleted for H3K4me1, whereas d-DHS peaks are significantly
enriched (Figure 5). Further, there was no significant difference
in H3K4me1 enrichment between CTCF-positive and CTCF-
negative d-DHS. Although we detected depletion of H3K4me1
at t-FAIRE peaks, there was no enrichment at d-FAIRE peaks
(Figure 5).
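The enrichment measure used here is a ratio of read densities: reads per base inside a peak divided by reads per base in flanking control windows. A minimal sketch under stated assumptions follows; the flank size and gap are hypothetical, and the check that control windows carry no DHS signal, which the text requires, is omitted for brevity.

```python
from bisect import bisect_left

def count_reads(sorted_reads, start, end):
    """Number of read positions in the half-open window [start, end)."""
    return bisect_left(sorted_reads, end) - bisect_left(sorted_reads, start)

def fold_enrichment(sorted_reads, peak, flank=1000, gap=500):
    """Read density in the peak relative to two flanking control windows.

    Values above 1 indicate enrichment, below 1 depletion.
    Returns None when the flanks contain no reads.
    """
    start, end = peak
    in_peak = count_reads(sorted_reads, start, end)
    in_flanks = (count_reads(sorted_reads, start - gap - flank, start - gap)
                 + count_reads(sorted_reads, end + gap, end + gap + flank))
    if in_flanks == 0:
        return None
    # (in_peak / peak_length) / (in_flanks / total_flank_length), rearranged
    # so that the arithmetic stays exact for small integer counts.
    return (in_peak * 2 * flank) / (in_flanks * (end - start))

# Toy H3K4me1 read positions around one peak (invented values).
toy_reads = sorted([700, 1200, 2050, 2100, 2200, 2300, 2400, 2450, 3300, 3900])
ratio = fold_enrichment(toy_reads, (2000, 2500))
```

Averaging this ratio over all peaks in a class (for example t-DHS or d-DHS) gives a per-class enrichment of the kind summarized in Figure 5.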
We did not detect dramatically different H3K4me1 enrichment
levels between intergenic and intragenic d-DHS peaks (Fig-
ure S5). Interestingly, although the average H3K4me3 read
density in d-DHS peaks was ~3-fold less than that of
H3K4me1, d-DHS peaks were still enriched for H3K4me3 signal
relative to flanking control regions (Figure S5). These observa-
tions are consistent with the previous finding that although
H3K4me1 often marks distal regulatory regions, a substantial
portion is also associated with H3K4me3 signal (Robertson
et al., 2008). Overall, the enrichment of active histone modifica-
tions suggests that islet d-DHS peaks are strong candidates for
putative regulatory elements. Fifty published index SNPs
(http://www.genome.gov/gwastudies/) and their linkage disequilibrium
partners (r² > 0.6) for diabetes (T1D, T2D) and related quantita-
tive traits (fasting glucose, fasting insulin) are found within
500 bp of nonpromoter d-DHS peaks (Table S9 and Experi-
mental Procedures), suggesting that these SNPs may contribute
to diabetes or altered islet physiology by modulating regulatory
element activity.
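The proximity criterion (a SNP within 500 bp of a d-DHS peak) can be expressed as a simple distance test. The sketch below uses placeholder SNP labels and invented coordinates; it is not the study's own code, and the naive scan is meant only to make the rule explicit.

```python
def distance_to_peak(pos, start, end):
    """Distance in bp from a position to a half-open peak [start, end); 0 if inside."""
    if start <= pos < end:
        return 0
    return min(abs(pos - start), abs(pos - (end - 1)))

def snps_near_peaks(snps, peaks, max_dist=500):
    """Return names of SNPs lying within `max_dist` bp of any peak.

    snps: list of (name, chrom, position); peaks: list of (chrom, start, end).
    """
    hits = []
    for name, chrom, pos in snps:
        for p_chrom, start, end in peaks:
            if p_chrom == chrom and distance_to_peak(pos, start, end) <= max_dist:
                hits.append(name)
                break
    return hits

# Placeholder SNP labels and toy coordinates (not real variants).
toy_snps = [("snpA", "chr1", 1000), ("snpB", "chr1", 5000),
            ("snpC", "chr2", 1000), ("snpD", "chr2", 2100)]
toy_peaks = [("chr1", 1300, 1600), ("chr2", 2000, 2400)]
near = snps_near_peaks(toy_snps, toy_peaks)
```

In practice the SNP list would include both index SNPs and their linkage disequilibrium partners, as described above.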
Application of Chromatin Profiles to T2D Susceptibility Loci
To identify regulatory elements and transcripts that may underlie
molecular mechanisms of T2D, we analyzed the chromatin
profiles in the 18 GWAS-derived genomic loci conferring risk
for T2D (Prokopenko et al., 2008). The genomic boundaries
of each association signal (Table S6) were defined by the Spotter
algorithm (Experimental Procedures). The chromatin profiles do
not predict any alternative promoters or unannotated/noncoding
Table 1. Examples of Islet-Unique H3K4me3 Peaks
Gene Symbol   Relevance to Islet Biology
GCK           Involved in glucose metabolism
              T2D GWAS locus (Dupuis et al., 2010)
              Harbors an islet-specific promoter (Magnuson, 1990)
SLC30A8       Involved in cation (Zn2+) transport important for insulin secretion (Chimienti et al., 2004)
              T2D GWAS locus (Prokopenko et al., 2008)
              Exhibits islet-specific expression (Chimienti et al., 2004)
REG1A         Derived from regenerating islets (Terazono et al., 1988)
FFAR1         Exhibits islet-specific expression (Bartoov-Shifman et al., 2007)
              Regulates insulin secretion (Itoh et al., 2003)
SYT4          Involved in Ca2+-dependent trafficking and exocytosis of secretory vesicles (Tsuboi and Rutter, 2003)
KCNK16        Exhibits pancreas-specific expression (Girard et al., 2001)
ELAVL4        Regulates cell proliferation (Joseph et al., 1998)
UCN3          Regulates glucose-stimulated insulin secretion (Li et al., 2007)
PRSS1         Harbors mutations that underlie hereditary pancreatitis and pancreatic cancer (Teich et al., 1998)
Nine examples among the 34 islet-unique peaks that are at RefSeq transcription start sites (TSSs). The corresponding genes have known pancreatic islet function (such as insulin secretion), and some harbor genetic variants that confer significant risk for type 2 diabetes (SLC30A8 and GCK).
transcripts in these regions. However, they do identify 118
d-DHS peaks, which represent putative distal regulatory ele-
ments (Table S7 and Experimental Procedures). About one-
quarter of these elements (n = 28) are bound by CTCF in the islet.
Six of the 118 elements contain one or more T2D-associated
SNPs (index SNP or SNP with r² > 0.6) (Table S8). These six
include a previously identified element containing the index
SNP rs7903146 in the TCF7L2 locus (Gaulton et al., 2010). The
remaining five map to the IGF2BP2, KCNQ1, WFS1, FTO, and
CDC123/CAMK1D loci. Only the CDC123/CAMK1D element is
bound by CTCF in the islet.
Validation of Putative Islet Regulatory Elements in T2D Loci
To determine whether predicted regulatory elements in the islet
can function as enhancers, we cloned two classes of elements
[Figure 3 graphic. Genome browser views of (A) the let-7 miRNA cluster on chr9 (hsa-let-7a-1, hsa-let-7f-1, hsa-let-7d; EST BG326593; pri-let-7 promoter) and (B) the PAM locus on chr5 (annotated islet-active promoter, unannotated islet-active promoter, unannotated islet-unique promoter). Each view shows DHS, H3K4me1, H3K4me3, and H3K79me2 signal tracks together with Eponine TSS, SwitchGear TSS, mammalian conservation, and mappability (Duke Uniq, Umass Uniq) tracks.]
Figure 3. Identifying Unannotated Islet-Active Transcription Start Sites
(A) Candidate islet-active TSS for the primary transcript of the ubiquitous let-7a-1/7d/7f-1 microRNA cluster. The TSS (red box; DHS+, H3K4me3+, H3K4me1−) is
~10 kb upstream of the 5′-most microRNA in the cluster, and the full-length primary transcript (H3K79me2+) of ~35 kb matches a known EST (BG326593). This
EST likely represents a noncoding RNA primary transcript from which the let-7 cluster of miRNAs is processed (Marson et al., 2008). The strategy for predicting
TSSs is shown in Figure S3A.
(B) Two candidate islet-active alternative TSSs (red boxes) for the gene PAM, which encodes an islet secretory granule membrane protein. One of the candidate
TSSs is also islet unique and occurs between the annotated TSS and an unannotated islet-active TSS. Examples of confounding factors for predicting islet-active
TSSs are shown in Figure S4.
containing d-DHS peaks into luciferase reporter vectors
(Figure 6): those bound by CTCF (“C,” n = 11) and those that
are not (“P,” n = 33). We also cloned a number of non-DHS,
non-CTCF controls (“N,” n = 15). Because human islet cell lines
are not available, we tested these elements for enhancer activity
in murine pancreatic MIN6 (Figure 6A) and HeLa (Figure 6B) cell
lines. Only ~15% (4/26) of the negative controls exhibited
enhancer activity in any orientation or cell type (~9% [1/11] of
[Figure 4 graphic. (A) Pie chart of CTCF peaks (n = 21,304) by genomic annotation (RefSeq TSS, promoter, intergenic, exonic, intronic). (B) Average peak length (nt) and average peak intensity (-log[p value]) by annotation set. (C) CTCF motif logo (bits). (D) Fraction of peaks unique to the islet relative to GM12878, K562, HUVEC, NHEK, CD4+ T, and the union of all 5 (n = 123). (E) Fraction of CTCF peaks by position relative to DHS peak (kb).]
Figure 4. Profiling of Binding Sites for the CCCTC-Binding Factor
(A) Distribution of CTCF peaks across five genomic annotation sets as described in Figure 1A.
(B) Average length (orange) and intensity (green) of CTCF peaks across five genomic annotation sets is fairly uniform. Error bars represent SD.
(C) Motif determined by MEME (Bailey and Elkan, 1994) using the top 10% of CTCF peaks.
(D) Comparison of islet CTCF peaks with peaks from five different cell types. Each data point represents the fraction of total peaks (n = 21,304) unique to the
human islet relative to each of the other five human cell types or all of them combined (Union of all 5). Less than 1% of the peaks are unique to the islet (n =
123). Varying levels of similarity across cell types may be at least partially explained by differences in the stage of cellular differentiation and/or sequencing depth.
(E) Positioning of CTCF peaks relative to the center of overlapping DHS peaks (red line). Almost all CTCF peaks that overlap DHS peaks are within 200 bp of the
DHS peak center.
“C” elements and 20% [3/15] of “N” elements) (Figures 6A and
6B). In contrast, ~2.5-fold more “P” elements demonstrated
enhancer activity (12/33). This positive rate (36.4%) is compa-
rable to that of predicted HeLa enhancers (Heintzman et al.,
2009) that exhibited increased luciferase activity in our HeLa
reporter assays (38.5%, 5/13).
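The rates quoted in this paragraph follow directly from the element counts. The short calculation below reproduces them from the numbers given in the text (1/11 "C", 3/15 "N", 12/33 "P"); variable names are illustrative only.

```python
from fractions import Fraction

# (elements with enhancer activity, elements tested), as reported in the text.
counts = {"C": (1, 11), "N": (3, 15), "P": (12, 33)}
rates = {name: Fraction(active, tested) for name, (active, tested) in counts.items()}

# "C" and "N" elements together form the negative controls.
neg_active = counts["C"][0] + counts["N"][0]
neg_tested = counts["C"][1] + counts["N"][1]
neg_rate = Fraction(neg_active, neg_tested)

# Ratio of the "P" positive rate to the pooled negative-control rate.
fold = rates["P"] / neg_rate

percent = {name: round(float(r) * 100, 1) for name, r in rates.items()}
neg_percent = round(float(neg_rate) * 100, 1)
fold_rounded = round(float(fold), 2)
```

The pooled negative-control rate is 4/26 (about 15.4%), the "P" rate is 12/33 (about 36.4%), and their ratio is 26/11 (about 2.4), which is in line with the approximately 2.5-fold difference stated above.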
Four of 12 “P” elements exhibiting enhancer activity (P4,
KCNJ11/ABCC8; P12, TCF7L2; P17, WFS1; P20, HHEX/IDE)
are unique to the islet; one of these (P17, WFS1) is also unde-
tected by at least three other methods for the prediction of
regulatory element potential: PReMod (Ferretti et al., 2007),
phastCons (Siepel et al., 2005), and islet-FAIRE (Gaulton
et al., 2010). The average H3K4me1 enrichment among the 12
d-DHS peaks in the elements exhibiting enhancer activity was
similar to that computed for all d-DHS (~1.3-fold) (Figure 6C).
However, there was large variation in H3K4me1 enrichment
among individual elements (0.6- to 3.4-fold), with only 3/12
enriched above baseline (1.0) (Figure 6C).
Allele-Specific Analysis of Five Regulatory Elements Containing T2D-Associated SNPs
Five “P” elements tested contain T2D-associated SNPs (P9,
IGF2BP2; P12, TCF7L2; P17, WFS1; P21, KCNQ1; P23, FTO)
(Figures 6A and 6B). Notably, four out of the five elements (all
except P9) exhibited enhancer activity in at least one orientation
and cell type tested. To assess allele- or haplotype-specific
effect(s) of T2D-associated variants on enhancer activity, we
cloned these four regions from the genomic DNA of individuals
with risk and nonrisk genotypes/haplotypes and compared lucif-
erase reporter activity (Figures 6D and S6A). We confirmed
significantly stronger enhancer activity for the TCF7L2 element
(P12) containing the rs7903146 risk allele relative to the nonrisk
allele (~3-fold) (Figure 6D) (Gaulton et al., 2010). TCF7L2 allelic
enhancer effects were specific to the MIN6 cell line (Figure 6D,
compare MIN6 and HeLa). Sequencing of the TCF7L2 inserts
from each haplotype revealed two variant bases, a novel variant
(C/G at Chr10:114,747,977; hg18) and rs7903146; only
rs7903146 mediated allele-specific effects on enhancer activity
(Figure 6D, compare Risk to Nonrisk and Nonrisk(m)) (Fig-
ure S6B). We also identified a haplotypic effect on enhancer
activity for the WFS1 element (P17), which contains four SNPs
(rs4689397, rs6823148, rs881796, and rs4234731). The risk
haplotype exhibited ~30% lower activity than nonrisk in HeLa
cells (Figure 6D).
DISCUSSION
In this study, we describe the most comprehensive characterization
to date of the epigenomic profile of unstimulated human
pancreatic islets. Using DNase- and ChIP-seq techniques, we
profiled open chromatin, CTCF-binding sites, H3K4me3,
H3K4me1, and H3K79me2 across the entire genome in human
islets. Integrated analysis of these large-scale data sets identified
~18,000 putative TSSs, ~30% of which were previously
unannotated by RefSeq. Further computational genomic anal-
yses revealed that at least several hundred of these are
islet-active TSSs, including those for major islet miRNAs previ-
ously implicated in the control of glucose homeostasis (Lynn,
2009). Interestingly, active chromatin marks (H3K4me3, DHS,
H3K79me2) were absent from a subset of highly islet-expressed
genes, including those encoding islet-specific hormones (INS,
GCG, SST, IAPP, PPY, and TTR). This observation suggests
that some genes critical for islet function have an unconventional
promoter chromatin signature, indicative of a unique transcrip-
tional control mechanism. Mutskov and Felsenfeld (2009) have
proposed such a model based on detailed analysis of the INS
locus in human islets.
We also identified ~34,000 candidate distal regulatory
elements in human islets. A substantial number of these putative
elements were clustered (<1000 bp from each other). Compari-
sons with other cell types indicated that these clustered
elements are significantly enriched for islet-unique sites and
thus may represent islet-specific regulatory modules worthy of
more extensive future investigation. Based on CTCF-binding
profiles, ~22% of the ~34,000 candidate distal regulatory
elements are predicted insulator sites. Previous studies have
reported that the H3K4me1 signal is enriched in distal regulatory
elements (Heintzman et al., 2007, 2009). Though our analyses
confirm this finding in aggregate, we show that H3K4me1 enrich-
ment may not be a reliable predictor of regulatory activity for
individual elements.
Fifty SNPs associated with islet-related diseases and traits
map to within 500 bp of a candidate nonpromoter regulatory
element. Focusing on T2D, 4 of the 12 elements that function
as enhancers in vitro (FTO, KCNQ1, TCF7L2, and WFS1 loci)
harbor T2D-associated SNPs, including two (TCF7L2 and
WFS1 loci) that exhibit significant allele-specific differences in
activity. These results suggest that altered enhancer activity
plays a role in the molecular mechanism underlying at least
a subset of T2D genetic association signals.
[Figure 5 graphic. Bar chart of H3K4me1 fold enrichment for t-FAIRE, t-DHS, d-FAIRE, d-DHS, and non-CTCF d-DHS peaks.]
Figure 5. Representation Analysis of Histone H3 Lysine 4 Monomethylation
in Candidate Regulatory Regions
DNase I-hypersensitive site (DHS) and formaldehyde-assisted isolation of
regulatory elements (FAIRE) peaks at RefSeq TSSs (t-DHS and t-FAIRE,
respectively) are significantly depleted for H3K4me1 signal (**, two-tailed
paired Student’s t test, p < 0.005), and DHS peaks at distal candidate regula-
tory elements (d-DHS) are enriched for H3K4me1 signal (*, two-tailed paired
Student’s t test, p < 0.01). Error bars represent SD among three islet samples.
FAIRE data were obtained from Gaulton et al. (2010). Representation analysis
of additional histone modifications is shown in Figure S5.
[Figure 6 graphic. (A, B) Relative luciferase activity (a.u.) of dDHS+/CTCF+ ("C"), dDHS- ("N"), and dDHS+/CTCF- ("P") elements in forward and reverse orientations in MIN6 and HeLa cells. (C) H3K4me1 fold enrichment for the 12 elements exhibiting enhancer activity (P32, P21, P31, P23, P4, P12, P17, P15, P20, P26, P8, P27), with average and baseline lines. (D) Relative luciferase activity (a.u.) of risk, non-risk, and non-risk (m) alleles of the TCF7L2 (P12) and WFS1 (P17) elements in MIN6 and HeLa cells.]
These data sets should provide functional context for noncod-
ing variants identified through additional association, targeted
resequencing, or whole-genome sequencing studies. Further
analysis of the repertoire of regulatory elements in the human
islet will enhance the understanding of gene regulation in the islet
and should offer additional insight into the molecular mecha-
nisms that underlie diabetes susceptibility.
EXPERIMENTAL PROCEDURES
Human Islets
Fresh human pancreatic islets were obtained from the ICR Basic Science Islet
Distribution Program and National Disease Research Interchange (NDRI). Islet
viability and purity were assessed by the distribution centers and are shown
along with phenotypic/clinical information of each donor in Table S10. Islets
were warmed to 37°C and washed with calcium- and magnesium-free Dulbecco’s
phosphate-buffered saline (Invitrogen; Carlsbad, CA) prior to crosslinking.
For chromatin immunoprecipitation (ChIP) studies, cells were crosslinked for
20 min in 1% formaldehyde at room temperature, frozen in liquid nitrogen,
and stored at −80°C.
DNase-Seq and DHS Peak Identification
For DNase-seq experiments, fresh pancreatic islets were disaggregated to
achieve single-cell suspension. Islets were washed with prewarmed 1X PBS
once and resuspended with dissociation solution (1 ml of 1X PBS, 50 ml of
0.05 U/ml Dispase I stock solution [Roche; Indianapolis, IN]). Islet suspension
was transferred to a 6-well culture dish, incubated at 37°C for 30 min, dissociated
with a 2 ml sterile pipette, and incubated for another 30 min. This incubation-agitation
cycle was repeated 4 or 5 times until >90% of islets were disaggregated
into single cells. Cells were washed with prewarmed 1X PBS once
and prepared for DNase-seq experiments as previously described (Song
and Crawford, 2010). Libraries from three primary human islet samples
(Table S10) were sequenced using the Illumina GAII platform. Peaks were
identified using MACS (Supplemental Experimental Procedures) (Zhang
et al., 2008).
ChIP and Illumina GAII Sequencing
ChIP assays were carried out as previously described (Scacheri et al.,
2006), with the following modifications. Intact nuclei were isolated and
chromatin was sheared on ice using a Branson 450 Sonifier (constant duty
cycle, output 4, 12–16 cycles of 20 s sonication with 1 min rest between cycles)
to a size of 200–1000 bp. Antibodies used for ChIP were anti-H3K4me3
(ab8580, Abcam; Cambridge, MA), anti-H3K4me1 (ab8895, Abcam), anti-
H3K79me2 (ab3594, Abcam), anti-CTCF (ab70303, Abcam; 07-729, Millipore;
Danvers, MA), and anti-GFP (sc-8334, Santa Cruz Biotechnology; Santa
Cruz, CA).
Islet ChIP-seq libraries were prepared and sequenced using the Illumina
GAII protocol and platform. The number of sequencing lanes, clusters, aligned
reads, repeat-filtered reads (no satellite reads), and unique starts is shown
for each islet and ChIP experiment in Table S12. MACS (Zhang et al., 2008)
was used to call H3K4me3 and CTCF peaks (Supplemental Experimental
Procedures).
Genome-wide Analysis of Chromatin Marks
Perl and R scripts were written to perform the genomic characterization and
comparative analysis of DHS, H3K4me3, and CTCF peaks. Unless otherwise
noted, functional annotation data sets (including RefSeq and UCSC known
genes, predicted TSSs and bidirectional promoters, phastCons elements,
CpG islands, and ChIP-seq data sets) were downloaded from the UCSC Table
Browser on November 1, 2009 (http://genome.ucsc.edu/cgi-bin/hgTables).
For ‘‘computationally predicted TSSs,’’ both the Eponine and the Switch-
gear data sets from the UCSC Table Browser were utilized. Human pancreatic
islet gene expression data were downloaded from T1DBase (http://T1Dbase.
org), and expression data for other tissues were downloaded from BioGPS
Human U133A/GNF1H Gene Atlas (http://biogps.gnf.org/downloads/). Islet-
selective gene expression was defined as at least 3-fold greater expression
in the islet relative to any other tissue represented. Genome-wide results of
the Chai algorithm were determined according to the parameters in Parker
et al. (2009), and islet-FAIRE data sets were obtained from Gaulton et al.
(2010). GO analyses were performed using the web-based tool NIH DAVID
6.7 (http://david.abcc.ncifcrf.gov/). For the DHS peak clustering analysis
(Figure 1F) and the histone modification enrichment/depletion analysis
(Figures 5 and S5), we stringently defined d-DHS peaks as those that are
not within H3K4me3 peaks and ≥5 kb away from RefSeq TSSs, UCSC Known
Gene TSSs, Eponine or Switchgear computationally predicted TSSs, and CpG
islands, yielding 34,039 d-DHS. To select regulatory elements to test for
enhancer activity (Figure 6), the definition of d-DHS was slightly loosened
(≥5 kb upstream and ≥1 kb downstream from known and predicted TSSs
and CpG islands). P values for statistical comparisons were computed using
either the two-tailed paired Student’s t test or the Fisher’s exact test. Details
of the remaining computational analyses are described in Supplemental
Experimental Procedures.
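The stringent d-DHS definition given above (outside H3K4me3 peaks and at least 5 kb from any TSS or CpG island) amounts to a two-part filter. The sketch below is a simplified, single-chromosome illustration in which annotated features are reduced to point positions and "within" is approximated by any overlap; both simplifications, and all names, are assumptions.

```python
from bisect import bisect_left

def min_distance(sorted_positions, start, end):
    """Distance from half-open interval [start, end) to the nearest position.

    Returns 0 if a position falls inside the interval, None if the list is empty.
    """
    i = bisect_left(sorted_positions, start)
    best = None
    if i < len(sorted_positions):          # nearest position at or right of start
        p = sorted_positions[i]
        best = 0 if p < end else p - (end - 1)
    if i > 0:                              # nearest position left of start
        d = start - sorted_positions[i - 1]
        best = d if best is None else min(best, d)
    return best

def distal_peaks(dhs, h3k4me3, feature_positions, min_dist=5000):
    """Keep DHS peaks that avoid H3K4me3 peaks and lie >= min_dist from all features."""
    features = sorted(feature_positions)
    kept = []
    for start, end in dhs:
        in_h3k4me3 = any(s < end and start < e for s, e in h3k4me3)
        d = min_distance(features, start, end)
        if not in_h3k4me3 and (d is None or d >= min_dist):
            kept.append((start, end))
    return kept

# Toy data (invented): one peak inside H3K4me3, one distal, one too close to a TSS.
toy_dhs = [(1000, 1200), (20000, 20300), (40000, 40200)]
toy_h3k4me3 = [(900, 2500)]
toy_features = [1500, 42000]
toy_distal = distal_peaks(toy_dhs, toy_h3k4me3, toy_features)
```

The looser definition used to select elements for the reporter assays would replace the single `min_dist` with separate upstream and downstream cutoffs, which requires strand information not modeled here.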
Molecular Cloning
Putative regulatory elements were amplified from human genomic DNA with
primers designed using PrimerTile (http://research.nhgri.nih.gov/tools/).
Element boundaries were determined by manual H3K4me1 profile inspection.
Coordinates of amplified elements and primer sequences for amplification are
found in Table S13. Putative regulatory elements were cloned using the
Gateway system (Invitrogen). Generation of Gateway-compatible vectors is
described in Supplemental Experimental Procedures. Variants of interest
were introduced using QuikChange Lightning (Stratagene; La Jolla, CA). Muta-
genesis primer sequences are available upon request. Mutagenesis was
confirmed by direct sequencing.
Transfection and Dual Luciferase Assays
Cells were seeded in 96-well plates (40,000 cells/well HeLa, 60,000 cells/well
MIN6) and cotransfected with 0.072 pmol Gateway-modified firefly (pGL 4.23,
Promega; Madison, WI) and 2 ng Renilla (pRL-TK, Promega) vectors using
Lipofectamine 2000 (Invitrogen). Two vector preparations per insert orientation
were tested. Transfections were performed in triplicate.
Cells were lysed in 1× passive lysis buffer (Promega) 36–48 hr posttransfection,
and dual luciferase assays were run on a Centro/Centro XS3 Microplate
Luminometer LB 960 (Berthold; Bad Wildbad, Germany). Firefly values were
normalized to Renilla to control for differences in cell number or transfection
efficiency. Luciferase assays were performed in triplicate. For each element
tested, at least two independent vector preparations were used. Activity was
defined as 2.33 standard deviations (SD) (p = 0.01) above the median activity of
negative controls (Heintzman et al., 2009), defined as CTCF-bound elements in
this study.
Figure 6. Luciferase Reporter Activity Validates Putative Enhancer Elements
(A) Relative luciferase activity of constructs in three element classes tested in MIN6 cells. Genomic locations of elements are found in Table S13. Blue and orange
dashed lines indicate 2.33 standard deviations (p = 0.01) (Heintzman et al., 2009) above the median activity of tested CTCF-bound regions for elements cloned in
the forward or reverse orientations, respectively. Data represent the mean ±SD of three replicates each for two separate clones (six total measurements). C,
d-DHS+/CTCF+ element; N, d-DHS−/CTCF− element; P, d-DHS+/CTCF− element. # marks elements containing T2D-associated SNPs. Numbers above the bars
indicate the luciferase activity for elements beyond the scale of the y axis; a.u. denotes arbitrary units.
(B) Relative luciferase activity of constructs in three element classes tested in HeLa cells. Data are analyzed and annotated as in (A).
(C) H3K4me1 representation in the 12 elements exhibiting enhancer activity. Though the overall average enrichment of H3K4me1 is ~1.3-fold (green line), only
3/12 elements are above baseline (red line). Error bars represent SD among three islet samples.
(D) Relative luciferase activity of TCF7L2 (P12) and WFS1 (P17) elements in MIN6 (left panels) or HeLa (right panels) cells containing the risk or nonrisk alleles of
T2D-associated SNPs. For TCF7L2, (m) denotes a mutation generated by site-directed mutagenesis from the risk to nonrisk allele. Data represent the mean ±SD
of three replicates each from at least two independent clones. **, two-tailed unpaired Student’s t test, p < 0.01. Additional allelic analysis is shown in Figure S6.
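The normalization and activity threshold used for the luciferase assays can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' analysis: the function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import NormalDist, median, stdev

def normalized_activity(firefly, renilla):
    """Firefly signal divided by the Renilla signal of the same well."""
    return [f / r for f, r in zip(firefly, renilla)]

def activity_threshold(negative_controls, n_sd=2.33):
    """Median negative-control activity plus n_sd sample standard deviations."""
    return median(negative_controls) + n_sd * stdev(negative_controls)

# 2.33 SD is approximately the one-tailed 1% point of a normal distribution,
# which is where the quoted p = 0.01 comes from.
z_99 = NormalDist().inv_cdf(0.99)
```

An element would then be called active when its normalized activity exceeds the threshold computed from the CTCF-bound negative controls tested in the same cell line and orientation.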
ACCESSION NUMBERS
The NCBI Gene Expression Omnibus (GEO) umbrella accession number,
which links to the individual ChIP-seq and DNase-seq data sets, is GSE23784.
SUPPLEMENTAL INFORMATION
Supplemental Information includes Supplemental Experimental Procedures,
six figures, and 13 tables and can be found with this article online at doi:
10.1016/j.cmet.2010.09.012.
ACKNOWLEDGMENTS
Human pancreatic islets used in this study were obtained through the ICR
Basic Science Islet Distribution Program (University of Minnesota, University
of Alabama-Birmingham, University of Illinois, University of Miami, North-
western University) and the National Disease Research Interchange (NDRI).
We thank Fangfei Ye and Lisa Bukovnik at the Duke IGSP Sequencing Core
Facility for sequencing DNase libraries, the DIAGRAM Consortium for helpful
discussion regarding variants in the KCNQ1 locus, and members of the Collins
and Boehnke labs for insightful discussions during the study and critical
comments on the manuscript. Special thanks to Cristen Willer and Greg Keele
for help with statistical analyses of ChIP/GWAS data. This study was sup-
ported by the NIH Division of Intramural Research/NHGRI project number
Z01-HG000024 (F.S.C.), by NIH grant DK062370 (M.B.), and by an NIH/NHGRI
ENCODE Consortium grant (U54HG004563 to G.E.C. and T.S.F.).
Received: May 7, 2010
Revised: July 22, 2010
Accepted: August 26, 2010
Published: November 2, 2010
REFERENCES
Bailey, T.L., and Elkan, C. (1994). Fitting a mixture model by expectation maxi-
mization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol.
Biol. 2, 28–36.
Barrett, J.C., Clayton, D.G., Concannon, P., Akolkar, B., Cooper, J.D., Erlich,
H.A., Julier, C., Morahan, G., Nerup, J., Nierras, C., et al. (2009). Genome-
wide association study and meta-analysis find that over 40 loci affect risk of
type 1 diabetes. Nat. Genet. 41, 703–707.
Barski, A., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., Wang, Z., Wei, G.,
Chepelev, I., and Zhao, K. (2007). High-resolution profiling of histone methyl-
ations in the human genome. Cell 129, 823–837.
Barski, A., Jothi, R., Cuddapah, S., Cui, K., Roh, T.Y., Schones, D.E., and
Zhao, K. (2009). Chromatin poises miRNA- and protein-coding genes for
expression. Genome Res. 19, 1742–1751.
Bartoov-Shifman, R., Ridner, G., Bahar, K., Rubins, N., and Walker, M.D.
(2007). Regulation of the gene encoding GPR40, a fatty acid receptor ex-
pressed selectively in pancreatic beta cells. J. Biol. Chem. 282, 23561–23571.
Bernstein, B.E., Meissner, A., and Lander, E.S. (2007). The mammalian epige-
nome. Cell 128, 669–681.
Bhandare, R., Schug, J., Le Lay, J., Fox, A., Smirnova, O., Liu, C., Naji, A., and
Kaestner, K.H. (2010). Genome-wide analysis of histone modifications in
human pancreatic islets. Genome Res. 20, 428–433.
Blanchette, M., Bataille, A.R., Chen, X., Poitras, C., Laganiere, J., Lefebvre, C.,
Deblois, G., Giguere, V., Ferretti, V., Bergeron, D., et al. (2006). Genome-wide
computational prediction of transcriptional regulatory modules reveals new
insights into human gene expression. Genome Res. 16, 656–668.
Boyle, A.P., Davis, S., Shulha, H.P., Meltzer, P., Margulies, E.H., Weng, Z.,
Furey, T.S., and Crawford, G.E. (2008). High-resolution mapping and charac-
terization of open chromatin across the genome. Cell 132, 311–322.
Brink, C. (2003). Promoter elements in endocrine pancreas development and
hormone regulation. Cell. Mol. Life Sci. 60, 1033–1048.
Butler, P.C., Meier, J.J., Butler, A.E., and Bhushan, A. (2007). The replication of
beta cells in normal physiology, in disease and for therapy. Nat. Clin. Pract.
Endocrinol. Metab. 3, 758–768.
Chimienti, F., Devergnas, S., Favier, A., and Seve, M. (2004). Identification and
cloning of a beta-cell-specific zinc transporter, ZnT-8, localized into insulin
secretory granules. Diabetes 53, 2330–2337.
Crawford, G.E., Holt, I.E., Mullikin, J.C., Tai, D., Blakesley, R., Bouffard, G.,
Young, A., Masiello, C., Green, E.D., Wolfsberg, T.G., et al. (2004). Identifying
gene regulatory elements by genome-wide recovery of DNase hypersensitive
sites. Proc. Natl. Acad. Sci. USA 101, 992–997.
Cuddapah, S., Jothi, R., Schones, D.E., Roh, T.Y., Cui, K., and Zhao, K. (2009).
Global analysis of the insulator binding protein CTCF in chromatin barrier
regions reveals demarcation of active and repressive domains. Genome
Res. 19, 24–32.
De Silva, N.M., and Frayling, T.M. (2010). Novel biological insights emerging
from genetic studies of type 2 diabetes and related metabolic traits. Curr.
Opin. Lipidol. 21, 44–50.
Dekker, J. (2003). A closer look at long-range chromosomal interactions.
Trends Biochem. Sci. 28, 277–280.
Dupuis, J., Langenberg, C., Prokopenko, I., Saxena, R., Soranzo, N., Jackson,
A.U., Wheeler, E., Glazer, N.L., Bouatia-Naji, N., Gloyn, A.L., et al. (2010). New
genetic loci implicated in fasting glucose homeostasis and their impact on type
2 diabetes risk. Nat. Genet. 42, 105–116.
ENCODE Project Consortium. (2007). Identification and analysis of functional
elements in 1% of the human genome by the ENCODE pilot project. Nature
447, 799–816.
Ferretti, V., Poitras, C., Bergeron, D., Coulombe, B., Robert, F., and
Blanchette, M. (2007). PReMod: a database of genome-wide mammalian
cis-regulatory module predictions. Nucleic Acids Res. 35 (Database issue),
D122–D126.
Gaulton, K.J., Nammo, T., Pasquali, L., Simon, J.M., Giresi, P.G., Fogarty,
M.P., Panhuis, T.M., Mieczkowski, P., Secchi, A., Bosco, D., et al. (2010).
A map of open chromatin in human pancreatic islets. Nat. Genet. 42, 255–259.
Girard, C., Duprat, F., Terrenoire, C., Tinel, N., Fosset, M., Romey, G., Lazdun-
ski, M., and Lesage, F. (2001). Genomic and functional characteristics of novel
human pancreatic 2P domain K(+) channels. Biochem. Biophys. Res.
Commun. 282, 249–256.
Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R., and Lieb, J.D. (2007). FAIRE
(Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active
regulatory elements from human chromatin. Genome Res. 17, 877–885.
Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A.
(2007). A chromatin landmark and transcription initiation at most promoters
in human cells. Cell 130, 77–88.
Hakonarson, H., Qu, H.Q., Bradfield, J.P., Marchand, L., Kim, C.E., Glessner,
J.T., Grabs, R., Casalunovo, T., Taback, S.P., Frackelton, E.C., et al. (2008).
A novel susceptibility locus for type 1 diabetes on Chr12q13 identified by
a genome-wide association study. Diabetes 57, 1143–1146.
Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D.,
Barrera, L.O., Van Calcar, S., Qu, C., Ching, K.A., et al. (2007). Distinct and
predictive chromatin signatures of transcriptional promoters and enhancers
in the human genome. Nat. Genet. 39, 311–318.
Heintzman, N.D., Hon, G.C., Hawkins, R.D., Kheradpour, P., Stark, A., Harp,
L.F., Ye, Z., Lee, L.K., Stuart, R.K., Ching, C.W., et al. (2009). Histone modifi-
cations at human enhancers reflect global cell-type-specific gene expression.
Nature 459, 108–112.
Hesselberth, J.R., Chen, X., Zhang, Z., Sabo, P.J., Sandstrom, R., Reynolds,
A.P., Thurman, R.E., Neph, S., Kuehn, M.S., Noble, W.S., et al. (2009). Global
mapping of protein-DNA interactions in vivo by digital genomic footprinting.
Nat. Methods 6, 283–289.
Ingelsson, E., Langenberg, C., Hivert, M.F., Prokopenko, I., Lyssenko, V.,
Dupuis, J., Magi, R., Sharp, S., Jackson, A.U., Assimes, T.L., et al. (2010).
Detailed physiologic characterization reveals diverse mechanisms for novel
genetic loci regulating glucose and insulin metabolism in humans. Diabetes
59, 1266–1275.
Itoh, Y., Kawamata, Y., Harada, M., Kobayashi, M., Fujii, R., Fukusumi, S., Ogi,
K., Hosoya, M., Tanaka, Y., Uejima, H., et al. (2003). Free fatty acids regulate
insulin secretion from pancreatic beta cells through GPR40. Nature 422,
173–176.
Joseph, B., Orlian, M., and Furneaux, H. (1998). p21(waf1) mRNA contains
a conserved element in its 3’-untranslated region that is bound by the Elav-
like mRNA-stabilizing proteins. J. Biol. Chem. 273, 20511–20516.
Joslin, E.P., and Kahn, C.R. (2005). Joslin’s diabetes mellitus, Fourteenth
Edition (Philadelphia, Pa.: Lippincott Williams & Wilkins).
Jothi, R., Cuddapah, S., Barski, A., Cui, K., and Zhao, K. (2008). Genome-wide
identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic
Acids Res. 36, 5221–5231.
Kim, T.H., Abdullaev, Z.K., Smith, A.D., Ching, K.A., Loukinov, D.I., Green,
R.D., Zhang, M.Q., Lobanenkov, V.V., and Ren, B. (2007). Analysis of the verte-
brate insulator protein CTCF-binding sites in the human genome. Cell 128,
1231–1245.
Kouzarides, T. (2007). Chromatin modifications and their function. Cell 128,
693–705.
Kutlu, B., Burdick, D., Baxter, D., Rasschaert, J., Flamez, D., Eizirik, D.L.,
Welsh, N., Goodman, N., and Hood, L. (2009). Detailed transcriptome atlas
of the pancreatic beta cell. BMC Med. Genomics 2, 3.
Li, C., Chen, P., Vaughan, J., Lee, K.F., and Vale, W. (2007). Urocortin 3 regu-
lates glucose-stimulated insulin secretion and energy homeostasis. Proc. Natl.
Acad. Sci. USA 104, 4206–4211.
Lynn, F.C. (2009). Meta-regulation: microRNA regulation of glucose and lipid
metabolism. Trends Endocrinol. Metab. 20, 452–459.
Lyssenko, V., Lupi, R., Marchetti, P., Del Guerra, S., Orho-Melander, M., Almg-
ren, P., Sjogren, M., Ling, C., Eriksson, K.F., Lethagen, A.L., et al. (2007).
Mechanisms by which common variants in the TCF7L2 gene increase risk of
type 2 diabetes. J. Clin. Invest. 117, 2155–2163.
Lyssenko, V., Nagorny, C.L., Erdos, M.R., Wierup, N., Jonsson, A., Spegel, P.,
Bugliani, M., Saxena, R., Fex, M., Pulizzi, N., et al. (2009). Common variant in
MTNR1B associated with increased risk of type 2 diabetes and impaired early
insulin secretion. Nat. Genet. 41, 82–88.
Magnuson, M.A. (1990). Glucokinase gene structure. Functional implications
of molecular genetic studies. Diabetes 39, 523–527.
Marson, A., Levine, S.S., Cole, M.F., Frampton, G.M., Brambrink, T., John-
stone, S., Guenther, M.G., Johnston, W.K., Wernig, M., Newman, J., et al.
(2008). Connecting microRNA genes to the core transcriptional regulatory
circuitry of embryonic stem cells. Cell 134, 521–533.
McDaniell, R., Lee, B.K., Song, L., Liu, Z., Boyle, A.P., Erdos, M.R., Scott, L.J.,
Morken, M.A., Kucera, K.S., Battenhouse, A., et al. (2010). Heritable individual-
specific and allele-specific chromatin signatures in humans. Science 328,
235–239.
Miele, A., and Dekker, J. (2008). Long-range chromosomal interactions and
gene regulation. Mol. Biosyst. 4, 1046–1057.
Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G.,
Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide
maps of chromatin state in pluripotent and lineage-committed cells. Nature
448, 553–560.
Muoio, D.M., and Newgard, C.B. (2008). Mechanisms of disease: molecular
and metabolic mechanisms of insulin resistance and beta-cell failure in type
2 diabetes. Nat. Rev. Mol. Cell Biol. 9, 193–205.
Mutskov, V., and Felsenfeld, G. (2009). The human insulin gene is part of a large
open chromatin domain specific for human islets. Proc. Natl. Acad. Sci. USA
106, 17419–17424.
Ohneda, K., Ee, H., and German, M. (2000). Regulation of insulin gene tran-
scription. Semin. Cell Dev. Biol. 11, 227–233.
Oler, A.J., Alla, R.K., Roberts, D.N., Wong, A., Hollenhorst, P.C., Chandler,
K.J., Cassiday, P.A., Nelson, C.A., Hagedorn, C.H., Graves, B.J., and Cairns,
B.R. (2010). Human RNA polymerase III transcriptomes and relationships to
Pol II promoter chromatin and enhancer-binding factors. Nat. Struct. Mol.
Biol. 17, 620–628.
Oliver-Krasinski, J.M., and Stoffers, D.A. (2008). On the origin of the beta cell.
Genes Dev. 22, 1998–2021.
Parker, S.C.J., Hansen, L., Abaan, H.O., Tullius, T.D., and Margulies, E.H.
(2009). Local DNA topography correlates with functional noncoding regions
of the human genome. Science 324, 389–392.
Phillips, J.E., and Corces, V.G. (2009). CTCF: master weaver of the genome.
Cell 137, 1194–1211.
Prokopenko, I., McCarthy, M.I., and Lindgren, C.M. (2008). Type 2 diabetes:
new genes, new understanding. Trends Genet. 24, 613–621.
Prokopenko, I., Langenberg, C., Florez, J.C., Saxena, R., Soranzo, N.,
Thorleifsson, G., Loos, R.J., Manning, A.K., Jackson, A.U., Aulchenko, Y.,
et al. (2009). Variants in MTNR1B influence fasting glucose levels. Nat. Genet.
41, 77–81.
Robertson, A.G., Bilenky, M., Tam, A., Zhao, Y., Zeng, T., Thiessen, N.,
Cezard, T., Fejes, A.P., Wederell, E.D., Cullum, R., et al. (2008). Genome-
wide relationship between histone H3 lysine 4 mono- and tri-methylation
and transcription factor binding. Genome Res. 18, 1906–1917.
Sabo, P.J., Hawrylycz, M., Wallace, J.C., Humbert, R., Yu, M., Shafer, A.,
Kawamoto, J., Hall, R., Mack, J., Dorschner, M.O., et al. (2004). Discovery of
functional noncoding elements by digital analysis of chromatin structure.
Proc. Natl. Acad. Sci. USA 101, 16837–16842.
Scacheri, P.C., Crawford, G.E., and Davis, S. (2006). Statistics for ChIP-chip
and DNase hypersensitivity experiments on NimbleGen arrays. Methods
Enzymol. 411, 270–282.
Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom,
K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolution-
arily conserved elements in vertebrate, insect, worm, and yeast genomes.
Genome Res. 15, 1034–1050.
Song, L., and Crawford, G.E. (2010). DNase-seq: a high-resolution technique
for mapping active gene regulatory elements across the genome from
mammalian cells. Cold Spring Harb. Protoc. 2010. 10.1101/pdb.prot5384.
Teich, N., Mossner, J., and Keim, V. (1998). Mutations of the cationic trypsin-
ogen in hereditary pancreatitis. Hum. Mutat. 12, 39–43.
Terazono, K., Yamamoto, H., Takasawa, S., Shiga, K., Yonemura, Y., Tochino,
Y., and Okamoto, H. (1988). A novel gene activated in regenerating islets.
J. Biol. Chem. 263, 2111–2114.
Tsuboi, T., and Rutter, G.A. (2003). Insulin secretion by ‘kiss-and-run’ exocy-
tosis in clonal pancreatic islet beta-cells. Biochem. Soc. Trans. 31, 833–836.
Wang, Z., Zang, C., Rosenfeld, J.A., Schones, D.E., Barski, A., Cuddapah, S.,
Cui, K., Roh, T.Y., Peng, W., Zhang, M.Q., and Zhao, K. (2008). Combinatorial
patterns of histone acetylations and methylations in the human genome. Nat.
Genet. 40, 897–903.
Xi, H., Shulha, H.P., Lin, J.M., Vales, T.R., Fu, Y., Bodine, D.M., McKay, R.D.,
Chenoweth, J.G., Tesar, P.J., Furey, T.S., et al. (2007). Identification and
characterization of cell type-specific and ubiquitous chromatin regulatory
structures in the human genome. PLoS Genet. 3, e136.
Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E.,
Nusbaum, C., Myers, R.M., Brown, M., Li, W., and Liu, X.S. (2008). Model-
based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.
Supplemental Information
Cell Metabolism, Volume 12
Global Epigenomic Analysis of Primary Human Pancreatic Islets Provides Insights into Type 2 Diabetes Susceptibility Loci
Michael L. Stitzel, Praveen Sethupathy, Daniel S. Pearson, Peter S. Chines, Lingyun Song, Michael R. Erdos, Ryan Welch, Stephen C.J. Parker, Alan P. Boyle, Laura J. Scott, NISC Comparative Sequencing Program, Elliott H. Margulies, Michael Boehnke, Terrence S. Furey, Gregory E. Crawford, and Francis S. Collins
Supplemental Experimental Procedures
DNase Mapping and Peak Calling
Twenty base pair reads were mapped to the reference genome, resulting in ~15, ~21, and
~37 million mappable (no more than two mismatches), unique (mapping only once in the genome),
non-satellite-repeat reads for the three samples, respectively (Table S11).
MACS (Zhang et al., 2008) version 1.3.7.1 was used to identify genomic regions of
enrichment for mapped DNase-seq reads in the following manner. (1) Command line options:
MACS was run with the options --nomodel --shiftsize=1 --bw=70. (2) Handling
duplicate reads: because DNase-seq is expected to produce sequence reads that begin at precisely
the same base (“duplicates”), we modified MACS to count up to, but no more than, 6 “duplicate”
reads per locus to limit PCR artifacts. (3) Minimizing false positives: since no input control was
generated for DNase-seq, MACS estimated the background noise (λlocal) using the DNase-seq data
from the islet samples themselves. Depending on the genomic context, DHS can occur either in
isolation or in clusters. To account for this, MACS was run with two separate sets of parameter
values for the local noise correction, tuned for isolated (λlocal = max(λ5000, λ10000)) and
clustered (λlocal = max(λ30000, λ50000)) DHS. The union of the results from the two separate runs
was designated as the final set of MACS calls. Calls present in at least two of the three islet
samples were defined as “DHS peaks.”
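The reproducibility filter (calls present in at least two of the three islet samples) can be illustrated with a short Python sketch. This is not the authors' code: the text does not say how overlap between samples was scored, so the sketch assumes that any overlapping calls are merged into one region and that a region counts as reproducible when calls from at least two distinct samples contribute to it. The function name and input layout are hypothetical.

```python
def reproducible_regions(calls_by_sample, min_samples=2):
    """Merge overlapping calls across samples and keep merged regions that are
    supported by at least min_samples distinct samples.

    calls_by_sample: dict mapping sample -> list of (chrom, start, end), half-open.
    """
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, calls in calls_by_sample.items()
        for chrom, start, end in calls
    )
    regions = []
    current = None  # [chrom, start, end, set of supporting samples]
    for chrom, start, end, sample in tagged:
        if current and chrom == current[0] and start < current[2]:
            # Overlaps the region being built: extend it and record the sample.
            current[2] = max(current[2], end)
            current[3].add(sample)
        else:
            if current and len(current[3]) >= min_samples:
                regions.append(tuple(current[:3]))
            current = [chrom, start, end, {sample}]
    if current and len(current[3]) >= min_samples:
        regions.append(tuple(current[:3]))
    return regions
```

Because merging is transitive, a chain of partially overlapping calls collapses into one region; a stricter reciprocal-overlap rule would be an equally plausible reading of the text.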
H3K4me3 and CTCF Peak Identification
Genomic regions enriched for mapped sequence reads were determined using MACS.
Input was sequenced and used as a control. For CTCF, all default parameters were used, and for
H3K4me3, MACS was run with a modified set of parameter values (λlocal = max(λ1000, λ30000,
λ50000)) to account for its broader signal. The number of peaks called is found in Table S12. For
H3K4me3/CTCF peak identification in non-islet cell types, we processed the publicly available
raw data in a similar manner, with the caveat that in instances where input control was not
provided, MACS was run without a control.
Identification of Unannotated Islet-Active TSSs
DHS peaks tend to be punctate around the TSS, whereas H3K4me3 peaks usually extend
well into the body of the transcribed unit. Therefore, we predicted the directionality of
unannotated TSSs using the following approach: First, the strongest DHS peak within each
H3K4me3 peak was determined according to the peak intensity value provided by MACS.
Second, the sequence length covered by the H3K4me3 peak on either side of the DHS peak was
computed. Third, the side with more sequence coverage was used to assign orientation to the
underlying TSS (i.e., more sequence coverage on the left side denotes minus strand transcription;
right side denotes plus strand transcription). When tested on TSSs of known, islet-active genes
with unidirectional TSSs, the algorithm performed at ~90% accuracy. Unannotated islet-active
TSSs were identified by applying this algorithm to unannotated H3K4me3 peaks and selecting
only those for which the predicted orientation is the same as the directionality of the H3K79me2
signal at that locus (Figure S2).
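The three steps of the orientation rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name and tuple layout are hypothetical, "within" is read as any overlap with the H3K4me3 peak, and returning no call for ties or for peaks without a DHS is an assumption.

```python
def predict_strand(h3k4me3_peak, dhs_peaks):
    """Predict TSS orientation from H3K4me3 coverage on either side of a DHS.

    h3k4me3_peak: (start, end).
    dhs_peaks: list of (start, end, intensity) on the same chromosome.
    Returns '+', '-', or None when no call can be made.
    """
    k_start, k_end = h3k4me3_peak
    inside = [d for d in dhs_peaks if d[0] < k_end and d[1] > k_start]
    if not inside:
        return None
    # Step 1: strongest DHS peak within the H3K4me3 peak.
    d_start, d_end, _ = max(inside, key=lambda d: d[2])
    # Step 2: H3K4me3 sequence covered on each side of that DHS.
    left = max(0, d_start - k_start)
    right = max(0, k_end - d_end)
    # Step 3: more coverage on the left means minus strand, on the right plus strand.
    if left == right:
        return None
    return "-" if left > right else "+"
```

For an unannotated H3K4me3 peak, the prediction would be retained only when it agrees with the direction of the H3K79me2 signal at the locus, which this sketch does not model.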
CTCF Motif Identification
MACS-identified CTCF peaks from the assay using the Abcam antibody were
intersected with those from the assay using the Millipore antibody. The top 10% (n=2130) of
intersected peaks were determined according to the sum of the -log10(p-values) provided by
MACS for each of the peaks from each assay. MEME (Bailey and Elkan, 1994) version 4.3.0
was used to identify motifs in these “top” CTCF peaks. MEME was run with the options
'-mod zoops -dna -revcomp', and the highest scoring motif was reported.
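The ranking step used to choose the “top” CTCF peaks can be sketched in Python as below. The function name and tuple layout are hypothetical, and rounding the 10% cutoff to the nearest whole peak is an assumption; the text states only that the top 10% by summed score were kept.

```python
def top_fraction(peaks, fraction=0.10):
    """Return the highest-scoring fraction of intersected peaks.

    peaks: list of (peak_id, score_assay_1, score_assay_2), where each score is
    the -log10(p-value) reported for that peak in one antibody assay.
    """
    ranked = sorted(peaks, key=lambda p: p[1] + p[2], reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]
```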
Histone Modification Enrichment/Depletion Analysis
For each of three islet samples, we computed the ratio of the density of extended
H3K4me1, H3K4me3 and H3K79me2 sequence reads in t-DHS, d-DHS, t-FAIRE and d-FAIRE
peaks to the density in flanking control regions that do not overlap any DHS/FAIRE signal. The
two flanking control regions selected for each DHS/FAIRE peak were 550-300 nucleotides
upstream of the 5’ DHS/FAIRE peak boundary (left flank) and 300-550 nucleotides downstream
of the 3’ DHS/FAIRE peak boundary (right flank). The density ratios were averaged across all
three islet samples for the overall average enrichment/depletion level. P-values of
enrichment/depletion were computed using the two-tailed paired Student’s t-test.
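The flank definition and density ratio described above can be written out as a small Python sketch. The function names are hypothetical and coordinates are assumed to be half-open; the requirement that flanks not overlap any DHS/FAIRE signal, read extension, and read counting itself are left out.

```python
def flank_windows(peak_start, peak_end, near=300, far=550):
    """Control windows 550-300 nt upstream and 300-550 nt downstream of a peak."""
    left = (max(0, peak_start - far), max(0, peak_start - near))
    right = (peak_end + near, peak_end + far)
    return left, right

def enrichment(peak_reads, peak_len, flank_reads, flank_len):
    """Read density in the peak divided by read density in the flanks."""
    return (peak_reads / peak_len) / (flank_reads / flank_len)

def average_enrichment(ratios):
    """Mean density ratio across samples (three islet samples in the text)."""
    return sum(ratios) / len(ratios)
```

A ratio above 1 indicates enrichment and a ratio below 1 indicates depletion; flanks with zero reads would need special handling that the text does not describe.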
Identification of T2D Association Signal Boundaries (Spotter)
For each of the 18 T2D susceptibility loci, we identified the boundaries of the association
signal using a sliding window algorithm that takes into consideration linkage disequilibrium
(LD), recombination rates, and association p-values. Starting at the position of the most strongly
associated "index" SNP, we examined windows of 75 kb in the 5’ and 3’ directions, scanning for
SNPs in LD (r2 ≥ 0.5) with the index SNP or having an association p-value ≤ 10⁻⁵. After
identifying the most distant 5’ and 3’ window meeting either of these criteria, we selected the
nearest recombination hotspots (≥ 10 cM/Mb) as the 5’ and 3’ boundaries. We examined the
intervals selected for each locus using the LocusZoom software, which generates plots showing
chromosomal position, association p-values, linkage disequilibrium patterns, and recombination
rate information. Each locus was visually inspected to ensure that the entire association signal
was contained within the selected interval. Our algorithm, implemented in the software package
Spotter, and the plotting tool LocusZoom are available online
(http://csg.sph.umich.edu/boehnke/spotter/ and http://csg.sph.umich.edu/locuszoom/).
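The sliding-window idea behind the boundary search can be sketched in Python as follows. This is a hedged illustration and not the Spotter implementation. Several points are assumptions of the sketch: the thresholds are treated as inclusive (r2 ≥ 0.5, p ≤ 1e-5), scanning in each direction stops at the first 75 kb window that contains no qualifying SNP, the boundary is the nearest hotspot at or beyond the last qualifying window, and hotspot positions are supplied already filtered by recombination rate. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def association_interval(index_pos, snps, hotspots, window=75_000,
                         r2_min=0.5, p_max=1e-5):
    """Return (left_boundary, right_boundary) around an index SNP.

    snps: list of (position, r2_with_index, association_p_value).
    hotspots: positions of recombination hotspots on the same chromosome.
    """
    def has_hit(lo, hi):
        # True if any SNP in [lo, hi] is in LD with the index SNP or is associated.
        return any(lo <= pos <= hi and (r2 >= r2_min or p <= p_max)
                   for pos, r2, p in snps)

    left = right = index_pos
    while has_hit(left - window, left - 1):
        left -= window
    while has_hit(right + 1, right + window):
        right += window

    hotspots = sorted(hotspots)
    i = bisect_right(hotspots, left)   # hotspots[:i] lie at or left of `left`
    j = bisect_left(hotspots, right)   # hotspots[j:] lie at or right of `right`
    lo = hotspots[i - 1] if i > 0 else left
    hi = hotspots[j] if j < len(hotspots) else right
    return lo, hi
```

Under the stopping assumption, a strongly associated SNP separated from the signal by an empty window is not reached, which is one reason visual inspection of each interval, as described in the text, remains useful.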
Generating Gateway-Compatible Luciferase Vectors
To generate Gateway-compatible luciferase reporter vectors, Gateway cassette B (Invitrogen)
was ligated into an EcoRV site in the multiple cloning site of the pGL4.23 luciferase reporter
vector (Promega) in forward and reverse orientations. Gateway cassette orientation was
confirmed by restriction digest. The integrity of each Gateway-cloned insert was confirmed with
restriction enzyme digestion and/or direct sequencing.
RNA Preparation and TaqMan Expression Analysis in Human Islets
Two thousand islet equivalents (approximately 2 million cells) were harvested in Trizol
(Invitrogen), and total RNA was isolated using the RNeasy mini kit (Qiagen). 250 ng of RNA
from each sample was reverse transcribed using the high capacity RNA-to-cDNA kit (Applied
Biosystems). 12.5 ng of cDNA was used per TaqMan gene expression assay (Applied
Biosystems) per sample, and each assay was performed in triplicate. Expression was measured
using inventoried TaqMan gene expression assays for INS (Hs00355773_m1), GCG
(Hs00174967_m1), and SST (Hs00356144_m1). Relative transcript abundance was calculated
using the delta Ct method (Applied Biosystems) with a TaqMan gene expression assay for
GAPDH (Hs99999905_m1) serving as the normalization control. Serial 4-fold dilutions of total
pancreas cDNA ranging from 200 ng to 0.78125 ng were used to generate a standard curve and
assess TaqMan gene expression assay amplification efficiency; all assays were >99% efficient.
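The delta Ct calculation and the standard-curve efficiency check can be illustrated with a short Python sketch. The function names are hypothetical. The 2^(−ΔCt) form assumes near-100% amplification efficiency, and the efficiency formula E = 10^(−1/slope) − 1 is the standard textbook relation, not a detail given in the text.

```python
from math import log10, log2

def relative_abundance(ct_target, ct_reference):
    """2^-(delta Ct), with delta Ct = Ct(target) - Ct(reference, e.g. GAPDH)."""
    return 2.0 ** -(ct_target - ct_reference)

def dilution_series(start_ng=200.0, fold=4.0, points=5):
    """Serial dilutions: 200, 50, 12.5, 3.125, 0.78125 ng for the defaults."""
    return [start_ng / fold ** k for k in range(points)]

def amplification_efficiency(amounts_ng, ct_values):
    """Efficiency from the least-squares slope of Ct against log10(input amount)."""
    xs = [log10(a) for a in amounts_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
             / sum((x - mean_x) ** 2 for x in xs))
    return 10 ** (-1.0 / slope) - 1.0

# Idealized standard curve in which Ct falls by one cycle per doubling of input.
ideal_cts = [30.0 - log2(a) for a in dilution_series()]
ideal_efficiency = amplification_efficiency(dilution_series(), ideal_cts)
```

With the idealized curve the efficiency evaluates to 1.0 (100%), the value against which the reported ">99% efficient" assays can be compared.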
Cell Culture
HeLa cells were cultured in DMEM containing 10% FBS. MIN6 cells were cultured in DMEM
containing 10% FBS, 100 mM sodium pyruvate, and 100 µM 2-mercaptoethanol. Cells were
maintained at 37°C and 5% CO2.
Figure S1. Comparison of Predicted Distal Regulatory Elements in the Human Islet among
Three Different Experimental Procedures
16,785 (7,929+4,713+4,143) out of 34,039 distal DHS peaks (d-DHS) overlap with GLITR
(Bhandare et al., 2010) and/or FAIRE (Gaulton et al., 2010) peaks.
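Figure S2 (related to Figure 2)
[Bar chart not reproduced. Y axis ticks run from -20 to 10. Sample labels: A (50%), B (60%), C (60%), D (70%), E (75%), F (80%), G (80%), H (90%), I (90%), J (90%), Total Pancreas, HeLa, Fibroblast; four of the islet samples are additionally labeled Islet 1, Islet 2, Islet 3, and Islet 6. Legend: INS, SST, GCG.]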
Figure S2. Insulin (INS), Glucagon (GCG), and Somatostatin (SST) Genes Are Highly
Expressed in Human Pancreatic Islets from Cadaveric Donors
TaqMan gene expression assays were used to measure abundance of INS (Hs00355773_m1),
GCG (Hs00174967_m1), and SST (Hs00356144_m1) in 10 human islet samples (A-J).
Numbered islets indicate islets analyzed from Table S12. Islet purity is in parentheses.
Expression was determined by the delta Ct method using GAPDH (Hs99999905_m1) for
normalization. Values are represented on the log(2) scale. Total human pancreas is shown for
comparison, and HeLa and fibroblasts were used as negative controls.
Figure S3 (related to Figures 2 and 3)
[Panels A, B, and C not reproduced. Text recoverable from the panel A flow chart: Peaks not overlapping RefSeq TSSs; Remove other annotated TSSs such as from the UCSC Known Genes track; Remove “noise subpeaks”; Overlap with DHS peaks found in at least 2 out of 3 replicates; Intragenic; Intergenic; Overlap H3K79me2 peaks; “Strand predictor” algorithm assigns matching orientation for H3K4me3 and H3K79me2 peaks; Overlap CpG islands or computationally predicted TSSs. Peak counts shown: 6190, 5599, 5292, 4060, 2174, 1886, 506, 263, 196. (Average length, intensity) pairs shown: (1098, 41), (1192, 50), (1179, 48), (1207, 51), (1516, 90), (1706, 112), (1918, 138). The assignment of counts to steps is not recoverable from the extracted text.]
Figure S3. Identification of Unannotated, Intergenic, Islet-Active Transcription Start Sites
(A) Algorithm schematic. Red numbers in parentheses (x, y) next to each category indicate
average length (x) and intensity (y) of H3K4me3 peaks. The increase in average length and
intensity of H3K4me3 peaks as the algorithm proceeds provides increased confidence in the
strength of the putative TSS. “Noise subpeaks” refer to H3K4me3 peak calls that are
immediately adjacent to, overlap the same DHS, and thus likely represent the same signal as, a
larger H3K4me3 peak at a RefSeq TSS. The “strand predictor” algorithm independently predicts
the directionality of an H3K4me3 peak using DHS and H3K79me2 and assesses whether the
predictions match.
(B) Candidate islet-active transcription start site for the primary transcript of the islet-expressed
miR-1179/miR-7-2 microRNA cluster. The putative transcription start site [TSS, red box]
(DHS-enriched, H3K4me3-enriched, H3K4me1-depleted) is ~3.5 kb upstream of the 5’-most
microRNA (hsa-miR-1179) in the cluster, and the full-length primary transcript (H3K79me2-
enriched) is approximately 7.5 kb. Hepatocyte nuclear factor 1 (HNF1) has a predicted
conserved binding site within 5 kb upstream of the putative TSS and regulatory factor X1 (RFX1)
has a predicted conserved binding site immediately adjacent to the TSS. The TSS is predicted to
be bidirectional according to the NHGRI BiPro dataset (http://genome.ucsc.edu/hgTables),
which is supported by substantial H3K79me2 signal on both sides of the TSS.
(C) Unannotated active promoter region harboring a type 1 diabetes associated variant. Variant
rs10876864 is in strong linkage disequilibrium (r2 > 0.6) with a published type 1 diabetes (T1D)
index SNP (rs1701704) (Hakonarson et al., 2008) and falls within a region that is a putative
unannotated islet-active transcription start site [TSS, red box] (DHS+, H3K4me3+). This may
represent a TSS of an unannotated transcript or an alternative TSS of the downstream gene
IKZF4, a strong candidate gene for T1D (Hakonarson et al., 2008). The annotated promoter of
IKZF4 lacks both a DHS and strong H3K4me3 peak (black box). Both the annotated (black box)
and the candidate (red box) TSS are highly sequence-conserved according to the “Mammal
Cons” track.
Figure S4 (related to Figure 3)
Figure S4. Examples of Incorrectly Predicted Directionality of H3K4me3 Peaks
(A) Example of an incorrect prediction at the HMGN4 transcription start site due to a DNase I
hypersensitive site (DHS) that corresponds to an RNA polymerase III bound locus (blue box).
(B) Example of an incorrect prediction at the GAB2 transcription start site due to a DHS that
corresponds to a CTCF bound locus (red box).
Figure S5 (related to Figure 5)
Figure S5. Representation of Histone Modifications at Distal DNase I Hypersensitive Sites
Distal DNase I hypersensitive sites (d-DHS) are split into four categories (intragenic, intergenic,
intragenic CTCF+, intergenic CTCF+). X-axis represents average (across three samples) read
density normalized to total number of reads. Y-axis represents average (across three samples)
fold enrichment of reads relative to flanking non-DHS control regions. H3K79me2 is not
expected to be enriched at DHS; thus it serves as a control test.
Figure S6 (related to Figure 6)
Figure S6. Allele-Specific Luciferase Reporter Activity of Risk and Nonrisk Haplotypes
(A) IGF2BP2 (P9), KCNQ1 (P21), and FTO (P23) elements in MIN6 and HeLa. Risk and non-
risk alleles are indicated in Table S8. Data are represented as the mean +/- standard deviation of
3 replicates each from at least 2 independent clones for each haplotype.
(B) Novel variant in TCF7L2 element does not alter enhancer activity in
MIN6. Expanded data from Figure 6D, indicating relative luciferase activity of inserts containing
different genotypes at both the undocumented variant (1st nucleotide in the 2-nucleotide pair of
the legend) and rs7903146 (2nd nucleotide of the pair). Only variation at rs7903146 altered
enhancer activity (compare xT with xC). Data are represented as mean +/- standard deviation of 3
replicates each from at least 2 independent clones for each genotype. CC* was generated by site-
directed mutagenesis of the CT element.
Table S6. Spotter-Defined Search Space at 18 T2D-Associated Loci
Index SNP   Chr    Position (hg18)  p value      Alleles  Genes/Loci                                Start (hg18)  End (hg18)
rs10923931  chr1   120319482        0.000006862  G/T      ADAM30;NOTCH2                             120141164     120430267
rs12779790  chr10  12368016         0.00004739   A/G      CAMK1D;CDC123;NUDT5                       12135047      12377675
rs1111875   chr10  94452862         3.98E-07     C/T      HHEX;KIF11                                94190340      94489557
rs7903146   chr10  114748339        3.05E-23     C/T      TCF7L2                                    114707461     114813416
rs2237892   chr11  2796327          0.01387      C/T      CDKN1C;KCNQ1;KCNQ1DN;SLC22A18;SLC22A18AS  2776329       2815122
rs5215      chr11  17365206         0.00000041   C/T      ABCC8;DKFZp686O24166;KCNJ11;NUCB2         16957763      17381287
rs7961581   chr12  69949369         0.0000368    C/T      TSPAN8                                    69666378      69953289
rs8050136   chr16  52373776         0.000006869  C/A      FTO;RPGRIP1L                              52355409      52406062
rs17705177  chr17  33197639         0.002826     T/A      HNF1B;LOC284100                           33164509      33205550
rs7578597   chr2   43586327         0.0001087    T/C      THADA                                     43302162      43901034
rs1801282   chr3   12368125         0.0002032    C/G      GSTM1L;PPARG                              12001672      12834500
rs4607103   chr3   64686944         0.0003129    C/T      ADAMTS9                                   64617740      64804520
rs4402960   chr3   186994381        7.54E-08     G/T      C3orf65;IGF2BP2                           186754224     187031377
rs10010131  chr4   6343816          0.004028     A/G      JAKMIP1;PPP2R2C;WFS1                      6314897       6375987
rs7754840   chr6   20769229         8.73E-08     G/C      CDKAL1                                    20589657      21120000
rs864745    chr7   28147081         0.0000462    T/C      JAZF1                                     28006441      28225758
rs13266634  chr8   118253964        0.03264      C/T      SLC30A8                                   118116758     118306152
rs10811661  chr9   22124094         1.95E-07     T/C      CDKN2BAS                                  21930588      22128105
Start and End are the search space coordinates. Chr = chromosome.
Table S8. Elements Containing Type 2 Diabetes-Associated SNPs
Locus | Element | Chr | SNP | CTCF | Cloned gDNA: Risk Allele | Cloned gDNA: Non-risk Allele
CAMK1D/CDC123 | -- | chr10 | rs11257655 | Yes | -- [T] | -- [C]
FTO | P23 | chr16 | rs8050136 | No | A | C
 | | chr16 | rs9935401 | No | A | G
 | | chr16 | rs8051591 | No | G | A
IGF2BP2 | P9 | chr3 | rs7651090 | No | G | A
 | | chr3 | rs6444081 | No | C | T
 | | chr3 | rs7646518 | No | C | T
 | | chr3 | rs7640539 | No | A | T
 | | chr3 | rs7637773 | No | A | G
KCNQ1 | P21 | chr11 | rs163184 (a) | No | G | T
TCF7L2 | P12 | chr10 | rs7903146 | No | T | C
WFS1 | P17 | chr4 | rs3821943 | No | T | C
 | | chr4 | rs4689397 | No | A | G
-- indicates the element was not tested
(a) This SNP is in high LD (r2 = 0.98) with an index SNP (rs2283228; Unoki et al., 2008) in the East Asian population (HapMap JPT+CHB)
Table S9. GWAS Catalog SNPs or Linked SNPs (r2 > 0.6) Mapping within 500 bp of d-DHS
Disease/trait | Chromosome | Index SNP | Mapping SNP | Dprime | r squared | Reported gene | CTCF?
Type 1 diabetes | chr10 | rs10509540 | rs11816865 | 0.91471 | 0.66791 | C10orf59 |
Type 1 diabetes | chr4 | rs10517086 | rs10517086 | 1 | 1 | Intergenic |
Type 1 diabetes | chr5 | rs1445898 | rs17376481 | 1 | 0.61241 | CAPSL |
Type 1 diabetes | chr14 | rs1465788 | rs194749 | 1 | 0.83351 | Intergenic |
Type 1 diabetes | chr16 | rs2903692 | rs7403919 | 0.91747 | 0.80356 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs9935174 | 0.91126 | 0.676 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs1003603 | 0.91258 | 0.67838 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs725613 | 1 | 0.95991 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs12925642 | 1 | 0.72209 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs9929994 | 1 | 0.95932 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs12924729 | 0.94791 | 0.68836 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs12917656 | 0.95061 | 0.73649 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs7203459 | 0.89148 | 0.61388 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs2903692 | 1 | 1 | KIAA0350 |
Type 1 diabetes | chr16 | rs2903692 | rs17673553 | 1 | 0.70379 | KIAA0350 |
Type 1 diabetes | chr19 | rs425105 | rs425105 | 1 | 1 | Intergenic |
Type 1 diabetes | chr19 | rs425105 | rs160544 | 1 | 0.70209 | Intergenic |
Type 1 diabetes | chr16 | rs4788084 | rs2650492 | 0.91156 | 0.61534 | IL27 |
Type 1 diabetes | chr22 | rs5753037 | rs41176 | 0.96619 | 0.93292 | Intergenic |
Type 1 diabetes | chr5 | rs6897932 | rs931555 | 0.84352 | 0.65776 | IL7R |
Type 1 diabetes | chr16 | rs7202877 | rs13331385 | 1 | 1 | Intergenic |
Type 1 diabetes | chr16 | rs7202877 | rs11149812 | 0.92395 | 0.64991 | Intergenic |
Type 1 diabetes | chr16 | rs7202877 | rs4993971 | 0.92395 | 0.64991 | Intergenic |
Type 1 diabetes | chr6 | rs9268645 | rs454748 | 0.80972 | 0.63434 | MHC | Yes
Type 1 diabetes | chr6 | rs9268645 | rs9268528 | 0.9636 | 0.92837 | MHC | Yes
Type 1 diabetes | chr6 | rs9268645 | rs9268605 | 1 | 1 | MHC |
Type 1 diabetes | chr6 | rs9268645 | rs9268606 | 1 | 1 | MHC | Yes
Type 1 diabetes | chr6 | rs9268645 | rs9268607 | 1 | 1 | MHC |
Type 1 diabetes | chr21 | rs9976767 | rs7276630 | 1 | 0.7981 | UBASH3A |
Type 1 diabetes | chr21 | rs9976767 | rs7278547 | 1 | 0.96476 | UBASH3A |
Type 2 diabetes | chr12 | rs12304921 | rs17125346 | 0.92847 | 0.67075 | NR |
Type 2 diabetes | chr10 | rs12779790 | rs11257655 | 1 | 0.83393 | CDC123,CAMK1D | Yes
Type 2 diabetes | chr11 | rs2237897 | rs8181588 | 1 | 0.70186 | KCNQ1 |
Type 2 diabetes | chr11 | rs2237897 | rs2237896 | 1 | 1 | KCNQ1 |
Type 2 diabetes | chr11 | rs2237897 | rs2237897 | 1 | 1 | KCNQ1 |
Type 2 diabetes | chr3 | rs4402960 | rs6444081 | 1 | 1 | IGF2BP2 |
Type 2 diabetes | chr3 | rs4402960 | rs7646518 | 1 | 1 | IGF2BP2 |
Type 2 diabetes | chr4 | rs4689388 | rs4689397 | 0.95718 | 0.84159 | WFS1,PPP2R2C |
Type 2 diabetes | chr4 | rs4689388 | rs3821943 | 0.95722 | 0.84405 | WFS1,PPP2R2C |
Type 2 diabetes | chr2 | rs7578597 | rs17031079 | 1 | 0.80952 | THADA |
Type 2 diabetes | chr2 | rs7578597 | rs7559723 | 1 | 0.81818 | THADA |
Type 2 diabetes | chr2 | rs7578597 | rs10186307 | 1 | 0.81818 | THADA | Yes
Type 2 diabetes | chr2 | rs7578597 | rs10186441 | 1 | 0.81818 | THADA | Yes
Type 2 diabetes | chr2 | rs7578597 | rs17031133 | 1 | 0.81818 | THADA |
Type 2 diabetes | chr2 | rs7578597 | rs6749617 | 1 | 0.74825 | THADA |
Type 2 diabetes | chr10 | rs7903146 | rs7903146 | 1 | 1 | TCF7L2 |
Type 2 diabetes | chr16 | rs8050136 | rs17817288 | 1 | 0.64722 | FTO |
Type 2 diabetes | chr16 | rs8050136 | rs11075987 | 1 | 0.60474 | FTO |
Type 2 diabetes | chr6 | rs9472138 | rs9462935 | 0.8898 | 0.62484 | VEGFA |
Fasting plasma glucose | chr11 | rs2166706 | rs10830956 | 1 | 0.65589 | MTNR1B |
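Table S9 reports two linkage disequilibrium measures, D' and r squared, for each index SNP and mapping SNP pair. As a minimal sketch of how the two relate, the Python below computes both from haplotype and allele frequencies using the standard population-genetics definitions. The function name and the example frequencies are illustrative assumptions and are not taken from the thesis data.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = 0.0 if d == 0 else abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies: allele A (20%) always travels with allele B (40%).
D_PRIME, R_SQUARED = ld_measures(p_ab=0.2, p_a=0.2, p_b=0.4)
```

With these hypothetical frequencies D' is 1 while r squared is well below 1, the same pattern seen in many rows of the table: D' reaches 1 whenever one of the four possible haplotypes is absent, whereas r squared reaches 1 only when the two SNPs also have matching allele frequencies.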
Table S10. Islet Donor Characteristics
Sample ID | Sex | Purity (a) (%) | Viability (a) (%) | BMI | Age | Cause of Death (b) | Race (c) | Isolation Site (d) | Approximate amount of crosslinked material (Islet equivalents) (e)
Islet 1 | F | 60 | 91 | 25.2 | 58 | CVH | AA | UMN | 16000
Islet 2 | F | 60 | 99 | 30.1 | 41 | CVH | H | UMN | 16000
Islet 3 | M | 70 | 93 | 26.5 | 16 | BHT | C | UAB | 18000
Islet 4 | F | 80 | 97 | 24 | 37 | U | C | U.Ill. | 18000
Islet 5 | M | 85 | 95 | 24.7 | 36 | SIGSWH | C | U.Ill. | 16000
Islet 6 | M | 90 | 95 | 27.9 | 60 | CVH | C | U.Miami | 16000
Islet 7 | M | 90 | 95 | 29.2 | 27 | ICH | H | NorthwesternU | 14000
Islet 8 | F | 80 | 90 | 22.5 | 56 | CVA | C | NDRI | 14000
Islet 9 | M | 90 | 80 | 23.5 | 28 | Head Trauma | C | NDRI | 14000
(a) Purity (assessed by dithizone staining) and viability were determined by islet distribution centers
(b) CVH=Cerebrovascular hemorrhage; BHT=blunt head trauma; SIGSWH=Self-inflicted gunshot wound to the head; ICH=Intracerebral hemorrhage; CVA=cerebrovascular accident (stroke); U=undocumented/unknown
(c) AA=African American; H=Hispanic; C=Caucasian
(d) UMN=University of Minnesota; UAB=University of Alabama-Birmingham; U.Ill.=University of Illinois; NDRI=National Disease Research Interchange
(e) 1 islet equivalent = ~1,000 cells
Table S11. DNase-Sequencing Depth
Sample | Raw reads | Aligned reads | After Remove Blacklisted | Unique start (6 pileup*) | Unique start
Islet 7 | 21117592 | 14768603 | 14699522 | 14662161 | 9823781
Islet 8 | 29497984 | 20668595 | 20596294 | 20424646 | 19190693
Islet 9 | 52732917 | 37828388 | 37598662 | 36599434 | 14990045
Blacklisted regions include repeat regions and other in the UCSC Genome Browser "Duke Excluded Regions" Track
* Up to 6 reads with the same 5' end are included in the analysis based upon the mechanism of DNase action and simulations
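The footnote to Table S11 states that up to 6 reads sharing the same 5' end are retained. A minimal Python sketch of such a pileup cap follows. It assumes each read is summarized as a (chromosome, strand, 5' position) tuple; that representation, the inclusion of strand in the key, and the function name are illustrative assumptions and do not describe the pipeline actually used.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Read = Tuple[str, str, int]  # (chromosome, strand, 5' end position)


def cap_pileup(reads: Iterable[Read], max_per_start: int = 6) -> List[Read]:
    """Keep at most max_per_start reads for each identical 5' end."""
    seen: Counter = Counter()
    kept: List[Read] = []
    for read in reads:
        if seen[read] < max_per_start:
            seen[read] += 1
            kept.append(read)
    return kept


# Ten hypothetical reads stacked on one start site, two on another.
EXAMPLE_READS = [("chr1", "+", 100)] * 10 + [("chr1", "+", 200)] * 2
KEPT = cap_pileup(EXAMPLE_READS)
```

Setting max_per_start to 1 would give a strict unique-start count, which is presumably what the final column of the table reports.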
Table S12. ChIP-Seq Epitopes Used and Sequencing Depth
Sample | Epitope/modification | Lanes | Total clusters | Aligned reads | No Satellite reads | Unique starts | MACS peaks
Islet 1 | K4me3 | 3 | 41,722,019 | 25,032,605 | 24,137,180 | 23,127,664 | 33,260
Islet 2 | Input | 3 | 49,110,219 | 31,575,093 | 31,157,610 | 30,772,095 |
Islet 2 | GFP | 4 | 67,200,031 | 35,928,743 | 35,314,145 | 17,329,133 | 1,804
Islet 2 | K4me3, 15 cycles sonication | 3 | 41,666,368 | 25,277,007 | 24,370,811 | 23,020,686 | 37,585
Islet 2 | K4me3, 20 cycles sonication | 3 | 25,621,475 | 15,530,384 | 14,849,023 | 14,022,698 | 31,552
Islet 3 | Input | 2 | 5,981,172 | 4,183,013 | 4,121,253 | 4,077,901 |
Islet 3 | GFP | 1 | 8,969,443 | 4,240,824 | 4,173,037 | 859,037 | 1,029
Islet 3 | K4me1 | 2 | 26,323,790 | 20,853,761 | 20,706,146 | 20,464,348 | 21,635
Islet 3 | K79me2 | 2 | 39,927,595 | 29,539,722 | 29,500,214 | 20,043,934 |
Islet 4 | Input | 3 | 42,697,192 | 28,006,212 | 27,476,408 | 24,770,076 |
Islet 4 | K4me1 | 3 | 37,340,229 | 24,802,474 | 24,505,805 | 21,085,480 | 14,305
Islet 4 | K4me3 | 3 | 45,874,719 | 27,461,399 | 27,002,156 | 5,254,282 | 20,066
Islet 4 | K79me2 | 3 | 41,408,208 | 28,346,402 | 28,107,409 | 21,842,999 |
Islet 5 | Input | 3 | 44,619,630 | 29,114,360 | 28,768,478 | 27,135,940 |
Islet 5 | GFP | 1 | 7,206,897 | 3,999,188 | 3,956,623 | 2,320,079 | 2,743
Islet 5 | K4me1 | 2 | 24,941,609 | 17,090,860 | 16,961,981 | 16,821,916 | 13,926
Islet 5 | K4me3 | 1 | 9,251,862 | 7,437,994 | 7,410,848 | 6,436,074 | 21,850
Islet 5 | K79me2 | 2 | 24,278,895 | 17,337,970 | 4,543,546 | 4,483,407 |
Islet 6 | Input | 2 | 33,391,762 | 21,077,573 | 20,776,285 | 19,917,514 |
Islet 6 | GFP | 1 | 23,542,687 | 14,417,652 | 14,165,743 | 3,020,965 | 8,643
Islet 6 | CTCF (AbCam) | 1 | 10,599,376 | 7,684,573 | 7,686,285 | 6,327,934 | 37,873
Islet 6 | CTCF (Millipore) | 2 | 31,411,207 | 19,472,273 | 19,477,043 | 4,186,986 | 25,778
Discussion
Future Directions
Conclusion
DISCUSSION
Common polygenic diseases like T2D have a complex genetic architecture, with a very large number of risk variants, each making only a very modest contribution. Given that reality, family linkage studies lacked the power to discover any but the strongest factors associated with T2D, such as TCF7L2 (Grant et al., 2006). GWAS and expansive sampling of populations in meta-analyses have provided much greater power, and have been overwhelmingly successful at identifying increasing numbers of loci associated with T2D and T2D-related traits. Despite these intense efforts, however, our functional understanding remains quite limited. First, it is challenging to identify the variant responsible for the functional consequence leading to T2D. In fact, because of the structure of linkage disequilibrium at this level of resolution, it is difficult to determine even which gene at an associated locus is responsible for T2D. Moreover, the possibility that a risk allele actually affects expression of a more distant gene that falls outside the region of linkage disequilibrium has to be seriously considered. Before tackling the functional challenge, however, it is important to outline additional approaches that are being taken to fill out the catalog of risk alleles for T2D and related traits.
High resolution genetic mapping with increasing power: Given the remarkable success of GWAS approaches in cataloging the wide array of genomic variants that contribute to T2D risk, it was desirable to increase the sample size even further. To achieve this, members of several consortia combined results to generate the custom-designed Metabochip, containing almost 200,000 SNPs to fine map 257 loci of association defined by GWAS meta-analyses of 23 traits (Voight et al., 2012).
Table 1: Metabochip SNP selection
This custom genotyping array was made affordable so that many studies could genotype hundreds of thousands of samples. The DIAGRAM consortium combined the meta-analysis of their 12,171 T2D cases and 56,862 controls (imputed to 2.5 million autosomal SNPs) with an additional 22,669 T2D cases and 58,119 controls genotyped on the Metabochip. The result was the identification of 10 additional T2D-associated loci (Dimas et al., 2014). The Metabochip was also employed by the MAGIC Consortium to examine fasting glucose, fasting insulin and two-hour glucose in combined meta-analyses of 133,010, 108,557 and 42,854 subjects, respectively. These analyses identified 53 loci associated with glycemic traits, 33 of which were also associated with T2D, supporting the greater contribution of fasting glucose to T2D-associated genes (R. A. Scott et al., 2012).
Rare variant detection: GWAS strategies are generally limited to detection of loci with a minor allele frequency of at least 1-2%. But there has been some suspicion that significant parts of the “missing heritability” for T2D and other common complex diseases might be due to rare alleles of large effect. Two large consortia have been formed to explore various sequencing strategies to identify such variants for T2D. The Type 2 Diabetes Genetic Exploration by Next-generation
sequencing in multi-Ethnic Samples (T2D-GENES) consortium is performing whole exome sequencing in five populations as well as whole genome sequencing in large multigenerational Mexican-American families. The Genetics of T2D (GoT2D) consortium is combining low coverage whole genome sequencing with deep coverage exome sequencing, high density genotyping and genotype imputation data on ~3000 European subjects to identify additional loci associated with T2D. These studies contributed preliminary sequencing data to the design of an economical exome-based genotyping array to test rare variant associations in larger populations (Grove et al., 2013). The goal of the Exome Chip design is to investigate rare and potentially more deleterious coding variants, including variants that affect protein structure, splice variants and nonsense variants. Exome sequencing data for ~12,000 subjects assembled from 16 studies (Table 2) were compiled for discovery of SNPs down to the level of single observations.
Table 2: Exome Sequencing study contribution to the Exome Chip content
From these data it was determined that the average genome can be expected to contain 8,000-10,000 nonsynonymous variants, 200-300 splice variants and 80-100 nonsense variants (‘Exome Chip Design - Genome Analysis Wiki’, http://genome.sph.umich.edu/wiki/Exome_Chip_Design).
Deep sequencing confirms common variants and identifies rare variants relevant to T2D
Whole genome sequencing on a large cohort is not yet economically feasible. The information gained from this effort, while comprehensive, would present considerable challenges for interpretation, since rare non-coding variants will be found in every individual and determining their functional significance can be extremely difficult. Targeted exonic and whole exome sequencing are significantly more affordable, and likely to discover coding variants (missense, nonsense, frameshift) that are much easier to interpret. The GCKR gene encoding the glucokinase regulatory protein (GKRP) harbors the common P446L variant (MAF=0.34) associated with increased triglyceride levels, increased C-reactive protein and lower fasting glucose (Orho-Melander et al., 2008). GCKR was a candidate gene subjected to targeted exonic Sanger sequencing in the ClinSeq project, where 19 rare variants were identified with a MAF < 0.02, most of which were novel (Biesecker et al., 2009). In vitro
examination demonstrated that the effects of these variants on GKRP ranged from loss of function, through wild-type activity, to gain of function (Rees et al., 2012). These results emphasize the value of functional assays for determining the effect of variants on gene function. Rare variants in the candidate gene PPARG that inhibit adipocyte differentiation are associated with increased risk for T2D. In a large-scale sequence analysis of PPARG in T2D cases and controls, 49 rare nonsynonymous variants (MAF < 0.5%) were discovered; only the common P12A variant was found at a frequency > 1%. When these rare variants were evaluated in in vitro adipocyte differentiation assays, the 9 variants that inhibited the differentiation pathway were significantly associated with T2D risk (Majithia et al., 2014). Similarly, SLC30A8, which contains the common W325R variant associated with T2D risk, was subjected to targeted exon sequencing in a study of 115 genes near T2D signals, where 12 loss-of-function variants were detected that were shown to protect against diabetes (Flannick et al., 2014). It is important to note that, although these rare loss-of-function events do not contribute significantly to the population prevalence of T2D, the fact that suppression of SLC30A8 activity protects against diabetes indicates that SLC30A8 would be an excellent drug target.
The genomics approach to functional annotation of disease variants
As described above, it is notable that the vast majority of the T2D-associated variants are not located in the coding regions of genes, but rather in the intergenic and intronic regions. This suggests that variants in regulatory elements, such as promoters or enhancers, affect the regulation of T2D-associated genes and lead to disease susceptibility. To determine the functional basis of non-coding T2D variants, we embarked on a study of chromatin structure as a surrogate model of gene regulation.
Using primary human pancreatic islets isolated from transplant candidates as a platform for understanding the regulation of gene expression in targets of T2D pathogenesis, we performed DNase hypersensitive site analysis (DNaseHS), as well as CTCF binding (to identify insulators) and histone H3 modification analysis by chromatin immunoprecipitation (ChIP) (Schmid & Bucher, 2007). Genome-wide, we identified ~18,000 putative promoters marked by histone H3 lysine 4 trimethylation (H3K4me3), some of which were previously unannotated and active only in pancreatic islet cells. In addition, we identified 34,039 non-promoter regulatory elements, of which 22% are bound by CTCF as putative insulators and 47% are unique to pancreatic islets in comparison with other published studies (Ernst et al., 2011). For 18 T2D-associated loci identified in the meta-analysis of the combined GWAS, we identified 118 putative regulatory elements in the neighborhood of those loci, and confirmed enhancer activity in 12 of 33 elements by in vitro luciferase assay and transgenic reporter mice (Stitzel et al., 2010). These putative regulatory elements are now being examined for correlation with gene expression in pancreatic islets. The goal is to connect the risk alleles to differential expression of nearby genes, that is, to identify them as expression quantitative trait loci (eQTLs) (Battle & Montgomery, 2014).
FUTURE DIRECTIONS
The importance of functional analyses in disease relevant tissues
Most variants associated with T2D are found in the non-coding regulatory regions of the genome. While the GWAS variants identify these loci, they are rarely the actual functional variants; rather, they mark the haplotype on which the true functional variant resides. To identify how T2D risk variants functionally contribute to disease, it is critical to integrate all genetic and functionally relevant genomic data for the associated loci. To achieve this, it is important to define the genome-wide epigenomic landscape in relevant tissues. Active parts of the genome are identified by open chromatin, which has traditionally been detected by DNase hypersensitivity (DNase HS) assays. Regulatory regions are further refined by the combination of histone modifications present, which can indicate the location of promoters, enhancers and transcriptionally active genes as well as transcriptionally repressed genes. High resolution genotyping of the subject or of tissue derived from the subject, sufficient to allow for accurate imputation, enables the determination of which T2D variants are found within the functional elements defined by the epigenomic landscape. RNA sequencing to sufficient depth to explore gene expression, transcript isoform deconvolution and allele-specific expression allows evaluation of whether a T2D risk allele correlates with an eQTL in the region. All variants of the risk haplotype can be examined for interference with transcription factor binding, both computationally and experimentally. Examination of the risk allele effect on gene expression can indicate the potential of the gene as a therapeutic target.
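Determining which variants fall within annotated functional elements amounts to an interval lookup. The Python below is a minimal sketch of that step, assuming elements are supplied as (chromosome, start, end, label) tuples. The two intervals reuse the hg18 search-space coordinates listed for TCF7L2 and CDKAL1 in Table S6 purely as stand-ins for real element calls; the function name and the variant called rsHypothetical are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, int, int, str]  # (chromosome, start, end, label)
Variant = Tuple[str, str, int]       # (rsID, chromosome, position)


def variants_in_elements(variants: List[Variant],
                         elements: List[Element]) -> Dict[str, List[str]]:
    """Map each variant to the labels of the elements that contain it."""
    hits: Dict[str, List[str]] = {}
    for rsid, chrom, pos in variants:
        labels = [label for e_chrom, start, end, label in elements
                  if e_chrom == chrom and start <= pos <= end]
        if labels:
            hits[rsid] = labels
    return hits


# Stand-in intervals: search-space coordinates (hg18) as listed in Table S6.
ELEMENTS = [
    ("chr10", 114707461, 114813416, "TCF7L2 search space"),
    ("chr6", 20589657, 21120000, "CDKAL1 search space"),
]
VARIANTS = [
    ("rs7903146", "chr10", 114748339),    # index SNP position from Table S6
    ("rs7754840", "chr6", 20769229),      # index SNP position from Table S6
    ("rsHypothetical", "chr6", 30000000), # made-up variant outside both intervals
]
HITS = variants_in_elements(VARIANTS, ELEMENTS)
```

A linear scan is adequate for a handful of loci; for genome-wide element sets a sorted index or an interval tree would be the more usual choice.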
Assessment of variants in the context of the chromatin structure of tissues specifically implicated in T2D (pancreatic islets, liver, skeletal muscle, and adipose), where tissue-specific enhancers have been discovered by chromatin immunoprecipitation experiments, in conjunction with expression QTL analysis, leads to identification of target genes in associated regions. To complement the epigenomic analysis of pancreatic islets, we continue to collect islet tissues and genotype them on Illumina arrays containing 2.5 million SNPs to ascertain their load of T2D and T2D-related alleles. Total RNA is extracted and strand-specific RNA sequencing (RNA-Seq) is performed to a depth of 100 million paired-end reads, sufficient to attempt transcript isoform deconvolution to investigate differences in tissue-specific transcript representation in islets. The combination of genotype and gene expression enables expression quantitative trait loci (eQTL) analysis for alleles associated with T2D and T2D-related traits. We will also examine T2D-associated variants for alternative splicing quantitative trait loci (sQTLs). Recently we have also been performing the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to assess open chromatin structure, similar to DNaseHS analysis but with sufficient resolution to identify transcription factor footprints (Buenrostro, Giresi, Zaba, Chang, & Greenleaf, 2013). This will be sufficient to employ phylogenetic module complexity analysis (PMCA) (Claussnitzer et al., 2014), in which conserved co-occurring transcription factor binding sites (TFBS) identified in several species are examined to systematically identify cis-regulatory variants at GWAS loci (Figure 1).
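At its simplest, an eQTL test asks whether expression of a gene changes with the number of risk alleles a subject carries. The Python below is a minimal sketch of that idea, assuming an additive model in which expression is regressed on allele dosage (0, 1 or 2) by ordinary least squares. The function name and the toy data are illustrative assumptions; a real analysis would also adjust for covariates and correct for multiple testing.

```python
from typing import Sequence, Tuple


def eqtl_fit(dosages: Sequence[float],
             expression: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of expression on allele dosage.

    Returns (slope, intercept, r_squared). The slope is the estimated
    change in expression per additional copy of the counted allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Toy data: six hypothetical subjects, expression rising by one unit per allele.
SLOPE, INTERCEPT, R2 = eqtl_fit([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
```

In practice the slope would be reported together with a significance test, and the same framework extends to sQTLs by replacing total expression with an isoform ratio.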
Figure 1. Allelic and cross-species chromatin state signatures at the PDX1 locus. (A) Tn5 transposase (green) inserts sequence adaptors (red and blue) in regions of open chromatin (between nucleosomes in gray) to generate ATAC-seq libraries. Schematic taken from Buenrostro et al. (B) UCSC genome browser view of the human PDX1 locus showing chromatin state maps for 31 cell types and transcription maps for 3 cell types (other cell types lack gene expression and appear similar to HepG2 and GM12878). EndoC and islet chromatin state maps are similar to each other but remarkably different from those of other cell types, indicating the cell-type specificity of this locus. Note the fasting glucose-related trait GWAS SNP in the stretch enhancer specific to EndoC and proximate to the stretch enhancer in islets. Near the GWAS SNP, there are two SNPs (circled) with significant allelic bias in EndoC H3K27ac ChIP-seq data. (C) H3K27ac allelic bias in EndoC. The circled points represent two SNPs in the PDX1 locus. P-values are based on a binomial test with an expectation of 0.5. The highly symmetric pattern around the vertical line at 0.5 indicates that our allelic bias pipeline accounts for reference bias. (D) Chromatin state and transcription maps of PDX1 in mouse insulinoma cells (MIN6). The similarity of chromatin and expression maps between MIN6 and EndoC/islets suggests that cross-species ATAC-seq maps could identify important TF binding modules.
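The allelic bias P-values in panel C come from a binomial test against an expected reference-allele fraction of 0.5. A minimal Python sketch of an exact two-sided binomial test is given below, using only the standard library. The function name and the read counts are illustrative assumptions, and the two-sided rule shown (summing the probabilities of all outcomes no more likely than the one observed) is one common convention rather than a description of the pipeline used for the figure.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int,
                         expected: float = 0.5) -> float:
    """Exact two-sided binomial P-value for allelic imbalance.

    Sums the probabilities of every outcome that is no more likely
    than the observed reference-allele read count.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    pmf = [comb(n, k) * expected ** k * (1 - expected) ** (n - k)
           for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so floating-point ties are counted.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical read counts at a heterozygous SNP.
P_BALANCED = binomial_two_sided_p(5, 5)   # equal coverage of both alleles
P_SKEWED = binomial_two_sided_p(10, 0)    # all reads carry the reference allele
```

An expectation other than 0.5 can be passed through the expected argument if a residual reference-mapping bias needs to be modeled.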
Allele-specific expression quantitative trait loci (aseQTL) will be evaluated where transcribed SNPs (tSNPs) are present in the RNA transcript in a region where a putative regulatory SNP (rSNP) is located. This can be evaluated where phase is not known (Battle et al., 2014) or where phase can be assessed (Lappalainen et al., 2013). Pancreatic islet tissue is not accessible in living individuals, but other tissues relevant to diabetes can be studied in vivo. In order to address the translation of the genetic association with T2D to the functional cause of the disease, we have begun a study of the integrated analysis of genotype, gene expression and phenotype on the genetic background of subjects
from the FUSION and Metabolic Syndrome in Men (METSIM) studies in Finland (Stancakova et al., 2009). We have sampled skeletal muscle and subcutaneous adipose tissue from 324 Finnish individuals, including 125 normal glucose tolerant (NGT), 41 impaired fasting glucose (IFG), 72 impaired glucose tolerant (IGT) and 86 newly diagnosed T2D subjects. All have had their disease status confirmed by oral glucose tolerance test prior to the biopsy. RNA sequencing (RNA-Seq) and microRNA sequencing (miRNA-Seq) are being performed on total RNA extracted from these tissues to document gene expression. All subjects are being genotyped on high density arrays including all SNPs previously associated with T2D, in order to evaluate eQTLs, sQTLs and aseQTLs as previously described (Figure 2). In addition, DNA methylation analysis is planned to look for association with disease or quantitative traits. All of these subjects already have extensive phenotype information from study records.
Figure 2. T2D-associated eQTL in muscle RNA-Seq expression analysis. SNP association with gene expression identifies the probable gene associated with T2D. The POU5F1 and TCF19 genes are identified as candidates for the association by proximity to the T2D association signal. Association with gene expression indicates an eQTL with CCHCR1, suggesting this may be a more plausible candidate.
In the longer term, we hope to study other tissues derived in vitro from these same individuals, taking advantage of recent scientific developments. Primary patient fibroblast cell lines are also being established for induced pluripotent stem (iPS) cell line generation, to investigate the effects of different genetic backgrounds on development of relevant tissues by differentiation of the pluripotent lines toward adipocyte, muscle cell, hepatocyte and pancreatic beta cell lineages (Takahashi et al., 2007).
These studies may represent important strides forward in investigating the interaction of the genetic background with the functional consequences, and should assist in identifying the most promising therapeutic targets.
CONCLUSION
The increase in the incidence of T2D throughout the world compels the need to understand the disease etiology better, to develop strategies that might slow the trend of increasing incidence of T2D, and to identify new therapeutic approaches. Progress in the last seven years has been breathtaking, as GWAS of common variants have contributed significantly to identifying a host of candidate susceptibility loci for T2D. Increasing study size and sensitivity for less common alleles have allowed the identification of additional variants that contribute to the heritability of T2D. But the functional understanding of these variants, and the translation of those insights into therapeutic opportunities, presents the most significant current challenge. With the tools now being developed and applied, there is no question that this challenge will be met.
Acknowledgements
ACKNOWLEDGEMENTS
Foremost I would like to thank my mentors, Cisca Wijmenga, Marten Hofker and Francis Collins, for this opportunity and their support and encouragement along the way. I would like to extend my heartfelt appreciation to Francis for his unending support and guidance throughout my career at NHGRI. I would also like to acknowledge the many members of Francis’s lab, past and present, who have shared their creativity and continue to do so. In addition, I wish to acknowledge the FUSION group in its entirety, from Helsinki, Finland to NHGRI to UNC to USC and Cedars-Sinai and finally to the University of Michigan, for their camaraderie and interaction contributing to this work and for engaging the international collaborations leading to the DIAGRAM and MAGIC consortia and the world effort to understand and challenge T2D.
References
REFERENCES
Battle, A., & Montgomery, S. B. (2014). Determining causality and consequence of expression quantitative trait loci.
Human Genetics, 133(6), 727–735. doi:10.1007/s00439-014-1446-0
Battle, A., Mostafavi, S., Zhu, X., Potash, J. B., Weissman, M. M., McCormick, C., … Koller, D. (2014).
Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals.
Genome Research, 24(1), 14–24. doi:10.1101/gr.155192.113
Biesecker, L. G., Mullikin, J. C., Facio, F. M., Turner, C., Cherukuri, P. F., Blakesley, R. W., … Green, E. D. (2009).
The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine. Genome
Research, 19(9), 1665–1674. doi:10.1101/gr.092841.109
Bonnycastle, L. L. (2006). Common Variants in Maturity-Onset Diabetes of the Young Genes Contribute to Risk of
Type 2 Diabetes in Finns. Diabetes, 55(9), 2534–2540. doi:10.2337/db06-0178
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y., & Greenleaf, W. J. (2013). Transposition of native
chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome
position. Nature Methods, 10(12), 1213–1218. doi:10.1038/nmeth.2688
Burton, P. R., Clayton, D. G., Cardon, L. R., Craddock, N., Deloukas, P., Duncanson, A., … Worthington, J. (2007b).
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature,
447(7145), 661–678. doi:10.1038/nature05911
CDC - National Diabetes Statistics Report, 2014 - Publications - Diabetes DDT. (n.d.). Retrieved 6 September 2014,
from http://www.cdc.gov//diabetes/pubs/statsreport14.htm
Claussnitzer, M., Dankel, S. N., Klocke, B., Grallert, H., Glunk, V., Berulava, T., … Laumen, H. (2014). Leveraging
Cross-Species Transcription Factor Binding Site Patterns: From Diabetes Risk Loci to Disease Mechanisms.
Cell, 156(1-2), 343–358. doi:10.1016/j.cell.2013.10.058
DeFronzo, R. A. (2009). From the Triumvirate to the Ominous Octet: A New Paradigm for the Treatment of Type 2
Diabetes Mellitus. Diabetes, 58(4), 773–795. doi:10.2337/db09-9028
Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of
BioMedical Research, Saxena, R., Voight, B. F., Lyssenko, V., Burtt, N. P., de Bakker, P. I. W., … Purcell, S.
(2007a). Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels.
Science, 316(5829), 1331–1336. doi:10.1126/science.1142358
Dimas, A. S., Lagou, V., Barker, A., Knowles, J. W., Magi, R., Hivert, M.-F., … on behalf of the MAGIC
Investigators. (2014). Impact of Type 2 Diabetes Susceptibility Variants on Quantitative Glycemic Traits
Reveals Mechanistic Heterogeneity. Diabetes, 63(6), 2158–2171. doi:10.2337/db13-0949
Ernst, J., Kheradpour, P., Mikkelsen, T. S., Shoresh, N., Ward, L. D., Epstein, C. B., … Bernstein, B. E. (2011).
Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473(7345), 43–49.
doi:10.1038/nature09906
Exome Chip Design - Genome Analysis Wiki. (n.d.). Retrieved 7 September 2014, from
http://genome.sph.umich.edu/wiki/Exome_Chip_Design
Flannick, J., Thorleifsson, G., Beer, N. L., Jacobs, S. B. R., Grarup, N., Burtt, N. P., … Altshuler, D. (2014). Loss-of-
function mutations in SLC30A8 protect against type 2 diabetes. Nature Genetics, 46(4), 357–363.
doi:10.1038/ng.2915
Frazer, K. A., Ballinger, D. G., Cox, D. R., Hinds, D. A., Stuve, L. L., Gibbs, R. A., … Stewart, J. (2007). A second
generation human haplotype map of over 3.1 million SNPs. Nature, 449(7164), 851–861.
doi:10.1038/nature06258
Ghosh, S., Watanabe, R. M., Hauser, E. R., Valle, T., Magnuson, V. L., Erdos, M. R., … Kohtamaki, K. (1999). Type
2 diabetes: evidence for linkage on chromosome 20 in 716 Finnish affected sib pairs. Proceedings of the
National Academy of Sciences, 96(5), 2198–2203. Retrieved from http://www.pnas.org/content/96/5/2198.short
Gibbs, R. A., Belmont, J. W., Hardenbol, P., Willis, T. D., Yu, F., Yang, H., … others. (2003). The international
HapMap project. Nature, 426(6968), 789–796. Retrieved from
http://www.nature.com/nature/journal/v426/n6968/abs/nature02168.html
Grant, S. F. A., Thorleifsson, G., Reynisdottir, I., Benediktsson, R., Manolescu, A., Sainz, J., … Stefansson, K.
(2006). Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nature Genetics,
38(3), 320–323. doi:10.1038/ng1732
Grarup, N., Sandholt, C. H., Hansen, T., & Pedersen, O. (2014). Genetic susceptibility to type 2 diabetes and obesity:
from genome-wide association studies to rare variants and beyond. Diabetologia. doi:10.1007/s00125-014-3270-4
Grove, M. L., Yu, B., Cochran, B. J., Haritunians, T., Bis, J. C., Taylor, K. D., … Boerwinkle, E. (2013). Best
Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium. PLoS ONE, 8(7),
e68095. doi:10.1371/journal.pone.0068095
Gusella, J. F., Wexler, N. S., Conneally, P. M., Naylor, S. L., Anderson, M. A., Tanzi, R. E., … Sakaguchi, A. Y.
(1983). A polymorphic DNA marker genetically linked to Huntington’s disease. Nature, 306(5940), 234–238.
Kahn, S. E., Cooper, M. E., & Del Prato, S. (2014). Pathophysiology and treatment of type 2 diabetes: perspectives on
the past, present, and future. The Lancet, 383(9922), 1068–1083. Retrieved from
http://www.sciencedirect.com/science/article/pii/S0140673613621546
Kaprio, J., Tuomilehto, J., Koskenvuo, M., Romanov, K., Reunanen, A., Eriksson, J., … Kesäniemi, Y. A. (1992).
Concordance for type 1 (insulin-dependent) and type 2 (non-insulin-dependent) diabetes mellitus in a
population-based cohort of twins in Finland. Diabetologia, 35(11), 1060–1067. Retrieved from
http://link.springer.com/article/10.1007/BF02221682
Lappalainen, T., Sammeth, M., Friedländer, M. R., ‘t Hoen, P. A. C., Monlong, J., Rivas, M. A., … Dermitzakis, E. T.
(2013). Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501(7468),
506–511. doi:10.1038/nature12531
Majithia, A. R., Flannick, J., Shahinian, P., Guo, M., Bray, M.-A., Fontanillas, P., … Zollner, S. (2014). Rare variants
in PPARG with decreased activity in adipocyte differentiation are associated with increased risk of type 2
diabetes. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1410428111
Orho-Melander, M., Melander, O., Guiducci, C., Perez-Martinez, P., Corella, D., Roos, C., … Kathiresan, S. (2008).
Common Missense Variant in the Glucokinase Regulatory Protein Gene Is Associated With Increased Plasma
Triglyceride and C-Reactive Protein but Lower Fasting Glucose Concentrations. Diabetes, 57(11), 3112–3121.
doi:10.2337/db08-0516
Rees, M. G., Ng, D., Ruppert, S., Turner, C., Beer, N. L., Swift, A. J., … Collins, F. S. (2012). Correlation of rare
coding variants in the gene encoding human glucokinase regulatory protein with phenotypic, cellular, and
kinetic outcomes. Journal of Clinical Investigation, 122(1), 205–217. doi:10.1172/JCI46425
Sachidanandam, R., Weissman, D., Schmidt, S. C., Kakol, J. M., Stein, L. D., Marth, G., … International SNP Map
Working Group. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide
polymorphisms. Nature, 409(6822), 928–933. doi:10.1038/35057149
Schaid, D. J., & Sommer, S. S. (1993). Genotype relative risks: methods for design and analysis of candidate-gene
association studies. American Journal of Human Genetics, 53(5), 1114. Retrieved from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1682319/
Schmid, C. D., & Bucher, P. (2007). ChIP-Seq data reveal nucleosome architecture of human promoters. Cell, 131(5),
831–832.
Scott, L. J., Mohlke, K. L., Bonnycastle, L. L., Willer, C. J., Li, Y., Duren, W. L., … Boehnke, M. (2007). A Genome-
Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants. Science,
316(5829), 1341–1345. doi:10.1126/science.1142382
Scott, R. A., Lagou, V., Welch, R. P., Wheeler, E., Montasser, M. E., Luan, J., … Barroso, I. (2012). Large-scale
association analyses identify new loci influencing glycemic traits and provide insight into the underlying
biological pathways. Nature Genetics, 44(9), 991–1005. doi:10.1038/ng.2385
Silander, K., Scott, L. J., Valle, T. T., Mohlke, K. L., Stringham, H. M., Wiles, K. R., … Boehnke, M. (2004). A large
set of Finnish affected sibling pair families with type 2 diabetes suggests susceptibility loci on chromosomes 6,
11, and 14. Diabetes, 53(3), 821–829.
Spencer, C., Hechter, E., Vukcevic, D., & Donnelly, P. (2011). Quantifying the Underestimation of Relative Risks
from Genome-Wide Association Studies. PLoS Genetics, 7(3), e1001337. doi:10.1371/journal.pgen.1001337
Stancakova, A., Javorsky, M., Kuulasmaa, T., Haffner, S. M., Kuusisto, J., & Laakso, M. (2009). Changes in Insulin
Sensitivity and Insulin Release in Relation to Glycemia and Glucose Tolerance in 6,414 Finnish Men. Diabetes,
58(5), 1212–1221. doi:10.2337/db08-1607
Stitzel, M. L., Sethupathy, P., Pearson, D. S., Chines, P. S., Song, L., Erdos, M. R., … Collins, F. S. (2010). Global
Epigenomic Analysis of Primary Human Pancreatic Islets Provides Insights into Type 2 Diabetes Susceptibility
Loci. Cell Metabolism, 12(5), 443–455. doi:10.1016/j.cmet.2010.09.012
Stumvoll, M., Goldstein, B. J., & van Haeften, T. W. (2005). Type 2 diabetes: principles of pathogenesis and therapy.
The Lancet, 365(9467), 1333–1346. doi:10.1016/S0140-6736(05)61032-X
Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., & Yamanaka, S. (2007). Induction of
Pluripotent Stem Cells from Adult Human Fibroblasts by Defined Factors. Cell, 131(5), 861–872.
doi:10.1016/j.cell.2007.11.019
The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature, 437(7063), 1299–
1320. doi:10.1038/nature04226
Thorisson, G. A., & Stein, L. D. (2003). The SNP Consortium website: past, present and future. Nucleic Acids
Research, 31(1), 124–127. doi:10.1093/nar/gkg052
Tsui, L.-C., Buchwald, M., Barker, D., Braman, J. C., Knowlton, R., Schumm, J. W., … others. (1985). Cystic fibrosis
locus defined by a genetically linked polymorphic DNA marker. Science, 230(4729), 1054–1057. Retrieved
from http://www.sciencemag.org/content/230/4729/1054.short
Tuomi, T., Santoro, N., Caprio, S., Cai, M., Weng, J., & Groop, L. (2014). The many faces of diabetes: a disease with
increasing heterogeneity. The Lancet, 383(9922), 1084–1094. Retrieved from
http://www.sciencedirect.com/science/article/pii/S0140673613622199
Valle, T., Tuomilehto, J., Bergman, R. N., Ghosh, S., Hauser, E. R., Eriksson, J., … others. (1998). Mapping Genes
for NIDDM: Design of the Finland—United States Investigation of NIDDM Genetics (FUSION) Study.
Diabetes Care, 21(6), 949–958. Retrieved from http://care.diabetesjournals.org/content/21/6/949.short
Voight, B. F., Kang, H. M., Ding, J., Palmer, C. D., Sidore, C., Chines, P. S., … Boehnke, M. (2012). The
Metabochip, a Custom Genotyping Array for Genetic Studies of Metabolic, Cardiovascular, and Anthropometric
Traits. PLoS Genetics, 8(8), e1002793. doi:10.1371/journal.pgen.1002793
WHO | Diabetes programme. (n.d.). Retrieved 6 September 2014, from http://www.who.int/diabetes/en/
Yen, C.-J., Beamer, B. A., Negri, C., Silver, K., Brown, K. A., Yarnall, D. P., … Shuldiner, A. R. (1997). Molecular
Biochemical and Biophysical Research
Communications, 241(2), 270–274. doi:10.1006/bbrc.1997.7798
Zeggini, E., Scott, L. J., Saxena, R., Voight, B. F., Marchini, J. L., Hu, T., … Altshuler, D. (2008). Meta-analysis of
genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2
diabetes. Nature Genetics, 40(5), 638–645. doi:10.1038/ng.120
Zeggini, E., Weedon, M. N., Lindgren, C. M., Frayling, T. M., Elliott, K. S., Lango, H., … Hattersley, A. T. (2007).
Replication of Genome-Wide Association Signals in UK Samples Reveals Risk Loci for Type 2 Diabetes.
Science, 316(5829), 1336–1341. doi:10.1126/science.1142364
Summaries
SUMMARY
Type 2 diabetes (T2D) affects over 340 million people worldwide. T2D predominantly affects low- and middle-income countries, which account for more than 80% of deaths due to diabetes. The worldwide prevalence of T2D is over 8%, and the disease costs over US$612 billion annually. Nearly 50% of people living with diabetes are undiagnosed.
Identifying the causes that contribute to T2D risk has been a formidable challenge for decades. Evidence for genetic factors in T2D risk includes a 3.5-fold increased incidence in first-degree relatives of T2D subjects compared with the general middle-aged population. In the Finnish population, where our studies have primarily been focused, T2D concordance is ~34% in monozygotic twins compared with ~16% in dizygotic twins, supporting a significant hereditary contribution. Identification of T2D-associated genetic variants is complicated by lifestyle and environmental factors, which play a major role in disease onset and progression. Poor diet and lack of exercise can contribute significantly to T2D susceptibility. T2D is thus an excellent example of a common, complex, polygenic disease.
The FUSION (Finnish US Investigation of NIDDM (Non-Insulin Dependent Diabetes Mellitus)) genetics study is an international collaboration with the goal of identifying genetic variants that contribute to T2D susceptibility. Families were originally selected based on index cases with age of onset of 35-60 years and at least one affected sibling. Unaffected spouses and offspring were also ascertained for frequently sampled intravenous glucose tolerance tests (FSIGTs) to allow estimates of glucose- and insulin-related physiological traits. In addition, a cohort of elderly individuals over 65 years of age with normal glucose tolerance was collected as control subjects.
This thesis focuses on identifying the genetic basis of T2D by scanning individual genes as well as complete genomes for variations, termed single nucleotide polymorphisms (SNPs), that are associated with genetic loci that may predispose to T2D.
The first approach we applied was candidate gene association analysis. In these analyses, plausible genes are selected by criteria implying that they may play a role in the T2D disease process, such as involvement in glucose regulation, pancreatic islet function, regulation of insulin action, or interaction with a particular therapeutic agent. Candidate gene sequencing of the peroxisome proliferator-activated receptor-γ2 (PPARG2) gene, which regulates adipocyte differentiation and is a well-known target of the T2D therapeutic thiazolidinediones, identified a SNP coding for the amino acid change proline to alanine at position 12 of the protein (P12A). Several candidate gene studies produced ambiguous results because varying sample sizes and population characteristics affected the statistical power to detect association. In chapter 1 we performed candidate SNP association analysis in the FUSION cohort, which revealed a protective effect of the P12A variant, with a significantly lower allele frequency in diabetics. To accelerate the analysis of candidate SNPs, in chapter 2 we devised a method for performing SNP association studies by comparing quantitative allele frequency differences between T2D case and control DNA pools. We successfully applied this method to identify a T2D-associated SNP located near the pancreatic beta-cell promoter of the hepatocyte nuclear factor-4 alpha (HNF4A) gene, a gene known to cause a rare monogenic form of diabetes, maturity-onset diabetes of the young (MODY).
The second approach became possible with the advent of more sophisticated technologies for highly multiplexed SNP analysis, which enabled genome-wide association studies (GWAS) to identify novel disease-associated genes. In chapter 3 we performed a GWAS of 315,635 SNPs in 1161 Finnish T2D and 1174 Finnish normal glucose tolerant (NGT) control individuals, which identified multiple interesting signals that did not, however, reach the requisite statistical significance. We did confirm the known T2D association with the transcription factor 7-like 2 (TCF7L2) gene. Recognizing the need for more statistical power, we compared our results with those of two other GWAS, the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC), and followed up with a stage 2 analysis of 82 SNPs that showed promising evidence of association. The combined FUSION, DGI, and WTCCC results led to the identification of T2D-associated variants at four novel loci and confirmed previously associated variants near the genes TCF7L2, SLC30A8, HHEX, PPARG, and KCNJ11. In a subsequent collaboration of unprecedented size, the combined meta-analysis of these three initial GWAS (comprising 8,130 T2D cases and 32,987 controls) with five additional GWAS contributing a further 34,412 T2D cases and 59,925 controls identified 12 novel T2D-associated loci.
In chapter 4 we also applied GWAS analysis to T2D-related physiological data, analyzed as quantitative traits, collected from Finnish and Sardinian non-diabetic individuals, and found significant association with fasting glucose within the G6PC2/ABCB11 locus.
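To illustrate why a single GWAS of about 1,200 cases and 1,200 controls can show interesting signals that still fall short of genome-wide significance, the following is a minimal sketch of an allelic chi-square test combined with a Bonferroni threshold for 315,635 SNPs. The summary does not state which test or significance threshold was used, so both are assumptions, and the allele counts are hypothetical values chosen only to match the cohort sizes.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). For 1 df the chi-square survival function
    equals erfc(sqrt(chi2 / 2)), so no external library is needed.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        row_case * row_ctrl * col_alt * col_ref
    )
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Number of SNPs tested, taken from the summary.
N_SNPS = 315_635
threshold = bonferroni_threshold(N_SNPS)

# Hypothetical allele counts (not from the thesis):
# 1161 cases -> 2322 alleles, 1174 controls -> 2348 alleles.
chi2, p_value = allelic_chi_square(case_alt=790, case_ref=1532,
                                   ctrl_alt=704, ctrl_ref=1644)
odds_ratio = (790 * 1644) / (1532 * 704)

print(f"allelic odds ratio = {odds_ratio:.2f}")
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
print(f"Bonferroni threshold = {threshold:.2e}")
print("genome-wide significant:", p_value < threshold)
```

In this hypothetical case the SNP is nominally significant but does not pass the multiple-testing threshold, which is the situation that motivates combining several studies.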
Meta-analysis of all known fasting glucose GWAS showed that each copy of the major allele increases fasting glucose by 0.01-0.16 mM, which accounts for approximately 1% of the total variation in fasting glucose. Discriminating the causal effect at this locus is difficult because of the high linkage disequilibrium between G6PC2 and ABCB11. Arguments can be made for both genes: G6PC2, a glucose-6-phosphatase, is almost exclusively expressed in pancreatic islets, while ABCB11, an ATP-binding cassette family member, is expressed in the liver, where it may also contribute to variation in glucose regulation. Similarly, in chapter 5, we combined fasting glucose data from ten GWAS of individuals of European descent to discover the first T2D trait association in the melatonin receptor 1B (MTNR1B) gene, which was also found to be significantly associated with T2D. In this case we were able to demonstrate functional relevance: MTNR1B was expressed in pancreatic islets, with increased expression in islets homozygous for the risk allele. These differences in expression were more pronounced in islets from people over 45 years of age. In addition, melatonin suppressed insulin secretion in in vitro cell culture studies.
Our third strategy for determining the specific genes with biologically relevant function in T2D-associated loci used the analysis of chromatin structure as a surrogate model of gene regulation. In chapter 6, using primary human pancreatic islets isolated from transplant candidates, we performed DNase hypersensitive site analysis (DNaseHS), as well as CTCF binding and histone H3 modification analysis by chromatin immunoprecipitation (ChIP), to identify ~18,000 putative promoters, some of which are unannotated and active only in pancreatic islet cells, and 34,039 nonpromoter regulatory elements, of which 47% are unique to pancreatic islets. We examined the chromatin structural characteristics of the 18 T2D-associated loci identified in the meta-analysis of the combined GWAS, identified 118 putative regulatory elements, and confirmed in vitro gene enhancer activity in a subset of these elements.
In the last chapter we discuss efforts to increase power to detect T2D association, the application of sequencing to identify rare variant associations, and the importance of functional assays to validate genes at these T2D-associated loci. We introduce our integrative analyses of T2D and related quantitative trait association, genome structure annotation, and gene expression in muscle and adipose tissue biopsies from 324 additional Finnish T2D and non-diabetic subjects collected in our studies. The aim of these analyses is to identify new potential genes and pathways for targeted therapeutic development, leading to specific treatments for this heterogeneous complex disease.
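The gain in power from combining GWAS, as in the T2D and fasting glucose meta-analyses summarized above, can be sketched with a fixed-effects inverse-variance meta-analysis. This is a common approach and is assumed here; the summary does not specify which meta-analysis method was used. The per-study effect sizes and standard errors below are hypothetical.

```python
import math

def inverse_variance_meta(estimates):
    """Fixed-effects inverse-variance meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per study.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p

# Hypothetical per-allele effects on fasting glucose (mM) from three studies.
studies = [(0.060, 0.020), (0.070, 0.015), (0.050, 0.025)]

pooled_beta, pooled_se, z, p = inverse_variance_meta(studies)
single_study_ps = [inverse_variance_meta([s])[3] for s in studies]

print(f"pooled beta = {pooled_beta:.4f} mM per allele, se = {pooled_se:.4f}")
print(f"z = {z:.2f}, p = {p:.2e}")
print("single-study p-values:", [f"{sp:.2e}" for sp in single_study_ps])
```

Because the pooled standard error is smaller than that of any contributing study, a consistent effect that is only suggestive in each study alone can reach a much smaller p-value in the combined analysis.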
SAMENVATTING (SUMMARY)
Type 2 diabetes (T2D) affects 340 million people worldwide. Low- and middle-income countries account for more than 80% of the deaths caused by T2D. Worldwide, 8% of the population has T2D, which costs society US$612 billion per year. Despite more than 25 years of research on T2D, it remains an enormous challenge to find out which causes lead to an increased risk of developing the disease. It is now clear that genetic factors play a role, because T2D is 3.5 times more common in people with a first-degree relative with T2D. Our genetic studies have focused mainly on the Finnish population. There, 34% of monozygotic twins are concordant for T2D, whereas this is the case for only 16% of dizygotic twins. This points to a clear hereditary predisposition to developing T2D. However, in addition to genetic predisposition, lifestyle and environmental factors play a major role in the onset and progression of T2D. A poor diet and too little exercise contribute strongly to the chance of developing T2D. T2D is therefore an excellent example of a common complex disorder in which multiple genes are involved.
The FUSION (Finnish US Investigation of NIDDM (Non-Insulin Dependent Diabetes Mellitus)) study is an international collaboration that aims to elucidate the genetic variants contributing to increased susceptibility to T2D. Families were initially selected for the presence of a patient diagnosed with T2D between 35 and 60 years of age who had at least one affected first-degree relative. Unaffected relatives and children were assessed with a glucose tolerance test to determine whether a glucose- or insulin-related disorder was present. A cohort of healthy individuals aged 65 years and older was used as the control population.
This thesis focuses on unraveling the genetic factors of T2D by studying genetic variation in individual genes as well as in complete genomes. The aim is to find single nucleotide polymorphisms (SNPs) that are associated with the genetic loci that cause T2D.
The first approach we applied was the candidate gene association study. We chose the candidate genes on the basis of our current mechanistic understanding of the disease process, including genes involved in glucose homeostasis, beta-cell function in the pancreas, regulation of insulin action, and genes that interact with certain drugs. One of the candidate genes we sequenced was the gene encoding peroxisome proliferator-activated receptor-gamma 2 (PPARG2). PPARG2 regulates the differentiation of fat cells and is activated by thiazolidinediones as a therapy for T2D. DNA sequence analysis identified a SNP in this gene that leads to an amino acid change from proline to alanine at amino acid position 12 (P12A). Several other candidate genes showed variable results. This was caused by differences in population size and characteristics, as a result of which the statistical calculations proved not to be reproducible. Chapter 1 describes the results for the P12A variant. The P12A variant protects against T2D, and carriers of the P12A variant are less common in the T2D population. Chapter 2 describes an approach for studying large numbers of candidate SNPs inexpensively. For this purpose, pools were made of the DNA of T2D patients and of healthy control individuals. In this way a T2D-associated SNP was found in the promoter of the gene for hepatocyte nuclear factor-4 alpha (HNF4A). It had previously been described that mutations in the HNF4A gene cause a rare monogenic form of diabetes, namely maturity-onset diabetes of the young (MODY).
The second approach became possible with the arrival of more advanced technologies for analyzing many more SNPs simultaneously. In chapter 3 a genome-wide association study (GWAS) is performed in which 315,635 SNPs are analyzed simultaneously in 1161 Finnish T2D patients and 1174 Finnish healthy control individuals with normal glucose metabolism. This study found several interesting associations of SNPs with T2D, but these associations were not statistically significant. The association between transcription factor 7-like 2 (TCF7L2) and T2D could, however, be confirmed. Because more statistical power was needed, we then compared our results with two other GWAS, namely the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC) study. This was followed by a second analysis of 82 SNPs that were possibly associated with T2D. The combined FUSION, DGI and WTCCC results led to the identification of SNPs at a further four loci and confirmed the association of TCF7L2, SLC30A8, HHEX, PPARG and KCNJ11. Subsequently a collaboration of unprecedented size was established, in which the combined meta-analysis of the three GWAS was carried out. This involved 8,130 T2D patients and 32,987 controls. Five further studies were ultimately added, contributing 34,412 T2D patients and 59,925 controls. This large-scale study yielded no fewer than 12 new T2D-associated loci (chapter 3).
In chapter 4 we likewise applied GWAS analysis to T2D-related traits of healthy Finnish and Sardinian (Italy) individuals. Here we found a significant association between the G6PC2/ABCB11 locus and fasting blood glucose levels. A meta-analysis of all GWAS of this locus known at the time showed that the allele is responsible for an increase in glucose of 0.01-0.16 mM. A copy of the major allele accounts for approximately 1% of the total variation in fasting glucose. In this case it is not possible to predict whether the responsible gene is G6PC2 or ABCB11, because the two genes are in strong linkage disequilibrium. In terms of function, both genes could play a role in this phenotype. G6PC2 is a glucose-6-phosphatase and is expressed almost exclusively in the beta cells of the pancreas. ABCB11 belongs to the family of ATP-binding cassette genes and is expressed in the liver, where it may play a role in variation in glucose regulation. In addition, in chapter 5, we discovered that the melatonin receptor 1B (MTNR1B) gene is associated with T2D. For this we used ten European GWAS and examined fasting glucose levels. The MTNR1B gene is also functionally relevant. The gene is expressed in the islet cells of the pancreas, and MTNR1B expression was increased in homozygotes for the risk allele. These differences in gene expression were more pronounced in people aged 45 years and older. In addition, we found that melatonin suppresses insulin secretion in in vitro studies.
Our third approach to finding specific T2D genes in the associated loci focused on analyzing chromatin structure as a proxy for gene regulation. In chapter 6 we applied DNase hypersensitive site analysis (DNaseHS) and chromatin immunoprecipitation (ChIP) analysis and were able to identify 18,000 promoters in human islet cells. Some of these promoters are described for the first time and are active only in islet cells. In addition, we found 34,039 regulatory elements that are not part of a promoter. Approximately 47% of these elements are unique to islet cells. With this information, 18 T2D-associated loci were examined. At least 118 of the regulatory elements lie within the T2D loci, and some of them show enhancer activity in in vitro studies.
The last chapter discusses how genetic association studies for T2D can be carried out better. I also discuss the use of sequencing to track down rare variants associated with T2D. In addition, I address the importance of functional research to validate the genes named in the association studies. Finally, I introduce the proposal for a more integrative approach. This has already been set in motion by collecting muscle and adipose tissue from 324 Finnish individuals with T2D and healthy individuals. With this material an integrative approach is possible, studying genetics, genome structure and gene expression together. This approach can be expected to lead to insight into new potential genes, new mechanistic insight and ultimately better treatment for people with this complex and heterogeneous disease.
Short biography
Publications
SHORT BIOGRAPHY Michael Erdos was born in New Brunswick, New Jersey on February 10, 1956. He attended Purdue University from 1974 to 1978 majoring in chemistry and completed his Bachelor of Science degree in Biochemistry at The George Washington University in 1981. Michael continued research at The George Washington University under Dr. Allan Goldstein studying the immune regulation by circulating thymic peptides, thymosin α1 and thymosin β4.
In 1990, he joined the laboratory of Dr. Warren Leonard in the National Institute of Child Health and Human Development at the National Institutes of Health, where he studied the molecular biology of interleukin 2 receptor signaling. In 1993 he joined Dr. Francis Collins in creating the laboratory infrastructure for the newly established National Center for Human Genome Research, after which he continued research in Dr. Collins's lab on the BRCA1 positional cloning effort. In 1997 Michael transitioned to complex disease genetics, joining the Finnish US Investigation of NIDDM (FUSION) genetics study, where he initiated single nucleotide polymorphism association studies in the newly designated National Human Genome Research Institute.
Michael is currently a Senior Staff Scientist in the Collins laboratory, focusing on translational research of type 2 diabetes from association studies to functional analysis and therapeutic target identification. He also contributes to the study of Hutchinson-Gilford progeria syndrome (HGPS) and to the design and implementation of preclinical trials to support the identification of potential therapeutics for HGPS patients.
PUBLICATIONS
Naylor PH, Erdos MR, and Goldstein AL, (1984) Increased thymosin levels associated with acquired immune deficiency syndrome (AIDS). "Thymic hormones and lymphokines: Basic chemistry and clinical applications", (A.L.Goldstein, ed.) Plenum Press, N.Y., p. 69-76.
Otani H, Erdos M, and Leonard WJ. Tyrosine kinase(s) regulate apoptosis and bcl-2 expression in a growth factor-dependent cell line. J. Biol. Chem. 1993; 268(30):22733-6
Castilla LC, Couch FJ, Erdos MR, Hoskins KF, Calzone K, Garber JE, Boyd J, Lubin MB, Deshano ML, Brody LC, Collins FS, and Weber BL. Mutations in the BRCA1 gene in early-onset breast and ovarian cancer. Nat. Genet. 1994; 8: 387-91.
Eriksson M, Brown WT, Gordon LB, Glynn MW, Singer J, Scott L, Erdos MR, Robbins CM, Moses TY, Berglund P, Dutra A, Pak E, Durkin S, Csoka AB, Boehnke M, Glover TW, Collins FS. Recurrent de novo point mutations in lamin A cause Hutchinson-Gilford progeria syndrome. Nature. 2003 May 15; 423(6937):293-8.
Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007 Jun 1;316(5829):1341-5.
Parker SC, Stitzel ML, Taylor DL, Orozco JM, Erdos MR, Akiyama JA, van Bueren KL, Chines PS, Narisu N; NISC Comparative Sequencing Program, Black BL, Visel A, Pennacchio LA, Collins FS; National Institutes of Health Intramural Sequencing Center Comparative Sequencing Program Authors; NISC Comparative Sequencing Program Authors. Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proc Natl Acad Sci U S A. 2013 Oct 29;110(44):17921-6.