Supplementary Information
Epigenetic Analysis Leads to Identification of HNF1B as a Subtype-Specific Susceptibility Gene for Ovarian
Cancer
Supplementary Figure S1. mRNA expression versus DNA methylation for the TCGA serous tumors with the
following platforms for expression: (A) Affymetrix Human Exon 1.0 ST array, 528 tumors and 10 normal
fallopian tube samples; (B) Affymetrix U133A, 568 tumors and eight normal fallopian tube samples; (C) Agilent
G4502, 543 tumors and four normals. The y-axes show log intensities for the single-color-channel arrays
(HuEx1.0 and Affymetrix) and log ratios for the two-color-channel array (Agilent); the x-axes show beta
values for DNA methylation from 0 (unmethylated) to 1 (methylated). Each individual platform shows
the same pattern that we observed with the median integrated data, i.e., HNF1B is generally silenced in the
tumors, in half of them possibly by an epigenetic mechanism.
[Forest plots. Panel a: Association of rs7405776 with serous invasive. Panel b: Association of rs11651755 with clear cell invasive. The odds ratio axis runs from 0.02 to 10. Both panels show a Combined estimate and the studies WOC, UK2, UCI, TOR, STA, SEA, POL, POC, OVA, NTH, NOR, NJO, NHS, NEC, NCO, MSK, MDA, MCC, MAY, LA2, HPE, HOC, HJO, HAW, GER, DOV, DAN, BEL, BAV and AUS; panel a also includes HMO.]
Supplementary Figure S2. Forest plots showing ORs across studies. (A) Forest plot of individual study odds
ratios (ORs) for the top serous-associated HNF1B SNP, rs7405776. The box indicates the OR, with the size of the
box reflecting the sample size, and the line indicates the 95% confidence interval (CI). The overall effect estimate
was 1·13 (95% CI 1·09-1·17, p=3·1×10⁻¹⁰). No heterogeneity of effect was observed across the studies (p=0·30).
(B) Forest plot of individual study ORs for the top clear cell-associated HNF1B SNP, rs11651755. The
box indicates the OR, with the size of the box reflecting the sample size, and the line indicates the 95% confidence
interval (CI). The overall effect estimate was 0·77 (95% CI 0·70-0·84, p=1·6×10⁻⁸). No heterogeneity of effect was
observed across the studies (p=0·71).
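The caption reports a combined OR with a 95% CI but does not say how the study-level estimates were pooled. The sketch below shows one common approach, a fixed-effect inverse-variance model on the log-odds scale; this choice is an assumption made for illustration, not a method confirmed by the text. The function name and the toy study values are hypothetical and are not read from the forest plots.

```python
from math import exp, log, sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def pool_fixed_effect(studies):
    """Pool (odds_ratio, ci_lower, ci_upper) tuples on the log-odds scale.

    Assumes each CI is a symmetric 95% Wald interval on the log scale, so
    the standard error can be recovered from the interval width.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_lower, ci_upper in studies:
        se = (log(ci_upper) - log(ci_lower)) / (2 * Z95)
        weight = 1.0 / se ** 2  # inverse-variance weight
        weighted_sum += weight * log(odds_ratio)
        total_weight += weight
    pooled_log_or = weighted_sum / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return (
        exp(pooled_log_or),
        exp(pooled_log_or - Z95 * pooled_se),
        exp(pooled_log_or + Z95 * pooled_se),
    )


# Hypothetical per-study estimates (illustration only).
toy_studies = [(1.20, 1.00, 1.44), (1.20, 1.00, 1.44)]
pooled_or, pooled_lower, pooled_upper = pool_fixed_effect(toy_studies)
```

Pooling two identical studies leaves the point estimate unchanged and narrows the interval, which is the behaviour the larger boxes and the narrow Combined line in a forest plot reflect.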
Supplementary Figure S3. Plot of the correlation between all SNPs genotyped and imputed in the HNF1B
region and the top-associated serous ovarian cancer SNP, rs7405776. The x-axis shows the genomic coordinate
(hg17) and the y-axis shows the r² value with rs7405776. The genome-wide significant serous SNPs are denoted by
the triangles.
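For readers unfamiliar with the r² measure on the y-axis, the following minimal sketch computes the squared Pearson correlation between allele counts at two SNPs. It assumes genotypes coded as 0, 1 or 2 copies of the minor allele (imputed dosages would work the same way); this genotype-based value approximates the haplotype-based r² that dedicated LD tools report, and it is not necessarily how the values in the figure were derived. The function name and toy genotypes are hypothetical.

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic SNP: r2 is undefined")
    return cov * cov / (var_a * var_b)


# Toy genotypes for eight hypothetical samples (not study data).
index_snp = [0, 1, 2, 1, 0, 2, 1, 0]
other_snp = [0, 1, 2, 1, 0, 2, 0, 1]
ld_with_index = r_squared(index_snp, other_snp)
```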
[LD plot. SNP annotations: Top Serous Ovarian Cancer SNP; Clear Cell Ovarian Cancer SNP; Prostate and Endometrial Cancer and Diabetes SNP; Diabetes SNP; Diabetes SNP; Prostate SNP.]
*Genome-wide significantly associated Serous SNPs
Supplementary Figure S4. Linkage disequilibrium (LD) plot of the genome-wide significant serous* (n=9) and
clear cell (n=1) SNPs as well as the SNPs associated with prostate and uterine cancer and diabetes. The r²
value between the SNPs is given in each box based on the 1000 Genomes Project. The highest r² between the serous
SNPs and the clear cell SNP, rs11651755, is 0·70. The r² values between the top serous SNP, rs7405776, and the
other eight range from 0·24 to 0·97.
Supplementary Figure S5. Haplotype analysis of the HNF1B region harboring the genome-wide significant
associated SNPs. The rs numbers are given across the top. Eleven haplotypes were observed with a frequency of
2% or greater. The haplotypes are color-coded based on the allele found on the most common haplotype (black).
The serous ovarian cancer single-SNP P values and odds ratios are given below the haplotypes, followed by an
indication of which allele was modelled for the single-SNP risk analysis. The single-SNP clear cell ovarian cancer P
values and odds ratios are given below. The serous and clear cell associations with each haplotype are given to the
right of the haplotypes. The most common haplotype was set as the reference group and each haplotype odds ratio is
given relative to the most common haplotype. The 10th and 11th haplotypes were associated with a statistically
significant increased risk of serous ovarian cancer. Both of these haplotypes carry eight of nine genome-wide
significantly associated SNPs and haplotype 11 also carries the ninth, rs61612821. The r² between rs61612821 and
the top associated serous SNP, rs7405776, is only 0·22 because rs61612821 only falls on one of the three haplotypes
carrying rs7405776. This haplotype structure demonstrates that the signal between these SNPs cannot be
disentangled. Haplotypes four and eight are also associated with increased risk of serous ovarian cancer, but did not
reach statistical significance and they are uncommon. Both of these haplotypes carry at least one of the genome-
wide significant serous SNPs. Haplotypes seven through 11 carry the top associated clear cell SNP and were all
associated with decreased risk of clear cell disease. Only haplotype 10 was statistically significantly associated with
risk of clear cell ovarian cancer.
[Panels a and b. x-axis: Histology (serous, mucinous, endometrioid, clear cell). y-axis: count, 0 to 500 (a); Frequency, 0.0 to 1.0 (b). Legend: HNF1B <1%, 1−50%, >50%.]
Supplementary Figure S6. HNF1B protein expression differs by histological subtypes. 1,149 invasive ovarian
tumor samples from four different sites (52 HOP, 518 MAY, 119 UKO, 460 VAN) were examined and scored for
HNF1B protein expression by immunohistochemistry. (A) Histograms comparing the distribution of
HNF1B IHC scores (blue: 0%; green: 1-50%; red: >50%) by histological subtype. The y-axis indicates tumor
count. (B) Bar charts comparing the frequency of the IHC categories for the histological subtypes. The y-axis is the
cumulative frequency. 90% of serous tumors do not have HNF1B expression, compared with only 20% of clear cell tumors.
[Volcano plot of 104,033 CpG loci. x-axis: Difference in Beta Value (−0.4 to 0.6). y-axis: BH-adjusted P value (0 to 14). * HNF1B Probes.]
Supplementary Figure S7. HNF1B promoter methylation is unlikely to be a passenger event of global DNA
methylation changes. We compared the DNA methylation level at 104,033 CpG loci that are unmethylated (beta
value <0·2) in the 10 normal samples between 254 serous tumors and 17 clear cell tumors (Mayo panel) with a two-sample t-test.
The raw p values were adjusted with the Benjamini-Hochberg method, and the –log10 adjusted p values are plotted on
the y axis of the volcano plot against the mean beta value for clear cell minus the mean beta value for serous on the x axis.
The non-shaded area indicates adjusted p<0·05 and absolute difference in beta value >0·2. A subset of 1,003 loci is used for
Figure 3, with an even more stringent cut-off of adjusted p value <0·005, indicated by the dashed line. The red
stars indicate the HNF1B loci. While clear cell tumors generally have far more hypermethylation,
HNF1B is one of the few genes hypermethylated in the serous subtype. This argues against the possibility that
HNF1B hypermethylation in the serous subtype is a passenger event of global hypermethylation.
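The selection rule in this caption (Benjamini-Hochberg adjustment, then thresholds on adjusted p value and on the difference in mean beta value) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the raw p values have already been obtained from the two-sample t-test, and the function names and toy inputs are hypothetical. The default thresholds are the ones stated in the caption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values are
    # monotone: each one is the minimum of p * m / rank over larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_loci(raw_p, delta_beta, alpha=0.05, min_abs_delta=0.2):
    """Indices of loci with adjusted p < alpha and |delta beta| > min_abs_delta.

    delta_beta is mean beta (clear cell) minus mean beta (serous).
    """
    adjusted = benjamini_hochberg(raw_p)
    return [
        i
        for i, (q, d) in enumerate(zip(adjusted, delta_beta))
        if q < alpha and abs(d) > min_abs_delta
    ]


# Toy inputs for four hypothetical loci (not study data).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_delta = [0.30, 0.50, -0.10, -0.25]
toy_adjusted = benjamini_hochberg(toy_p)
toy_selected = select_loci(toy_p, toy_delta)
```

Passing `alpha=0.005` reproduces the more stringent cut-off that the caption describes for the Figure 3 subset.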
[Panels a and b. Heatmap row groups by rs11658063 genotype: C*/C*, C*/G, G/G.]
Supplementary Figure S8. HNF1B DNA methylation levels across the entire promoter region differ by
rs11658063 genotype. We further examined the DNA methylation level across the HNF1B gene promoter region
for different genotypes at rs11658063 (relative position indicated with a red arrow; Mayo panel). (A) Similar to
Figure 5A, with the bottom panel showing the probe locations for the HumanMethylation450 and HumanMethylation27
platforms. (B) An enlarged view of the region flanking the transcription start site that is unmethylated in the normal tissue
samples, with two associated CpG islands. Shown is a heatmap where blue indicates low methylation (beta value=0)
and red indicates high methylation (beta value=1) for each of the loci interrogated by HM450 in this region. The
heatmap is subdivided into six subpanels to separate samples (rows) by the three genotypes (star
indicates the risk allele) at rs11658063, and CpG loci (columns) into those upstream
and downstream of the transcription start site. The genotype at rs11658063 influences the overall DNA
methylation level, but the influence is more pronounced for the upstream
promoter region.
[Box plots of DNA methylation by SNP genotype; panel titles:
Mayo (n=231): rs3744763 p=0.17; 17-36092841 p=0.1; rs7405776 p=0.069; rs757210 p=0.054; rs4239217 p=0.0077; rs61612821 p=0.32; rs11657964 p=0.0031; rs7501939 p=0.0031; rs11658063 p=0.0026
TCGA (n=519): rs3744763 p=0.08; rs757210 p=0.01; rs4239217 p=0.08; rs7501939 p=0.02]
Supplementary Figure S9. Validation of the SNP-DNA methylation association with TCGA data. Only four
of the nine serous SNPs were available on the Illumina Human1M-Duo BeadChip used in TCGA. The DNA
methylation probe cg14487292 was not available on the HumanMethylation27k platform, so cg02335804, located in
the same promoter region, was used as a surrogate. The color of each box indicates the genotype, i.e., homozygous
major (white), heterozygous (gray) and homozygous minor (black), where the minor alleles are the risk alleles. The
p values for the Mayo data are two-sided trend p values; those for the TCGA validation are one-sided trend p values.
Supplementary Figure S10. Validation of TERT immortalization of EEC and HNF1B overexpression upon
transfection. (A) Transduction of endometriosis epithelial cells (EECs) with lentiviral hTERT supernatants results
in an extension of in vitro lifespan. Growth curve analyses show that a significant increase in lifespan is not observed in
EECs transduced with lenti-GFP. (B) Immortalized endometriosis epithelial cells (EEC16) were infected with GFP
and HNF1B-GFP viral supernatants. HNF1B overexpression was confirmed by real-time PCR; HNF1B expression
is only detected in cells transduced with HNF1B-GFP lentiviral supernatants.
[Scatter plots, x-axis HNF1B, y-axis SPP1; panel titles: (a) HuEx1.0 R=−0.10, P=0.02; (b) Affymetrix R=−0.06, P=0.16; (c) Agilent R=−0.13, P=0.003.]
Supplementary Figure S11. SPP1 and HNF1B mRNA expression levels have no correlation or weak inverse
correlation in the serous tumors in TCGA data. We looked at all three expression platforms used in TCGA.
Plotted on the y axis is the log intensity (a,b) or log ratio (c) for SPP1, and on the x axis that for HNF1B. The platform
information, Pearson correlation value and p value testing for linear correlation are given in the title of each panel.
While SPP1 is a downstream target of HNF1B in EEC and clear cell ovarian cancer cells, its expression does not
seem to correlate with HNF1B expression in the serous tumors.
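To see how a weak correlation can still yield a small p value with several hundred tumors, the sketch below converts a Pearson correlation and a sample size into an approximate two-sided p value. It is an illustration under stated assumptions: the t statistic is compared against a normal rather than a t distribution (reasonable only for large n), and the sample sizes are the tumor counts given in Supplementary Figure S1, which may differ from the number of tumors actually used for this figure. The function name is hypothetical; an exact test is available as `scipy.stats.pearsonr` when the raw data are at hand.

```python
from math import erfc, sqrt


def correlation_p_value(r, n):
    """Approximate two-sided p value for the null of no linear correlation."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    # t statistic for a Pearson correlation with n - 2 degrees of freedom.
    t_stat = r * sqrt((n - 2) / (1.0 - r * r))
    # Normal approximation to the t distribution (assumes large n).
    return erfc(abs(t_stat) / sqrt(2.0))


# R values from the panel titles; n values are assumed from Figure S1.
p_huex = correlation_p_value(-0.10, 528)
p_affymetrix = correlation_p_value(-0.06, 568)
p_agilent = correlation_p_value(-0.13, 543)
```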
Supplementary Table S1. Distribution of cases and controls by study site.
Site | Geographic Region | Study Design | No. of controls | Invasive cases: All | Serous | Mucinous | Endometrioid | Clear Cell | Brenner | Other
Australia Ovarian Cancer Study & Australia Cancer Study (Ovarian Cancer) (AUS) | Australia | Population-based/case-control | 1011 | 949 | 592 | 40 | 123 | 64 | 40 | 90
Bavarian Ovarian Cancer Cases and Controls (BAV) | Southeast Germany | Population-based/case-control | 143 | 93 | 56 | 8 | 13 | 6 | 1 | 9
Belgium Ovarian Cancer Study (BEL) | Belgium, University Hospital Leuven | Hospital-based/case-control | 1352 | 277 | 195 | 25 | 22 | 23 | 2 | 10
Diseases of the Ovary and their Evaluation (DOV) | USA: 13 counties in western Washington state | Population-based/case-control | 1606 | 990 | 576 | 27 | 161 | 75 | 151 | 0
Germany Ovarian Cancer Study (GER) | Germany: two geographical regions in the states of Baden-Württemberg and Rhineland-Palatinate in southern Germany | Population-based/case-control | 413 | 192 | 96 | 22 | 21 | 6 | 1 | 46
Gilda Radner Familial Ovarian Cancer Registry (GRR)* | USA | Familial cancer/case only | 0 | 115 | 76 | 5 | 19 | 11 | 3 | 1
Hawaii Ovarian Cancer Study (HAW) | USA: Hawaii | Population-based/case-control | 601 | 266 | 130 | 27 | 60 | 35 | 6 | 8
Hannover-Jena Ovarian Cancer Study (HJO) | Germany | Hospital-based/case-control | 274 | 273 | 142 | 9 | 26 | 4 | 38 | 54
Hannover-Minsk Ovarian Cancer Study (HMO) | Belarus | Case-control | 140 | 144 | 50 | 7 | 12 | 1 | 0 | 74
Helsinki Ovarian Cancer Study (HOC) | Helsinki, Finland | Case-control | 447 | 218 | 113 | 45 | 28 | 14 | 0 | 18
Hormones and Ovarian Cancer Prediction (HOP) | Western Pennsylvania, Northeastern Ohio, Western New York | Population-based/case-control | 1501 | 682 | 388 | 32 | 90 | 43 | 50 | 79
DNA-Specimen in Gynecologic Oncologic Malignancies (HSK)* | Germany | Case only | 0 | 146 | 109 | 1 | 16 | 0 | 3 | 17
Hospital-based Epidemiologic Research Program at Aichi Cancer Center (JPN) | Japan: Nagoya City | Case-control | 81 | 66 | 32 | 3 | 7 | 17 | 4 | 3
Women's Cancer Research Institute - Cedars-Sinai Medical Center (LAX)* | USA: Southern California | Case only | 0 | 330 | 248 | 15 | 26 | 13 | 27 | 1
Danish Malignant Ovarian Tumor Study (MAL) | Denmark | Population-based/case-control | 829 | 440 | 272 | 42 | 54 | 33 | 0 | 39
Malaysia Ovarian Cancer Study (MAS) | Malaysia | Hospital-based/case-control | 106 | 106 | 44 | 17 | 25 | 12 | 1 | 7
Mayo Clinic Ovarian Cancer Case Control Study (MAY) | USA: North Central (MN, SD, ND, IL, IA, WI) | Clinic-based/case-control | 753 | 708 | 515 | 18 | 97 | 34 | 0 | 44
Melbourne Collaborative Cohort Study (MCC) | Melbourne, Australia | Cohort/nested case-control | 68 | 64 | 34 | 7 | 7 | 6 | 6 | 4
MD Anderson Ovarian Cancer Study (MDA) | USA: Texas | Hospital-based/case-control | 385 | 323 | 194 | 29 | 29 | 4 | 1 | 66
Memorial Sloan Kettering Cancer Center Gynecology Tissue Bank (MSK) | USA: New York City | Case-control | 697 | 556 | 450 | 0 | 25 | 22 | 0 | 59
North Carolina Ovarian Cancer Study (NCO) | USA: Central and eastern North Carolina (48 counties) | Population-based/case-control | 984 | 850 | 480 | 43 | 130 | 85 | 112 | 0
New England-based Case-Control Study of Ovarian Cancer (NEC) | USA: New Hampshire and Eastern Massachusetts | Population-based/case-control | 1049 | 697 | 397 | 44 | 131 | 97 | 0 | 28
Nurses' Health Study (NHS) | USA | Population-based/nested case-control | 429 | 127 | 68 | 7 | 14 | 6 | 13 | 19
New Jersey Ovarian Cancer Study (NJO) | USA: New Jersey (six counties) | Case-control | 194 | 190 | 110 | 7 | 30 | 23 | 0 | 19
University of Bergen Norway Study (NOR) | Norway | Case-control | 371 | 237 | 136 | 15 | 27 | 13 | 0 | 46
Nijmegen Polygene Study & Nijmegen Biomedical Study (NTH) | Eastern part of the Netherlands | Case-control | 323 | 263 | 119 | 34 | 67 | 21 | 9 | 13
Oregon Ovarian Cancer Registry (ORE)* | Portland, Oregon | Case only | 0 | 59 | 41 | 4 | 4 | 4 | 0 | 6
Ovarian Cancer in Alberta and British Columbia Study (OVA) | Alberta and British Columbia, Canada | Case-control | 810 | 688 | 370 | 29 | 114 | 73 | 12 | 90
Poland Ovarian Cancer Study (POC) | Poland: Szczecin, Poznan, Opole, Rzeszów | Case-control | 417 | 423 | 200 | 33 | 39 | 9 | 61 | 81
NCI Ovarian Case-Control Study in Poland (POL) | Poland, Warsaw and Lodz | Population-based/case-control | 223 | 236 | 106 | 17 | 37 | 10 | 25 | 41
Pelvic Mass Study (PVD)* | Denmark | Population-based/case-control | 0 | 172 | 130 | 11 | 14 | 8 | 6 | 3
Royal Marsden Hospital Case Series (RMH)* | UK: London | Hospital based/case only | 0 | 151 | 52 | 16 | 29 | 17 | 0 | 37
UK Studies of Epidemiology and Risk Factors in Cancer Heredity Ovarian Cancer Study (SEA) | UK: East Anglia and West Midlands | Population-based/case-control | 6067 | 1395 | 581 | 145 | 231 | 147 | 9 | 282
Southampton Ovarian Cancer Study (SOC)* | United Kingdom, Wessex region | Case only/hospital-based | 0 | 274 | 105 | 34 | 64 | 11 | 7 | 53
Scottish Randomised Trial in Ovarian Cancer (SRO)* | Coordinated through clinical trials unit, Glasgow UK from patients recruited worldwide | Case only from clinical trial | 0 | 159 | 93 | 3 | 17 | 9 | 25 | 12
Genetic Epidemiology of Ovarian Cancer (STA) | USA: Six counties in the San Francisco Bay area | Population-based/case-control | 404 | 282 | 174 | 19 | 38 | 22 | 1 | 28
Shanghai Women's Health Study (SWH) | Shanghai, China | Cohort/nested case-control | 891 | 135 | 0 | 0 | 0 | 0 | 0 | 135
Familial Ovarian Tumor Study (TOR) | Canada: Province of Ontario | Population-based | 443 | 559 | 341 | 39 | 132 | 34 | 0 | 13
UC Irvine Ovarian Cancer Study (UCI) | USA: Southern California (Orange and San-Diego, Imperial Counties) | Population-based/case-control | 425 | 331 | 198 | 24 | 58 | 29 | 2 | 20
UK Ovarian Cancer Population Study (UKO) | United Kingdom (England, Wales and Northern Ireland) | Population-based/case-control | 1123 | 718 | 357 | 76 | 116 | 68 | 55 | 46
UK Familial Ovarian Cancer Registry (UKR)* | UK: National | Case only/Familial Register | 0 | 48 | 23 | 3 | 6 | 2 | 0 | 14
Los Angeles County Case-Control Studies of Ovarian Cancer (USC) | Los Angeles County | Population-based/case-control | 1370 | 978 | 614 | 63 | 124 | 58 | 26 | 93
Warsaw Ovarian Cancer Study (WOC) | Poland: Warsaw and central Poland | Case-control | 204 | 202 | 132 | 8 | 20 | 17 | 1 | 24
Total | | | 26134 | 16111 | 9139 | 1053 | 2303 | 1186 | 698 | 1732
* Case only study. For our analyses, GRR was merged with HOP, HSK with GER, LAX with USC, ORE with DOV, PVD with MAL, and RMH, SOC, SRO, and UKR with UKO.
Supplementary Table S2. Association between the genome-wide significantly associated HNF1B SNPs and serous ovarian cancer risk in non-Whites.
SNP | Asians (n=249 / 1573*): AAF | OR | 95% CI | p-value | Africans (n=89 / 200*): AAF | OR | 95% CI | p-value | Other (n=431 / 870*): AAF | OR | 95% CI | p-value
rs3744763 | 0·56 | 1·13 | 0·91 - 1·40 | 0·26 | 0·07 | 0·95 | 0·45 - 2·01 | 0·89 | 0·33 | 0·77 | 0·64 - 0·93 | 0·01
17-36092841 | 0·34 | 1·11 | 0·87 - 1·41 | 0·40 | 0·53 | 1·23 | 0·82 - 1·85 | 0·31 | 0·40 | 0·87 | 0·72 - 1·04 | 0·13
rs7405776 | 0·29 | 1·10 | 0·87 - 1·38 | 0·42 | 0·51 | 1·16 | 0·79 - 1·70 | 0·45 | 0·38 | 0·80 | 0·67 - 0·95 | 0·01
rs757210 | 0·29 | 1·07 | 0·85 - 1·34 | 0·58 | 0·53 | 1·05 | 0·72 - 1·54 | 0·79 | 0·37 | 0·85 | 0·71 - 1·01 | 0·06
rs4239217 | 0·30 | 1·10 | 0·88 - 1·38 | 0·41 | 0·27 | 1·10 | 0·74 - 1·62 | 0·64 | 0·33 | 0·80 | 0·66 - 0·96 | 0·02
rs11651755 | 0·28 | 0·98 | 0·78 - 1·24 | 0·87 | 0·66 | 0·89 | 0·61 - 1·29 | 0·53 | 0·46 | 0·89 | 0·75 - 1·06 | 0·19
rs61612821 | 0·08 | 1·17 | 0·76 - 1·78 | 0·47 | 0·02 | 1·28 | 0·26 - 6·32 | 0·76 | 0·08 | 0·73 | 0·49 - 1·08 | 0·11
rs11657964 | 0·27 | 1·03 | 0·81 - 1·30 | 0·83 | 0·52 | 0·93 | 0·65 - 1·32 | 0·68 | 0·37 | 0·83 | 0·69 - 0·99 | 0·04
rs7501939 | 0·27 | 1·02 | 0·81 - 1·29 | 0·87 | 0·50 | 0·93 | 0·65 - 1·33 | 0·69 | 0·37 | 0·85 | 0·71 - 1·02 | 0·07
rs11658063 | 0·28 | 1·01 | 0·80 - 1·28 | 0·94 | 0·39 | 0·84 | 0·56 - 1·28 | 0·42 | 0·35 | 0·79 | 0·65 - 0·95 | 0·01
AAF=Alternate Allele Frequency. * cases / controls
Supplementary Note 1
PRACTICAL Consortium
Access to genotype data for SNPs that were not nominated by OCAC was provided by the PRACTICAL
Consortium investigators including: Doug Easton, Centre for Cancer Genetic Epidemiology, Department of Public
Health and Primary Care, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK;
Rosalind Eeles, The Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK and Royal
Marsden NHS Foundation Trust, Fulham and Sutton, London and Surrey, UK; Kenneth Muir, University of
Warwick, Coventry, UK; Graham Giles, Cancer Epidemiology Centre, The Cancer Council Victoria, 1 Rathdowne
street, Carlton Victoria, Australia and Centre for Molecular, Environmental, Genetic and Analytic Epidemiology,
The University of Melbourne, 723 Swanston street, Carlton, Victoria, Australia; Fredrik Wiklund, Department of
Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden; Henrik Gronberg, Department of
Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden; Christopher Haiman,
Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris
Comprehensive Cancer Center, Los Angeles, California, USA; Johanna Schleutker, Department of Medical
Biochemistry and Genetics, University of Turku, Turku, Finland and Institute of Biomedical
Technology/BioMediTech, University of Tampere and FimLab Laboratories, Tampere, Finland; Maren Weischer,
Department of Clinical Biochemistry, Herlev Hospital, Copenhagen University Hospital, Herlev Ringvej 75, DK-
2730 Herlev, Denmark; Ruth Travis, Cancer Epidemiology Unit, Nuffield Department of Clinical Medicine,
University of Oxford, Oxford, UK; David Neal, Surgical Oncology (Uro-Oncology: S4), University of Cambridge,
Box 279, Addenbrooke’s Hospital, Hills Road, Cambridge, UK and Cancer Research UK Cambridge Research
Institute, Li Ka Shing Centre, Cambridge, UK; Paul Pharoah, Centre for Cancer Genetic Epidemiology, Department
of Oncology, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK; Kay-Tee
Khaw, Cambridge Institute of Public Health, University of Cambridge, Forvie Site, Robinson Way, Cambridge CB2
0SR; Janet L. Stanford, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle,
Washington, USA and Department of Epidemiology, School of Public Health, University of Washington, Seattle,
Washington, USA; William J. Blot, International Epidemiology Institute, 1455 Research Blvd., Suite 550,
Rockville, MD 20850; Stephen Thibodeau, Mayo Clinic, Rochester, Minnesota, USA; Christiane Maier, Department
of Urology, University Hospital Ulm, Germany and Institute of Human Genetics University Hospital Ulm,
Germany; Adam S. Kibel, Brigham and Women's Hospital/Dana-Farber Cancer Institute, 45 Francis Street- ASB II-
3, Boston, MA 02115 and Washington University, St Louis, Missouri; Cezary Cybulski, International Hereditary
Cancer Center, Department of Genetics and Pathology, Pomeranian Medical University, Szczecin, Poland; Lisa
Cannon-Albright, Division of Genetic Epidemiology, Department of Medicine, University of Utah School of
Medicine; Hermann Brenner, Division of Clinical Epidemiology and Aging Research, German Cancer Research
Center, Heidelberg, Germany; Jong Park, Division of Cancer Prevention and Control, H. Lee Moffitt Cancer
Center, 12902 Magnolia Dr., Tampa, Florida, USA; Radka Kaneva, Molecular Medicine Center and Department of
Medical Chemistry and Biochemistry, Medical University - Sofia, 2 Zdrave St, 1431, Sofia, Bulgaria; Jyotnsa Batra,
Australian Prostate Cancer Research Centre-Qld, Institute of Health and Biomedical Innovation and Schools of Life
Science and Public Health, Queensland University of Technology, Brisbane, Australia; Manuel R. Teixeira,
Department of Genetics, Portuguese Oncology Institute, Porto, Portugal and Biomedical Sciences Institute (ICBAS),
Porto University, Porto, Portugal; Maya Ghoussaini, Centre for Cancer Genetic Epidemiology, Department of Public
Health and Primary Care, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK;
Zsofia Kote-Jarai, The Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK; Ali Amin
Al Olama, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of
Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK; Sara Benlloch, Centre for Cancer Genetic
Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Strangeways Laboratory,
Worts Causeway, Cambridge, UK.
Supplementary Note 2
Australian Ovarian Cancer Study Group
Members of the Australian Ovarian Cancer Study (AOCS) are listed below:
D. Bowtell, D. Gertig, A. Green, A. DeFazio, P. Webb, R Stuart-Harris; NSW- F Kirsten, J Rutovitz, P Clingan, A
Glasgow, A Proietto, S Braye, G Otton, J Shannon, T Bonaventura, J Stewart, S Begbie, M Friedlander, D Bell, S
Baron-Hay, A Ferrier (dec.), G Gard, D Nevell, N Pavlakis, S Valmadre, B Young, C Camaris, R Crouch, L
Edwards, N Hacker, D Marsden, G Robertson, P Beale, J Beith, J Carter, C Dalrymple, R Houghton, P Russell, L
Anderson, M Links, J Grygiel, J Hill, A Brand, K Byth, R Jaworski, P Harnett, R Sharma, G Wain; QLD-D Purdie,
D Whiteman, B Ward, D Papadimos, A Crandon, M Cummings, K Horwood, A Obermair, L Perrin, D Wyld, J
Nicklin; SA- M Davy, MK Oehler, C Hall, T Dodd, T Healy, K Pittman, D Henderson, J Miller, J Pierdes, A Achan;
TAS-P Blomfield, D Challis, R McIntosh, A Parker; VIC- B Brown, R Rome, D Allen, P Grant, S Hyde, R Laurie,
M Robbie, D Healy, T Jobling, T Manolitsas, J McNealage, P Rogers, B Susil, E Sumithran, I Simpson, I Haviv, K
Phillips, D Rischin, S Fox, D Johnson, S Lade, P Waring, M Loughrey, N O’Callaghan, B Murray, L Mileshkin, P
Allan; V Billson, J Pyman, D Neesham, M Quinn, A Hamilton, C Underhill, R Bell, LF Ng, R Blum, V Ganju; WA-
I Hammond, A McCartney (dec.), C Stewart, Y Leung, M Buck, N Zeps (WARTN)
Supplementary Note 3
Australian Cancer Study
Investigators: David C. Whiteman MBBS, PhD, Penelope M. Webb MA, D Phil, Adele C. Green MBBS, PhD,
Nicholas K. Hayward PhD, Peter G. Parsons PhD, David M. Purdie PhD; Clinical collaborators: B. Mark Smithers
FRACS, David Gotley FRACS PhD, Andrew Clouston FRACP PhD, Ian Brown FRACP; Project Manager:
Suzanne Moore RN, MPH; Database: Karen Harrap BIT, Troy Sadkowski BIT; Research Nurses: Suzanne O’Brien
RN MPH, Ellen Minehan RN, Deborah Roffe RN, Sue O’Keefe RN, Suzanne Lipshut RN, Gabby Connor RN,
Hayley Berry RN, Frances Walker RN, Teresa Barnes RN, Janine Thomas RN, Linda Terry RN MPH, Michael
Connard B Sc, Leanne Bowes B Sc, MaryRose Malt RN, Jo White RN; Clinical Contributors: Australian Capital
Territory: Charles Mosse FRACS, Noel Tait FRACS; New South Wales: Chris Bambach FRACS, Andrew Biankan
FRACS, Roy Brancatisano FRACS, Max Coleman FRACS, Michael Cox FRACS, Stephen Deane FRACS, Gregory
L. Falk FRACS, James Gallagher FRACS, Mike Hollands FRACS, Tom Hugh FRACS, David Hunt FRACS, John
Jorgensen FRACS, Christopher Martin FRACS, Mark Richardson FRACS, Garrett Smith FRACS, Ross
Smith FRACS, David Storey FRACS; Queensland: John Avramovic FRACS, John Croese FRACP, Justin D'Arcy
FRACS, Stephen Fairley FRACP, John Hansen FRACS, John Masson FRACP, Les Nathanson FRACS, Barry
O'Loughlin FRACS, Leigh Rutherford FRACS, Richard Turner FRACS, Morgan Windsor FRACS; South Australia:
Justin Bessell FRACS, Peter Devitt FRACS, Glyn Jamieson FRACS, David Watson FRACS; Victoria: Stephen
Blamey FRACS, Alex Boussioutas FRACP, Richard Cade FRACS, Gary Crosthwaite FRACS, Ian Faragher
FRACS, John Gribbin FRACS, Geoff Hebbard FRACP, George Kiroff FRACS, Bruce Mann FRACS, Bob Millar
FRACS, Paul O'Brien FRACS, Robert Thomas FRACS, Simon Wood FRACS; Western Australia: Steve Archer
FRACS, Kingsley Faulkner FRACS, Jeff Hamdorf FRACS
Supplementary Methods
Selection of SNPs
Tagging SNPs (tSNPs) were selected in the HNF1B region using the program SNAGGER³⁴ from the International
HapMap Project CEU population (White) in order to cover all SNPs in the region with a minor allele frequency of
0·05 with an r² of 0·80. This resulted in the selection of 40 SNPs. In addition, because of the association between
prostate cancer and HNF1B, an additional 134 SNPs were selected by The Prostate Cancer Association Group to
Investigate Cancer Associated Alterations in the Genome (The PRACTICAL Consortium³⁵⁻³⁷) to provide full
fine-mapping information based on 174 genotyped SNPs. A 150kb-region surrounding HNF1B was identified for
fine-mapping (hg18 coordinates 33,100,000-33,250,000). Fine-mapping SNPs were selected at this locus from the March
2010 (Build 36) release of the 1000 Genomes Project for all known SNPs with minor allele frequency >0·02 in
Europeans and r²>0·1 with the reported prostate cancer associated SNPs (rs11649743 and rs4430796).
IMPUTE provides estimated allele dosage for SNPs that were not genotyped and for samples with missing genotype
data for genotyped SNPs.
SNP Genotyping
Each 96-well plate contained 250 ng genomic DNA (or 500 ng whole-genome amplified DNA). Raw intensity data
files for all consortia were sent to the COGS data co-ordination centre at the University of Cambridge for centralized
genotype calling and QC.
Initial calling used a cluster file generated using 270 samples from HapMap2. These calls were used for ongoing
QC checks during the genotyping. To generate the final calls used for the data analysis, we first selected a subset of
3,018 individuals, including samples from each of the genotyping centers, each of the participating consortia, and
each major ethnicity. Only plates with a consistent high call rate in the initial calling were used. The HapMap
samples and ~160 samples that were known positive controls for rare variants on the array were used to generate a
cluster file that was then applied to call the genotypes for the remaining samples. We also investigated two other
calling algorithms, Illuminus³⁸ and GenoSNP³⁹, but manual inspection of a sample of SNPs with discrepant calls
indicated that GenCall was invariably superior.
Sample QC for Genotyping
One thousand two hundred and seventy three OCAC samples were genotyped in duplicate. Genotypes were
discordant for greater than 40 percent of SNPs for 22 pairs. For the remaining 1,251 pairs, concordance was greater
than 99·6 percent. In addition we identified 245 pairs of samples that were unexpected genotypic duplicates. Of
these, 137 were phenotypic duplicates and judged to be from the same individual. We used identity-by-state to
identify 618 pairs of first-degree relatives. Samples were excluded according to the following criteria: 1) 1,133
samples with a conversion rate of less than 95 percent; 2) 169 samples with heterozygosity >5 standard deviations
from the intercontinental ancestry specific mean heterozygosity; 3) 65 samples with ambiguous sex; 4) 269 samples
with the lowest call rate from a first-degree relative pair; 5) 1,686 samples that were either duplicate samples that
were non-concordant for genotype or genotypic duplicates that were not concordant for phenotype. Thus, a total of
44,308 subjects including 16,111 invasive cases, 2,063 borderline cases and 26,134 controls were available for
analysis.
SNP Quality Control
In total, 211,155 SNP assays, identified across a number of studies, were successfully designed and included on the
array. SNPs were excluded according to the following criteria: (1) 1,311 SNPs without a genotype call; (2) 2,857
monomorphic SNPs; (3) 5,201 SNPs with a call rate less than 95 percent and MAF > 0·05 or call rate less than 99
percent with MAF < 0·05; (4) 2,194 SNPs showing evidence of deviation of genotype frequencies from
Hardy-Weinberg equilibrium (P<10⁻⁷); (5) 22 SNPs with greater than two percent discordance in duplicate pairs. Overall,
94·5 percent passed QC. Genotype clusters were visually inspected for the most strongly associated SNPs.
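The five exclusion criteria can be written as a single predicate, and the reported pass rate can be checked from the counts in the text. This is a minimal sketch, not the pipeline actually used: the function and parameter names are hypothetical, the handling of a MAF of exactly 0·05 is not stated in the text and is assumed here to fall under the stricter 99 percent threshold, and the arithmetic assumes the five exclusion counts do not overlap.

```python
def snp_passes_qc(call_rate, maf, hwe_p, duplicate_discordance):
    """Return True if a SNP survives the five exclusion criteria."""
    if call_rate == 0.0:  # (1) no genotype call at all
        return False
    if maf == 0.0:  # (2) monomorphic
        return False
    # (3) call-rate threshold depends on MAF; MAF == 0.05 is assumed
    # to require the stricter 99 percent (not specified in the text).
    required_call_rate = 0.95 if maf > 0.05 else 0.99
    if call_rate < required_call_rate:
        return False
    if hwe_p < 1e-7:  # (4) deviation from Hardy-Weinberg equilibrium
        return False
    if duplicate_discordance > 0.02:  # (5) discordant duplicate pairs
        return False
    return True


# Counts taken from the text; assumed to be mutually exclusive.
designed_assays = 211_155
excluded_counts = [1_311, 2_857, 5_201, 2_194, 22]
passed_fraction = (designed_assays - sum(excluded_counts)) / designed_assays
passed_percent = round(100 * passed_fraction, 1)
```

Under the non-overlap assumption, the counts leave 199,570 of 211,155 assays, which agrees with the 94·5 percent stated above.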
Statistical Analysis
Subjects with greater than 90 percent European ancestry were classified as European (n=39,944) and those with
greater than 80 percent Asian or African ancestry were classified as Asian (n=2,388) and African (n=387),
respectively. All other subjects were classified as mixed ancestry (n=1,770). We then used a set of 37,000
additional genotyped markers not suspected to be related to ovarian cancer risk to perform principal components
analysis within each major population subgroup⁴⁰. To enable this analysis on very large-scale samples we used an
in-house program written in C++ using the Intel MKL libraries for eigenvectors (available at
http://ccge.medschl.cam.ac.uk/software/).
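The ancestry thresholds described above amount to a simple decision rule, sketched below. The sketch assumes that per-subject ancestry proportions (values between 0 and 1) are already available; how they were estimated is not covered in this paragraph. The function name and the order in which the thresholds are checked are illustrative choices.

```python
def classify_ancestry(european, asian, african):
    """Assign an ancestry group from estimated ancestry proportions."""
    if european > 0.90:
        return "European"
    if asian > 0.80:
        return "Asian"
    if african > 0.80:
        return "African"
    return "Mixed"


# Hypothetical subjects (illustration only).
groups = [
    classify_ancestry(0.95, 0.03, 0.02),
    classify_ancestry(0.10, 0.85, 0.05),
    classify_ancestry(0.15, 0.02, 0.83),
    classify_ancestry(0.60, 0.30, 0.10),
]
```

Because the thresholds all exceed 0·5, at most one rule can match a given subject, so the order of the checks does not change the result.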
For the non-European groups for all invasive cases and serous cases, as well as for all groups for the other subtypes,
we were not able to carry out within-study analyses because of the small sample sizes available. We thus fitted
unconditional logistic regression models adjusted for study site and for the first five principal components for the
European ancestry group or the first two principal components for the other ancestry groups.
To evaluate the independence of associations between the top serous and clear cell SNPs, we fit separate models by
histology that contained both SNPs. In addition, two correlated SNPs were found to be associated with both serous
and clear cell subtypes of ovarian cancer, with one SNP being more strongly associated with serous (rs7405776) and
the other more strongly associated with clear cell (rs11651755). It is conceivable that the associations for both
subtypes are being driven by the same SNP, but, by chance, the other correlated SNP is giving a stronger signal for one
of the subtypes. We therefore compared the log-likelihood statistics of the logistic regression models for each SNP with
each subtype. The odds in favor of one SNP being the driver of the signal are given as
exp(log-likelihood_SNP1 − log-likelihood_SNP2).
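As a small worked illustration of this comparison, the sketch below turns two model log-likelihoods into the odds described above. The function name and the numeric log-likelihoods are hypothetical; they are not values from the study.

```python
from math import exp


def odds_in_favor(loglik_snp1, loglik_snp2):
    """Odds that SNP1 rather than SNP2 drives the signal."""
    return exp(loglik_snp1 - loglik_snp2)


# Hypothetical log-likelihoods for two single-SNP logistic models
# fitted to the same subtype (illustration only).
example_odds = odds_in_favor(-1000.0, -1002.0)
```

A difference of two log-likelihood units corresponds to odds of about 7 to 1 in favor of the better-fitting SNP, and swapping the two arguments gives the reciprocal.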
The region for haplotype analysis was defined as extending outward from the top serous SNP, rs7405776, to the
point beyond which there were no SNPs with an r2>0·20 with a minor allele frequency of 5%.
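One possible reading of this region definition is sketched below. It assumes that r2 is measured against the index SNP, that the minor allele frequency criterion means at least 5%, and that the region runs between the outermost SNPs that meet both criteria. These are assumptions made for illustration; the positions and values are hypothetical and are not real genomic coordinates.

```python
def haplotype_region(index_pos, snps, r2_min=0.20, maf_min=0.05):
    """Return (start, end) spanned by the index SNP and SNPs in LD with it.

    snps: iterable of (position, maf, r2_with_index) tuples.
    """
    # Keep SNPs that are common enough and sufficiently correlated with the index SNP.
    linked = [pos for pos, maf, r2 in snps if maf >= maf_min and r2 > r2_min]
    positions = linked + [index_pos]
    return min(positions), max(positions)


# Hypothetical SNPs around an index SNP at position 5000.
snps = [
    (1000, 0.30, 0.05),  # common, but r2 too low
    (2000, 0.12, 0.35),  # qualifies
    (3500, 0.02, 0.60),  # r2 high, but too rare
    (4000, 0.25, 0.80),  # qualifies
    (6000, 0.40, 0.22),  # qualifies
    (9000, 0.01, 0.90),  # too rare
    (9500, 0.20, 0.10),  # r2 too low
]
region = haplotype_region(5000, snps)
print(region)
```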
TCGA Packages Used
Affymetrix HT Human Genome U133 Array Plate Set
broad.mit.edu_OV.HT_HG-U133A.Level_3.11.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.12.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.13.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.14.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.15.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.17.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.18.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.19.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.21.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.22.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.24.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.27.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.40.1007.0/
broad.mit.edu_OV.HT_HG-U133A.Level_3.9.1007.0/
Agilent 244K Custom Gene Expression G4502A-07-3
unc.edu_OV.AgilentG4502A_07_3.Level_3.1.5.0/
unc.edu_OV.AgilentG4502A_07_3.Level_3.2.0.0/
Affymetrix Human Exon 1.0 ST Array
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.11.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.12.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.13.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.14.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.15.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.17.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.18.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.19.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.21.2.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.22.1.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.27.0.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.40.1.0/
lbl.gov_OV.HuEx-1_0-st-v2.Level_3.9.2.0/
Illumina Infinium HumanMethylation27 Beadchip
jhu-usc.edu_OV.HumanMethylation27.Level_3.1.4.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.10.0.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.11.1.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.12.1.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.13.0.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.2.4.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.3.4.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.4.3.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.5.2.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.6.2.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.7.2.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.8.2.0/
jhu-usc.edu_OV.HumanMethylation27.Level_3.9.0.0/
DNA methylation / mRNA Expression Correlation
The TCGA DNA methylation data were generated on the Illumina Infinium HumanMethylation27 BeadChip. A total
of 576 tumors and 14 fallopian tube samples were assayed. The TCGA mRNA data were generated on three
platforms. A total of 592 unique tumor samples were assayed, with 512 assayed on all three platforms, and 80 on
two of the three platforms. Ten normal fallopian tube samples were assayed as well, four on all three platforms and
six on two of the three. 574 of the tumor samples and all ten fallopian tube samples had matching DNA methylation
data. Scatterplots were used to examine the association between the mRNA expression data and DNA methylation
data matched by 16-digit TCGA ID. The DNA methylation probe cg02335804 was used to measure the HNF1B promoter
DNA methylation level, as it was throughout the paper for samples assayed on the HumanMethylation27
platform. The evaluation was done both for the integrated mRNA expression data and for each expression platform
separately.
To integrate data from the three platforms, we median-centered41 the Level 3 data (log intensity for the
one-color-channel platforms and log ratio for the two-color-channel platform) for HNF1B expression for each platform. Then
we took the median of the log ratio estimates from the three platforms as the relative HNF1B expression level for
each sample. Spearman correlation was used to assess the correlation between gene expression and DNA
methylation.
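The integration and correlation steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the original analysis code (the analyses in this work were done in R). The sample IDs, expression values and beta values are hypothetical, and the handling of samples assayed on only two platforms (median of the two available values) is an assumption.

```python
from statistics import median


def median_center(values):
    """Subtract the platform-wide median from each sample's value."""
    m = median(values.values())
    return {sample: v - m for sample, v in values.items()}


def integrate(platforms):
    """Per-sample median of median-centered values across available platforms."""
    centered = [median_center(p) for p in platforms]
    samples = set().union(*centered)
    return {s: median(c[s] for c in centered if s in c) for s in samples}


def ranks(xs):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical HNF1B Level 3 values per platform, keyed by sample ID.
huex = {"T1": 7.0, "T2": 5.0, "T3": 9.0, "T4": 6.0}
u133a = {"T1": 8.2, "T2": 6.0, "T3": 10.0, "T4": 7.0}
agilent = {"T1": 0.4, "T2": -1.2, "T3": 2.0}  # T4 not assayed on this platform

expression = integrate([huex, u133a, agilent])

# Hypothetical promoter methylation beta values for the same samples.
beta = {"T1": 0.30, "T2": 0.85, "T3": 0.05, "T4": 0.60}

ids = sorted(expression)
rho = spearman([expression[s] for s in ids], [beta[s] for s in ids])
print(f"Spearman rho = {rho:.2f}")
```

Median-centering each platform first puts log intensities and log ratios on a comparable relative scale, so that the per-sample median is not dominated by whichever platform reports the largest absolute values.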
Quality Control of The Infinium HumanMethylation450 BeadChip Assay
The quality of the bisulfite-converted DNA and the performance of the CpG probes were assessed using internal
control samples: a placental positive control, a whole genome amplified (WGA) negative control, and a CEPH
control. The mean intra-class correlation across the two batches of samples was 0·90, 0·96 and 0·99 for these
controls, respectively. Intra-class correlation for ovarian duplicate samples was > 0·99.
SNP/DNA Methylation Association Validation With TCGA Data
Validation was done with the TCGA data from 519 tumors. Four of the nine SNPs were available on the TCGA
platform. For promoter DNA methylation, cg02335804 was used as a surrogate because cg14487292 (the two CpGs
are 278 bp apart) was not present on the HumanMethylation27 platform. The p-values are from one-sided tests for
linear trend in the DNA methylation beta value across the three genotypes for each locus. The nominal
Bonferroni-adjusted p-value cutoff would be 0·013 (0·05/4).
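The arithmetic behind the cutoff, and the direction of a linear trend across genotypes, can be illustrated with the sketch below. It shows only the Bonferroni threshold and a least-squares slope; it does not compute the one-sided p-values themselves. The genotype coding (0, 1, 2 copies of an allele), the beta values, the SNP labels and the p-values are all hypothetical.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_tests


def trend_slope(genotypes, betas):
    """Least-squares slope of methylation beta value on genotype coded 0, 1, 2."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_b = sum(betas) / n
    sxy = sum((g - mean_g) * (b - mean_b) for g, b in zip(genotypes, betas))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    return sxy / sxx


# Four SNPs tested at a nominal alpha of 0.05.
cutoff = bonferroni_cutoff(0.05, 4)
print(f"Bonferroni cutoff: {cutoff}")

# Hypothetical tumors: genotype and promoter methylation beta value.
genotypes = [0, 0, 1, 1, 2, 2]
betas = [0.10, 0.14, 0.20, 0.24, 0.30, 0.34]
slope = trend_slope(genotypes, betas)
print(f"change in beta per allele: {slope:.2f}")

# Hypothetical one-sided trend p-values for four SNPs.
p_values = {"snp_a": 0.001, "snp_b": 0.020, "snp_c": 0.012, "snp_d": 0.400}
significant = sorted(name for name, p in p_values.items() if p < cutoff)
print(significant)
```

The exact threshold is 0.0125, which is reported in the text rounded to 0·013.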
Genomic/Epigenomic Data Analysis and Visualization
The statistical analyses were done in R (version 2.15.0). Mapping and characterization of the HumanMethylation450
probes were done with the R package IlluminaHumanMethylation450k.db. The UCSC tracks were downloaded with
rtracklayer41. The PRC1 (Ring1b) and PRC2 mark (H3K27me3) ChIP-seq32 and the chromatin state data
(ChromHMM)33 were from previous work in embryonic stem cells. The genomic and epigenomic data were mapped
to the genome with Build37 (hg19) coordinates, and visualized using the R packages GenomicRanges42 and
Gviz43.
In vitro model of HNF1B overexpression
Lentiviral plasmids encoding TERT (Addgene plasmid 12245), HNF1B-GFP or GFP (GeneCopoeia) were co-transfected
using Lipofectamine™ (Invitrogen) with pMD2.G and p8.91 plasmids into HEK293T virus producer
cells. Cells were refed the following day, and virus was harvested, filtered through a 0·45 μm filter and snap-frozen 48
hours later. Viral titres were analyzed and target cells were transduced overnight in the presence of 8 μg/ml polybrene
(Sigma).
An immortalized endometriosis epithelial cell (EEC) line was generated by lentiviral transduction of hTERT into
primary EECs. Extended in vitro lifespan was confirmed by growth curve analysis (Supplementary Figure S10).
TERT-immortalized EECs were transduced with lentiviral HNF1B-GFP or GFP supernatants and positive cells were
selected with 400 ng/ml puromycin (Sigma). GFP expression was confirmed by fluorescence microscopy; HNF1B
expression was confirmed by real-time PCR (Supplementary Figure S10).
Gene expression analysis
RNA was harvested from cells using the QIAGEN RNeasy kit with on-column DNase I digestion. 1 μg of RNA was
reverse transcribed using an MMLV reverse transcriptase enzyme (Promega). Gene expression analyses were
performed using TaqMan PCR probes (HNF1B, Hs01001602_m1; DPP4, Hs00175210; ACE2, Hs01085333_m1;
SPP1, Hs00959010_m1; β-actin, Hs00357333_g1; GAPDH, Hs02758991_g1; Applied Biosystems) and analyzed
using the ABI 7900HT FAST Real-Time PCR system. Relative expression of each gene of interest was calculated
using the delta-delta Ct method; Ct values for each gene were normalized to the mean Ct values for β-actin and
GAPDH. Statistical analyses were performed using Prism software. Two-tailed paired t-tests with a significance
cutoff of 0·05 were used.
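The delta-delta Ct calculation can be illustrated with a short sketch. It assumes the common 2^(-ΔΔCt) form, which presumes roughly 100% amplification efficiency; the text does not state whether an efficiency correction was applied. All Ct values below are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, ct_target_control, ct_refs_control):
    """Fold change by the delta-delta Ct method, assuming 100% PCR efficiency.

    Each target Ct is normalized to the mean Ct of the reference genes
    (here β-actin and GAPDH) measured in the same sample.
    """
    delta_sample = ct_target - mean(ct_refs)
    delta_control = ct_target_control - mean(ct_refs_control)
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values for HNF1B in HNF1B-GFP cells versus GFP control cells.
fold_change = relative_expression(
    ct_target=22.0, ct_refs=[16.0, 18.0],                   # HNF1B-GFP cells
    ct_target_control=30.0, ct_refs_control=[16.5, 17.5],   # GFP control cells
)
print(f"HNF1B fold change: {fold_change:.0f}")
```

A lower Ct means the target was detected earlier, so a negative delta-delta Ct corresponds to higher expression in the sample than in the control.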
References
34. Edlund CK, Lee WH, Li D, Van Den Berg DJ, Conti DV. Snagger: a user-friendly program for incorporating
additional information for tagSNP selection. BMC Bioinformatics 2008;9:174.
35. Kote-Jarai Z, Easton DF, Stanford JL, Ostrander EA, Schleutker J, Ingles SA, et al. Multiple novel prostate
cancer predisposition loci confirmed by an international study: the PRACTICAL Consortium. Cancer
Epidemiol Biomarkers Prev 2008;17:2052-61.
36. Kote-Jarai Z, Olama AA, Giles GG, Severi G, Schleutker J, Weischer M, et al. Seven prostate cancer
susceptibility loci identified by a multi-stage genome-wide association study. Nat Genet 2011;43:785-91.
37. Eeles RA, Kote-Jarai Z, Al Olama AA, Giles GG, Guy M, Severi G, et al. Identification of seven new prostate
cancer susceptibility loci through a genome-wide association study. Nat Genet 2009;41:1116-21.
38. Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP, et al. A genotype calling algorithm
for the Illumina BeadArray platform. Bioinformatics 2007;23:2741-6.
39. Giannoulatou E, Yau C, Colella S, Ragoussis J, Holmes CC. GenoSNP: a variational Bayes within-sample SNP
genotyping algorithm that does not require a reference population. Bioinformatics 2008;24:2209-14.
40. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis
corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904-9.
41. Lawrence M, Gentleman R, Carey V. rtracklayer: an R package for interfacing with genome browsers.
Bioinformatics 2009;25:1841-2.
42. Aboyoun P, Pages H, Lawrence M. GenomicRanges: Representation and manipulation of genomic intervals. R
package version 1.8.3 edn.
43. Hahne F, Durinck S, Ivanek R, Mueller A. Gviz: Plotting data and annotation information along genomic
coordinates. R package version 1.0.0 edn.