
Supplementary Information

Epigenetic Analysis Leads to Identification of HNF1B as a Subtype-Specific Susceptibility Gene for Ovarian Cancer


Supplementary Figure S1. mRNA expression versus DNA methylation for the TCGA serous tumors, shown for each expression platform: (A) Affymetrix Human Exon 1.0 ST array, 528 tumors and 10 normal fallopian tube samples; (B) Affymetrix U133A, 568 tumors and eight normal fallopian tube samples; (C) Agilent G4502, 543 tumors and four normal samples. The y-axes show log intensities for the single-color-channel arrays (HuEx1.0 and Affymetrix) and log ratios for the two-color-channel array (Agilent); the x-axes show beta values for DNA methylation from 0 (unmethylated) to 1 (methylated). Each individual platform shows the same pattern that we observed with the median integrated data, i.e., HNF1B is generally silenced in the tumors, in half of them possibly by an epigenetic mechanism.

[Figure S2 forest plots. Panel titles: (a) Association of rs7405776 with serous invasive; (b) Association of rs11651755 with clear cell invasive. x-axis: log scale with ticks from 0.02 to 10. Rows: Combined, followed by the individual studies WOC, UK2, UCI, TOR, STA, SEA, POL, POC, OVA, NTH, NOR, NJO, NHS, NEC, NCO, MSK, MDA, MCC, MAY, LA2, HPE, HOC, HMO, HJO, HAW, GER, DOV, DAN, BEL, BAV, AUS (HMO does not appear in the clear cell panel).]

Supplementary Figure S2. Forest plots showing ORs across studies. (A) Forest plot of individual study odds ratios (ORs) for the top serous-associated HNF1B SNP, rs7405776. The box indicates the OR, with the size of the box reflecting the sample size, and the line indicates the 95% confidence interval (CI). The overall effect estimate was 1·13 (95% CI 1·09-1·17, p=3·1×10^-10). No heterogeneity of effect was observed across the studies (p=0·30). (B) Forest plot of individual study ORs for the top clear cell-associated HNF1B SNP, rs11651755. The box indicates the OR, with the size of the box reflecting the sample size, and the line indicates the 95% CI. The overall effect estimate was 0·77 (95% CI 0·70-0·84, p=1·6×10^-8). No heterogeneity of effect was observed across the studies (p=0·71).


Supplementary Figure S3. Plot of the correlation between all SNPs genotyped and imputed in the HNF1B

region and the top-associated serous ovarian cancer SNP, rs7405776. The x-axis shows the genomic coordinate

(hg17) and the y-axis shows the r2 value with rs7405776. The genome-wide significant serous SNPs are denoted by

the triangles.

[Figure S4 LD plot. SNP annotations across the top: Top Serous Ovarian Cancer SNP; Clear Cell Ovarian Cancer SNP; Prostate and Endometrial Cancer and Diabetes SNP; Diabetes SNP; Diabetes SNP; Prostate SNP. Asterisks (*) mark the genome-wide significantly associated serous SNPs.]

Supplementary Figure S4. Linkage disequilibrium (LD) plot of the genome-wide significant serous* (n=9) and clear cell (n=1) SNPs as well as the SNPs associated with prostate and uterine cancer and diabetes. The r2 value between each pair of SNPs is given in each box, based on the 1000 Genomes Project. The highest r2 between the serous SNPs and the clear cell SNP, rs11651755, is 0·70. The r2 values between the top serous SNP, rs7405776, and the other eight serous SNPs range from 0·24 to 0·97.


Supplementary Figure S5. Haplotype analysis of the HNF1B region harboring the genome-wide significant

associated SNPs. The rs numbers are given across the top. Eleven haplotypes were observed with a frequency of

2% or greater. The haplotypes are color-coded based on the allele found on the most common haplotype (black).

The serous ovarian cancer single SNP P values and odds ratios are given below the haplotypes, followed by the indication of which allele was modelled for the single SNP risk analysis. The single SNP clear cell ovarian cancer P values and odds ratios are given below. The serous and clear cell associations with each haplotype are given to the right of the haplotypes. The most common haplotype was set as the reference group and each haplotype odds ratio is given relative to the most common haplotype. The 10th and 11th haplotypes were associated with a statistically

significant increased risk of serous ovarian cancer. Both of these haplotypes carry eight of nine genome-wide

significantly associated SNPs and haplotype 11 also carries the ninth, rs61612821. The r2 between rs61612821 and

the top associated serous SNP, rs7405776, is only 0·22 because rs61612821 only falls on one of the three haplotypes

carrying rs7405776. This haplotype structure demonstrates that the signal between these SNPs cannot be

disentangled. Haplotypes four and eight are also associated with increased risk of serous ovarian cancer, but did not

reach statistical significance and they are uncommon. Both of these haplotypes carry at least one of the genome-

wide significant serous SNPs. Haplotypes seven through 11 carry the top associated clear cell SNP and were all

associated with decreased risk of clear cell disease. Only haplotype 10 was statistically significantly associated with

risk of clear cell ovarian cancer.

[Figure S6 bar charts. Panel (a): tumor count (y-axis, 0 to 500) by histology (serous, mucinous, endometrioid, clear cell). Panel (b): frequency (y-axis, 0.0 to 1.0) by histology. Legend in both panels: HNF1B <1%, 1-50%, >50%.]

Supplementary Figure S6. HNF1B protein expression differs by histological subtype. A total of 1,149 invasive ovarian tumor samples from four different sites (52 HOP, 518 MAY, 119 UKO, 460 VAN) were examined and scored for HNF1B protein expression by immunohistochemistry (IHC). (A) Histograms comparing the distribution of HNF1B IHC scores (blue: 0%; green: 1-50%; red: >50%) by histological subtype. The y-axis indicates tumor count. (B) Bar charts comparing the frequency of the IHC categories for the histological subtypes. The y-axis is the cumulative frequency. Ninety percent of serous tumors do not express HNF1B, compared with only 20% of clear cell tumors.

[Figure S7 volcano plot of 104,033 CpG loci. x-axis: Difference in Beta Value (ticks from -0.4 to 0.6). y-axis: BH-adjusted P value (ticks from 0 to 14). Asterisks (*) mark the HNF1B probes.]

Supplementary Figure S7. HNF1B promoter methylation is unlikely to be a passenger event caused by global DNA methylation changes. Using a two-sample t-test, we compared DNA methylation levels between 254 serous tumors and 17 clear cell tumors (Mayo panel) at 104,033 CpG loci that are unmethylated (beta value <0·2) in the 10 normal samples. The raw p values were adjusted with the Benjamini-Hochberg method, and the -log10 adjusted p values are plotted on the y-axis of the volcano plot against the mean beta value for clear cell minus the mean beta value for serous on the x-axis. The non-shaded area indicates adjusted p<0·05 and an absolute difference in beta value >0·2. A subset of 1,003 loci is used for Figure 3, with a more stringent cut-off of adjusted p<0·005, indicated by the dashed line. The red stars indicate the HNF1B loci. While clear cell tumors generally have far more hypermethylation, HNF1B is one of the few genes hypermethylated in the serous subtype. This argues against the possibility that HNF1B hypermethylation in the serous subtype is a passenger event accompanying global hypermethylation.
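The adjustment and thresholding described in this legend can be sketched as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up adjustment, not the code used for the figure; the function names `benjamini_hochberg` and `is_highlighted` are hypothetical, and the default cut-offs are simply the ones quoted in the legend.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_highlighted(adjusted_p, delta_beta, p_cut=0.05, delta_cut=0.2):
    """True when a locus falls in the non-shaded area of the volcano plot."""
    return adjusted_p < p_cut and abs(delta_beta) > delta_cut
```

Here `delta_beta` stands for the mean beta value for clear cell minus the mean beta value for serous, so a negative value with a small adjusted p value corresponds to hypermethylation in the serous subtype.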

[Figure S8, panels (a) and (b). Row labels by rs11658063 genotype: C*/C*, C*/G, G/G.]

Supplementary Figure S8. HNF1B DNA methylation levels across the entire promoter region differ by rs11658063 genotype. We further examined the DNA methylation level across the HNF1B gene promoter region for different genotypes at rs11658063 (relative position indicated with a red arrow; Mayo panel). (A) Similar to Figure 5A, with the bottom panel showing the probe locations for the HumanMethylation450 and HumanMethylation27 platforms. (B) An enlarged view of the region flanking the transcription start site that is unmethylated in the normal tissue samples, with two associated CpG islands. Shown is a heatmap where blue indicates low methylation (beta value=0) and red indicates high methylation (beta value=1) for each of the loci interrogated by HM450 in this region. The heatmap is subdivided into six subpanels, separating samples (rows) by the three genotypes at rs11658063 (the star indicates the risk allele) and CpG loci (columns) by whether they lie upstream or downstream of the transcription start site. The genotype at rs11658063 influences the overall DNA methylation level, and the influence is more pronounced for the upstream promoter region.

[Figure S9 panel annotations. Mayo (n=231): rs3744763 p=0.17; 17-36092841 p=0.1; rs7405776 p=0.069; rs757210 p=0.054; rs4239217 p=0.0077; rs61612821 p=0.32; rs11657964 p=0.0031; rs7501939 p=0.0031; rs11658063 p=0.0026. TCGA (n=519): rs3744763 p=0.08; rs757210 p=0.01; rs4239217 p=0.08; rs7501939 p=0.02.]

Supplementary Figure S9. Validation of the SNP-DNA methylation association with TCGA data. Only four of the nine serous SNPs were available on the Illumina Human1M-Duo BeadChip used in TCGA. The DNA methylation probe cg14487292 was not available on the HumanMethylation27k platform, so cg02335804, located in the same promoter region, was used as a surrogate. The color of each box indicates the genotype, i.e., homozygous major (white), heterozygous (gray) and homozygous minor (black), where the minor alleles are the risk alleles. The p values for the Mayo data are two-sided trend p values; those for the TCGA validation are one-sided trend p values.


Supplementary Figure S10. Validation of TERT immortalization of EECs and HNF1B overexpression upon transfection. (A) Transduction of endometriosis epithelial cells (EECs) with lentiviral hTERT supernatants results in an extension of in vitro lifespan. Growth curve analyses show that no significant increase in lifespan is observed in EECs transduced with lenti-GFP. (B) Immortalized endometriosis epithelial cells (EEC16) were infected with GFP and HNF1B-GFP viral supernatants. HNF1B overexpression was confirmed by real-time PCR; HNF1B expression is detected only in cells transduced with HNF1B-GFP lentiviral supernatants.

[Figure S11 scatterplots of SPP1 (y-axis) against HNF1B (x-axis). Panel titles: (a) HuEx1.0, R=-0.10, P=0.02; (b) Affymetrix, R=-0.06, P=0.16; (c) Agilent, R=-0.13, P=0.003.]

Supplementary Figure S11. SPP1 and HNF1B mRNA expression levels show no correlation or only a weak inverse correlation in the serous tumors in TCGA data. We looked at all three expression platforms used in TCGA. The y-axis shows the log intensity (a, b) or log ratio (c) for SPP1, and the x-axis shows the same for HNF1B. The platform, the Pearson correlation value and the p value testing for linear correlation are given in the title of each panel. While SPP1 is a downstream target of HNF1B in EECs and clear cell ovarian cancer cells, its expression does not seem to correlate with HNF1B expression in the serous tumors.
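The Pearson correlation value reported in each panel title can be computed as in the sketch below. This is an illustrative pure-Python version with hypothetical function names, not the software used for the figure. The p value additionally requires the cumulative distribution of Student's t with n-2 degrees of freedom, which is left out here; only the test statistic is shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```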


Supplementary Table S1. Distribution of cases and controls by study site.

Site | Geographic Region | Study Design | No. of controls | Invasive cases: All | Serous | Mucinous | Endometrioid | Clear Cell | Brenner | Other
Australia Ovarian Cancer Study & Australia Cancer Study (Ovarian Cancer) (AUS) | Australia | Population-based/case-control | 1011 | 949 | 592 | 40 | 123 | 64 | 40 | 90
Bavarian Ovarian Cancer Cases and Controls (BAV) | Southeast Germany | Population-based/case-control | 143 | 93 | 56 | 8 | 13 | 6 | 1 | 9
Belgium Ovarian Cancer Study (BEL) | Belgium, University Hospital Leuven | Hospital-based/case-control | 1352 | 277 | 195 | 25 | 22 | 23 | 2 | 10
Diseases of the Ovary and their Evaluation (DOV) | USA: 13 counties in western Washington state | Population-based/case-control | 1606 | 990 | 576 | 27 | 161 | 75 | 151 | 0
Germany Ovarian Cancer Study (GER) | Germany: two geographical regions in the states of Baden-Württemberg and Rhineland-Palatinate in southern Germany | Population-based/case-control | 413 | 192 | 96 | 22 | 21 | 6 | 1 | 46
Gilda Radner Familial Ovarian Cancer Registry (GRR)* | USA | Familial cancer/case only | 0 | 115 | 76 | 5 | 19 | 11 | 3 | 1
Hawaii Ovarian Cancer Study (HAW) | USA: Hawaii | Population-based/case-control | 601 | 266 | 130 | 27 | 60 | 35 | 6 | 8
Hannover-Jena Ovarian Cancer Study (HJO) | Germany | Hospital-based/case-control | 274 | 273 | 142 | 9 | 26 | 4 | 38 | 54
Hannover-Minsk Ovarian Cancer Study (HMO) | Belarus | Case-control | 140 | 144 | 50 | 7 | 12 | 1 | 0 | 74
Helsinki Ovarian Cancer Study (HOC) | Helsinki, Finland | Case-control | 447 | 218 | 113 | 45 | 28 | 14 | 0 | 18
Hormones and Ovarian Cancer Prediction (HOP) | Western Pennsylvania, Northeastern Ohio, Western New York | Population-based/case-control | 1501 | 682 | 388 | 32 | 90 | 43 | 50 | 79
DNA-Specimen in Gynecologic Oncologic Malignancies (HSK)* | Germany | Case only | 0 | 146 | 109 | 1 | 16 | 0 | 3 | 17
Hospital-based Epidemiologic Research Program at Aichi Cancer Center (JPN) | Japan: Nagoya City | Case-control | 81 | 66 | 32 | 3 | 7 | 17 | 4 | 3
Women's Cancer Research Institute - Cedars-Sinai Medical Center (LAX)* | USA: Southern California | Case only | 0 | 330 | 248 | 15 | 26 | 13 | 27 | 1
Danish Malignant Ovarian Tumor Study (MAL) | Denmark | Population-based/case-control | 829 | 440 | 272 | 42 | 54 | 33 | 0 | 39
Malaysia Ovarian Cancer Study (MAS) | Malaysia | Hospital-based/case-control | 106 | 106 | 44 | 17 | 25 | 12 | 1 | 7
Mayo Clinic Ovarian Cancer Case Control Study (MAY) | USA: North Central (MN, SD, ND, IL, IA, WI) | Clinic-based/case-control | 753 | 708 | 515 | 18 | 97 | 34 | 0 | 44
Melbourne Collaborative Cohort Study (MCC) | Melbourne, Australia | Cohort/nested case-control | 68 | 64 | 34 | 7 | 7 | 6 | 6 | 4
MD Anderson Ovarian Cancer Study (MDA) | USA: Texas | Hospital-based/case-control | 385 | 323 | 194 | 29 | 29 | 4 | 1 | 66
Memorial Sloan Kettering Cancer Center Gynecology Tissue Bank (MSK) | USA: New York City | Case-control | 697 | 556 | 450 | 0 | 25 | 22 | 0 | 59
North Carolina Ovarian Cancer Study (NCO) | USA: Central and eastern North Carolina (48 counties) | Population-based/case-control | 984 | 850 | 480 | 43 | 130 | 85 | 112 | 0
New England-based Case-Control Study of Ovarian Cancer (NEC) | USA: New Hampshire and Eastern Massachusetts | Population-based/case-control | 1049 | 697 | 397 | 44 | 131 | 97 | 0 | 28
Nurses' Health Study (NHS) | USA | Population-based/nested case-control | 429 | 127 | 68 | 7 | 14 | 6 | 13 | 19
New Jersey Ovarian Cancer Study (NJO) | USA: New Jersey (six counties) | Case-control | 194 | 190 | 110 | 7 | 30 | 23 | 0 | 19
University of Bergen Norway Study (NOR) | Norway | Case-control | 371 | 237 | 136 | 15 | 27 | 13 | 0 | 46
Nijmegen Polygene Study & Nijmegen Biomedical Study (NTH) | Eastern part of the Netherlands | Case-control | 323 | 263 | 119 | 34 | 67 | 21 | 9 | 13
Oregon Ovarian Cancer Registry (ORE)* | Portland, Oregon | Case only | 0 | 59 | 41 | 4 | 4 | 4 | 0 | 6
Ovarian Cancer in Alberta and British Columbia Study (OVA) | Alberta and British Columbia, Canada | Case-control | 810 | 688 | 370 | 29 | 114 | 73 | 12 | 90
Poland Ovarian Cancer Study (POC) | Poland: Szczecin, Poznan, Opole, Rzeszów | Case-control | 417 | 423 | 200 | 33 | 39 | 9 | 61 | 81
NCI Ovarian Case-Control Study in Poland (POL) | Poland, Warsaw and Lodz | Population-based/case-control | 223 | 236 | 106 | 17 | 37 | 10 | 25 | 41
Pelvic Mass Study (PVD)* | Denmark | Population-based/case-control | 0 | 172 | 130 | 11 | 14 | 8 | 6 | 3
Royal Marsden Hospital Case Series (RMH)* | UK: London | Hospital based/case only | 0 | 151 | 52 | 16 | 29 | 17 | 0 | 37
UK Studies of Epidemiology and Risk Factors in Cancer Heredity Ovarian Cancer Study (SEA) | UK: East Anglia and West Midlands | Population-based/case-control | 6067 | 1395 | 581 | 145 | 231 | 147 | 9 | 282
Southampton Ovarian Cancer Study (SOC)* | United Kingdom, Wessex region | Case only/hospital-based | 0 | 274 | 105 | 34 | 64 | 11 | 7 | 53
Scottish Randomised Trial in Ovarian Cancer (SRO)* | Coordinated through clinical trials unit, Glasgow UK from patients recruited worldwide | Case only from clinical trial | 0 | 159 | 93 | 3 | 17 | 9 | 25 | 12
Genetic Epidemiology of Ovarian Cancer (STA) | USA: Six counties in the San Francisco Bay area | Population-based/case-control | 404 | 282 | 174 | 19 | 38 | 22 | 1 | 28
Shanghai Women's Health Study (SWH) | Shanghai, China | Cohort/nested case-control | 891 | 135 | 0 | 0 | 0 | 0 | 0 | 135
Familial Ovarian Tumor Study (TOR) | Canada: Province of Ontario | Population-based | 443 | 559 | 341 | 39 | 132 | 34 | 0 | 13
UC Irvine Ovarian Cancer Study (UCI) | USA: Southern California (Orange and San-Diego, Imperial Counties) | Population-based/case-control | 425 | 331 | 198 | 24 | 58 | 29 | 2 | 20
UK Ovarian Cancer Population Study (UKO) | United Kingdom (England, Wales and Northern Ireland) | Population-based/case-control | 1123 | 718 | 357 | 76 | 116 | 68 | 55 | 46
UK Familial Ovarian Cancer Registry (UKR)* | UK: National | Case only/Familial Register | 0 | 48 | 23 | 3 | 6 | 2 | 0 | 14
Los Angeles County Case-Control Studies of Ovarian Cancer (USC) | Los Angeles County | Population-based/case-control | 1370 | 978 | 614 | 63 | 124 | 58 | 26 | 93
Warsaw Ovarian Cancer Study (WOC) | Poland: Warsaw and central Poland | Case-control | 204 | 202 | 132 | 8 | 20 | 17 | 1 | 24
Total | | | 26134 | 16111 | 9139 | 1053 | 2303 | 1186 | 698 | 1732

* Case only study. For our analyses, GRR was merged with HOP, HSK with GER, LAX with USC, ORE with DOV, PVD with MAL, and RMH, SOC, SRO, and UKR with UKO.


Supplementary Table S2. Association between the genome-wide significantly associated HNF1B SNPs and serous ovarian cancer risk in non-Whites.

SNP | Asians (n=249 / 1573*): AAF | OR | 95% CI | p-value | Africans (n=89 / 200*): AAF | OR | 95% CI | p-value | Other (n=431 / 870*): AAF | OR | 95% CI | p-value
rs3744763 | 0·56 | 1·13 | 0·91 - 1·40 | 0·26 | 0·07 | 0·95 | 0·45 - 2·01 | 0·89 | 0·33 | 0·77 | 0·64 - 0·93 | 0·01
17-36092841 | 0·34 | 1·11 | 0·87 - 1·41 | 0·40 | 0·53 | 1·23 | 0·82 - 1·85 | 0·31 | 0·40 | 0·87 | 0·72 - 1·04 | 0·13
rs7405776 | 0·29 | 1·10 | 0·87 - 1·38 | 0·42 | 0·51 | 1·16 | 0·79 - 1·70 | 0·45 | 0·38 | 0·80 | 0·67 - 0·95 | 0·01
rs757210 | 0·29 | 1·07 | 0·85 - 1·34 | 0·58 | 0·53 | 1·05 | 0·72 - 1·54 | 0·79 | 0·37 | 0·85 | 0·71 - 1·01 | 0·06
rs4239217 | 0·30 | 1·10 | 0·88 - 1·38 | 0·41 | 0·27 | 1·10 | 0·74 - 1·62 | 0·64 | 0·33 | 0·80 | 0·66 - 0·96 | 0·02
rs11651755 | 0·28 | 0·98 | 0·78 - 1·24 | 0·87 | 0·66 | 0·89 | 0·61 - 1·29 | 0·53 | 0·46 | 0·89 | 0·75 - 1·06 | 0·19
rs61612821 | 0·08 | 1·17 | 0·76 - 1·78 | 0·47 | 0·02 | 1·28 | 0·26 - 6·32 | 0·76 | 0·08 | 0·73 | 0·49 - 1·08 | 0·11
rs11657964 | 0·27 | 1·03 | 0·81 - 1·30 | 0·83 | 0·52 | 0·93 | 0·65 - 1·32 | 0·68 | 0·37 | 0·83 | 0·69 - 0·99 | 0·04
rs7501939 | 0·27 | 1·02 | 0·81 - 1·29 | 0·87 | 0·50 | 0·93 | 0·65 - 1·33 | 0·69 | 0·37 | 0·85 | 0·71 - 1·02 | 0·07
rs11658063 | 0·28 | 1·01 | 0·80 - 1·28 | 0·94 | 0·39 | 0·84 | 0·56 - 1·28 | 0·42 | 0·35 | 0·79 | 0·65 - 0·95 | 0·01

AAF=Alternate Allele Frequency. * cases / controls


Supplementary Note 1

PRACTICAL Consortium

Access to genotype data for SNPs that were not nominated by OCAC was provided by the PRACTICAL

Consortium investigators including: Doug Easton, Centre for Cancer Genetic Epidemiology, Department of Public

Health and Primary Care, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK;

Rosalind Eeles, The Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK and Royal

Marsden NHS Foundation Trust, Fulham and Sutton, London and Surrey, UK; Kenneth Muir, University of

Warwick, Coventry, UK; Graham Giles, Cancer Epidemiology Centre, The Cancer Council Victoria, 1 Rathdowne

street, Carlton Victoria, Australia and Centre for Molecular, Environmental, Genetic and Analytic Epidemiology,

The University of Melbourne, 723 Swanston street, Carlton, Victoria, Australia; Fredrik Wiklund, Department of

Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden; Henrik Gronberg, Department of

Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden; Christopher Haiman,

Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris

Comprehensive Cancer Center, Los Angeles, California, USA; Johanna Schleutker, Department of Medical

Biochemistry and Genetics, University of Turku, Turku, Finland and Institute of Biomedical

Technology/BioMediTech, University of Tampere and FimLab Laboratories, Tampere, Finland; Maren Weischer,

Department of Clinical Biochemistry, Herlev Hospital, Copenhagen University Hospital, Herlev Ringvej 75, DK-

2730 Herlev, Denmark; Ruth Travis, Cancer Epidemiology Unit, Nuffield Department of Clinical Medicine,

University of Oxford, Oxford, UK; David Neal, Surgical Oncology (Uro-Oncology: S4), University of Cambridge,

Box 279, Addenbrooke’s Hospital, Hills Road, Cambridge, UK and Cancer Research UK Cambridge Research

Institute, Li Ka Shing Centre, Cambridge, UK; Paul Pharoah, Centre for Cancer Genetic Epidemiology, Department

of Oncology, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK; Kay-Tee

Khaw, Cambridge Institute of Public Health, University of Cambridge, Forvie Site, Robinson Way, Cambridge CB2

0SR; Janet L. Stanford, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle,

Washington, USA and Department of Epidemiology, School of Public Health, University of Washington, Seattle,

Washington, USA; William J. Blot, International Epidemiology Institute, 1455 Research Blvd., Suite 550,

Rockville, MD 20850; Stephen Thibodeau, Mayo Clinic, Rochester, Minnesota, USA; Christiane Maier, Department

of Urology, University Hospital Ulm, Germany and Institute of Human Genetics University Hospital Ulm,

Germany; Adam S. Kibel, Brigham and Women's Hospital/Dana-Farber Cancer Institute, 45 Francis Street- ASB II-

3, Boston, MA 02115 and Washington University, St Louis, Missouri; Cezary Cybulski, International Hereditary

Cancer Center, Department of Genetics and Pathology, Pomeranian Medical University, Szczecin, Poland; Lisa

Cannon-Albright, Division of Genetic Epidemiology, Department of Medicine, University of Utah School of

Medicine; Hermann Brenner, Division of Clinical Epidemiology and Aging Research, German Cancer Research

Center, Heidelberg, Germany; Jong Park, Division of Cancer Prevention and Control, H. Lee Moffitt Cancer

Center, 12902 Magnolia Dr., Tampa, Florida, USA; Radka Kaneva, Molecular Medicine Center and Department of

Medical Chemistry and Biochemistry, Medical University - Sofia, 2 Zdrave St, 1431, Sofia, Bulgaria; Jyotsna Batra,

Australian Prostate Cancer Research Centre-Qld, Institute of Health and Biomedical Innovation and Schools of Life

Science and Public Health, Queensland University of Technology, Brisbane, Australia; Manuel R. Teixeira,

Department of Genetics, Portuguese Oncology Institute, Porto, Portugal and Biomedical Sciences Institute (ICBAS),

Porto University, Porto, Portugal; Maya Ghoussaini, Centre for Cancer Genetic Epidemiology, Department of Public

Health and Primary Care, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK;

Zsofia Kote-Jarai, The Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK; Ali Amin

Al Olama, Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of

Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, UK; Sara Benlloch, Centre for Cancer Genetic

Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Strangeways Laboratory,

Worts Causeway, Cambridge, UK.



Australian Ovarian Cancer Study Group

Members of the Australian Ovarian Cancer Study (AOCS) are listed below:

D. Bowtell, D. Gertig, A. Green, A. DeFazio, P. Webb, R Stuart-Harris; NSW- F Kirsten, J Rutovitz, P Clingan, A

Glasgow, A Proietto, S Braye, G Otton, J Shannon, T Bonaventura, J Stewart, S Begbie, M Friedlander, D Bell, S

Baron-Hay, A Ferrier (dec.), G Gard, D Nevell, N Pavlakis, S Valmadre, B Young, C Camaris, R Crouch, L

Edwards, N Hacker, D Marsden, G Robertson, P Beale, J Beith, J Carter, C Dalrymple, R Houghton, P Russell, L

Anderson, M Links, J Grygiel, J Hill, A Brand, K Byth, R Jaworski, P Harnett, R Sharma, G Wain; QLD-D Purdie,

D Whiteman, B Ward, D Papadimos, A Crandon, M Cummings, K Horwood, A Obermair, L Perrin, D Wyld, J

Nicklin; SA- M Davy, MK Oehler, C Hall, T Dodd, T Healy, K Pittman, D Henderson, J Miller, J Pierdes, A Achan;

TAS-P Blomfield, D Challis, R McIntosh, A Parker; VIC- B Brown, R Rome, D Allen, P Grant, S Hyde, R Laurie,

M Robbie, D Healy, T Jobling, T Manolitsas, J McNealage, P Rogers, B Susil, E Sumithran, I Simpson, I Haviv, K

Phillips, D Rischin, S Fox, D Johnson, S Lade, P Waring, M Loughrey, N O’Callaghan, B Murray, L Mileshkin, P

Allan; V Billson, J Pyman, D Neesham, M Quinn, A Hamilton, C Underhill, R Bell, LF Ng, R Blum, V Ganju; WA-

I Hammond, A McCartney (dec.), C Stewart, Y Leung, M Buck, N Zeps (WARTN)


Australian Cancer Study

Investigators: David C. Whiteman MBBS, PhD, Penelope M. Webb MA, D Phil, Adele C. Green MBBS, PhD,

Nicholas K. Hayward PhD, Peter G. Parsons PhD, David M. Purdie PhD; Clinical collaborators: B. Mark Smithers

FRACS, David Gotley FRACS PhD, Andrew Clouston FRACP PhD, Ian Brown FRACP; Project Manager:

Suzanne Moore RN, MPH; Database: Karen Harrap BIT, Troy Sadkowski BIT; Research Nurses: Suzanne O’Brien

RN MPH, Ellen Minehan RN, Deborah Roffe RN, Sue O’Keefe RN, Suzanne Lipshut RN, Gabby Connor RN,

Hayley Berry RN, Frances Walker RN, Teresa Barnes RN, Janine Thomas RN, Linda Terry RN MPH, Michael

Connard B Sc, Leanne Bowes B Sc, MaryRose Malt RN, Jo White RN; Clinical Contributors: Australian Capital

Territory: Charles Mosse FRACS, Noel Tait FRACS; New South Wales: Chris Bambach FRACS, Andrew Biankan

FRACS, Roy Brancatisano FRACS, Max Coleman FRACS, Michael Cox FRACS, Stephen Deane FRACS, Gregory

L. Falk FRACS, James Gallagher FRACS, Mike Hollands FRACS, Tom Hugh FRACS, David Hunt FRACS, John

Jorgensen FRACS, Christopher Martin FRACS, Mark Richardson FRACS, Garrett Smith FRACS, Ross

Smith FRACS, David Storey FRACS; Queensland: John Avramovic FRACS, John Croese FRACP, Justin D'Arcy

FRACS, Stephen Fairley FRACP, John Hansen FRACS, John Masson FRACP, Les Nathanson FRACS, Barry

O'Loughlin FRACS, Leigh Rutherford FRACS, Richard Turner FRACS, Morgan Windsor FRACS; South Australia:

Justin Bessell FRACS, Peter Devitt FRACS, Glyn Jamieson FRACS, David Watson FRACS; Victoria: Stephen

Blamey FRACS, Alex Boussioutas FRACP, Richard Cade FRACS, Gary Crosthwaite FRACS, Ian Faragher

FRACS, John Gribbin FRACS, Geoff Hebbard FRACP, George Kiroff FRACS, Bruce Mann FRACS, Bob Millar

FRACS, Paul O'Brien FRACS, Robert Thomas FRACS, Simon Wood FRACS; Western Australia: Steve Archer

FRACS, Kingsley Faulkner FRACS, Jeff Hamdorf FRACS


Supplementary Methods

Selection of SNPs

Tagging SNPs (tSNPs) were selected in the HNF1B region using the program SNAGGER [34] from the International HapMap Project CEU population (White) in order to cover all SNPs in the region with a minor allele frequency of 0·05 with an r2 of 0·80. This resulted in the selection of 40 SNPs. In addition, because of the association between prostate cancer and HNF1B, an additional 134 SNPs were selected by The Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (The PRACTICAL Consortium [35-37]) to provide full fine-mapping information based on 174 genotyped SNPs. A 150kb region surrounding HNF1B was identified for fine-mapping (hg18 coordinates 33,100,000-33,250,000). Fine-mapping SNPs were selected at this locus from the March 2010 (Build 36) release of the 1000 Genomes Project for all known SNPs with minor allele frequency >0·02 in Europeans and r2>0·1 with the reported prostate cancer associated SNPs (rs11649743 and rs4430796).

IMPUTE provides estimated allele dosage for SNPs that were not genotyped and for samples with missing genotype

data for genotyped SNPs.
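As an illustration of what an estimated allele dosage is, the sketch below converts the three genotype probabilities of one sample at one SNP into the expected number of copies of the alternate allele. It assumes the imputation output has already been parsed into three probabilities per sample; the function name `allele_dosage` is hypothetical and is not part of IMPUTE.

```python
def allele_dosage(p_aa, p_ab, p_bb):
    """Expected number of copies of allele B (between 0 and 2).

    p_aa, p_ab, p_bb are the probabilities of the genotypes AA, AB and BB.
    """
    total = p_aa + p_ab + p_bb
    if total <= 0:
        raise ValueError("genotype probabilities must sum to a positive value")
    # Heterozygotes carry one copy of B, BB homozygotes carry two.
    # Dividing by the total rescales probabilities that do not sum to 1.
    return (p_ab + 2.0 * p_bb) / total
```

A genotyped sample corresponds to the special case in which one of the three probabilities equals 1, giving a dosage of exactly 0, 1 or 2.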

SNP Genotyping

Each 96-well plate contained 250 ng genomic DNA (or 500 ng whole-genome amplified DNA). Raw intensity data

files for all consortia were sent to the COGS data co-ordination centre at the University of Cambridge for centralized

genotype calling and QC.

Initial calling used a cluster file generated using 270 samples from HapMap2. These calls were used for ongoing QC checks during the genotyping. To generate the final calls used for the data analysis, we first selected a subset of 3,018 individuals, including samples from each of the genotyping centers, each of the participating consortia, and each major ethnicity. Only plates with a consistently high call rate in the initial calling were used. The HapMap samples and ~160 samples that were known positive controls for rare variants on the array were used to generate a cluster file that was then applied to call the genotypes for the remaining samples. We also investigated two other calling algorithms, Illuminus [38] and GenoSNP [39], but manual inspection of a sample of SNPs with discrepant calls indicated that GenCall was invariably superior.

Sample QC for Genotyping

One thousand two hundred and seventy three OCAC samples were genotyped in duplicate. Genotypes were

discordant for greater than 40 percent of SNPs for 22 pairs. For the remaining 1,251 pairs, concordance was greater

than 99·6 percent. In addition we identified 245 pairs of samples that were unexpected genotypic duplicates. Of

these, 137 were phenotypic duplicates and judged to be from the same individual. We used identity-by-state to

identify 618 pairs of first-degree relatives. Samples were excluded according to the following criteria: 1) 1,133

samples with a conversion rate of less than 95 percent; 2) 169 samples with heterozygosity >5 standard deviations

from the intercontinental ancestry specific mean heterozygosity; 3) 65 samples with ambiguous sex; 4) 269 samples

with the lowest call rate from a first-degree relative pair; 5) 1,686 samples that were either duplicate samples that

were non-concordant for genotype or genotypic duplicates that were not concordant for phenotype. Thus, a total of

44,308 subjects including 16,111 invasive cases, 2,063 borderline cases and 26,134 controls were available for

analysis.
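The duplicate-concordance check and the counts quoted in this paragraph can be illustrated with the short sketch below. The `concordance` function is a hypothetical helper, and skipping missing calls is an assumption made for the example, since the text does not say how missing calls were handled. The constants are the figures reported above.

```python
def concordance(calls_a, calls_b):
    """Fraction of SNPs with identical genotype calls in a duplicate pair.

    Assumption: SNPs with a missing call (None) in either sample are skipped.
    """
    compared = 0
    matches = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both samples")
    return matches / compared


# Counts reported in the text.
duplicate_pairs = 1273
discordant_pairs = 22
remaining_pairs = duplicate_pairs - discordant_pairs  # 1,251 pairs

invasive_cases = 16111
borderline_cases = 2063
controls = 26134
total_subjects = invasive_cases + borderline_cases + controls  # 44,308
```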

SNP Quality Control

In total, 211,155 SNP assays, identified across a number of studies, were successfully designed and included on the array. SNPs were excluded according to the following criteria: (1) 1,311 SNPs without a genotype call; (2) 2,857 monomorphic SNPs; (3) 5,201 SNPs with a call rate less than 95 percent and MAF > 0·05 or call rate less than 99 percent with MAF < 0·05; (4) 2,194 SNPs showing evidence of deviation of genotype frequencies from Hardy-Weinberg equilibrium (P<10^-7); (5) 22 SNPs with greater than two percent discordance in duplicate pairs. Overall, 94·5 percent passed QC. Genotype clusters were visually inspected for the most strongly associated SNPs.
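The call-rate criterion (3) and the overall pass rate can be reproduced with the sketch below. The function name is hypothetical, the handling of a MAF of exactly 0·05 is an assumption (the text does not cover it), and the pass-rate arithmetic assumes the five exclusion categories do not overlap.

```python
def fails_call_rate_criterion(call_rate, maf):
    """Criterion (3): stricter call-rate threshold for rarer SNPs."""
    if maf > 0.05:
        return call_rate < 0.95
    if maf < 0.05:
        return call_rate < 0.99
    # Assumption: a MAF of exactly 0.05 is treated like a common SNP.
    return call_rate < 0.95


# Exclusion counts reported in the text, keyed by criterion number.
excluded = {1: 1311, 2: 2857, 3: 5201, 4: 2194, 5: 22}
designed = 211155

# Assumption: no SNP is counted under more than one criterion.
passed = designed - sum(excluded.values())
pass_percent = 100.0 * passed / designed  # about 94.5
```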

Statistical Analysis


Subjects with greater than 90 percent European ancestry were classified as European (n=39,944), and those with greater than 80 percent Asian or African ancestry were classified as Asian (n=2,388) or African (n=387), respectively. All other subjects were classified as mixed ancestry (n=1,770). We then used a set of 37,000 additional genotyped markers not suspected to be related to ovarian cancer risk to perform principal components analysis within each major population subgroup [40]. To enable this analysis on very large-scale samples we used an in-house program written in C++ using the Intel MKL libraries for eigenvectors (available at http://ccge.medschl.cam.ac.uk/software/).
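The ancestry thresholds stated above amount to a simple classification rule, sketched here under the assumption that each subject already has estimated ancestry proportions between 0 and 1. The function name and group labels are illustrative only; how the proportions themselves were estimated is not covered by this sketch.

```python
def classify_ancestry(european, asian, african):
    """Assign an ancestry group from estimated ancestry proportions (0 to 1)."""
    if european > 0.90:
        return "European"
    if asian > 0.80:
        return "Asian"
    if african > 0.80:
        return "African"
    # Everything else is treated as mixed ancestry.
    return "Mixed"
```

Because the proportions for one subject cannot sum to more than 1, at most one of the three thresholds can be exceeded, so the order of the checks does not affect the result.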

For the non-European groups (all invasive cases and serous cases) and for all groups for the other subtypes, we were not able to carry out within-study analyses because of the small sample sizes available. We therefore fitted unconditional logistic regression models adjusted for study site and for the first five principal components for the European ancestry group or the first two principal components for the other ancestry groups.

To evaluate the independence of associations between the top serous and clear cell SNPs, we fit separate models by

histology that contained both SNPs. In addition, two correlated SNPs were found to be associated with both serous

and clear cell subtypes of ovarian cancer, with one SNP being more strongly associated with serous (rs7405776) and

the other more strongly associated with clear cell (rs11651755). It is conceivable that the associations for both

subtypes are being driven by the same SNP but that, by chance, the other correlated SNP gives a stronger signal for

one of the subtypes. We therefore compared the log-likelihood statistics of the logistic regression models for each

SNP with each subtype. The odds in favor of one SNP being the driver of the signal are given as

exp(log-likelihood_SNP1 - log-likelihood_SNP2).
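
A minimal worked example of the odds calculation described above follows. The log-likelihood values are hypothetical and chosen only to show the arithmetic; they are not results from this study.

```python
import math


def odds_in_favor(loglik_snp1, loglik_snp2):
    """Odds that SNP1 rather than SNP2 drives the signal: exp(LL1 - LL2)."""
    return math.exp(loglik_snp1 - loglik_snp2)


# Hypothetical maximized log-likelihoods of two single-SNP logistic models
# fitted to the same subjects for the same subtype.
ll_snp1 = -1000.0
ll_snp2 = -1002.3

odds = odds_in_favor(ll_snp1, ll_snp2)      # exp(2.3), roughly 10 to 1
reverse = odds_in_favor(ll_snp2, ll_snp1)   # the reciprocal of the above
```

Because the statistic depends only on the difference in log-likelihoods, a difference of about 2·3 already corresponds to odds of roughly 10 to 1, and swapping the two SNPs gives the reciprocal.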

The region for haplotype analysis was defined as extending around the top serous SNP, rs7405776, to the point

where there were no SNPs with r² > 0·20 and a minor allele frequency of 5%.

TCGA Packages Used

Affymetrix HT Human Genome U133 Array Plate Set

broad.mit.edu_OV.HT_HG-U133A.Level_3.11.1007.0/

Agilent 244K Custom Gene Expression G4502A-07-3

unc.edu_OV.AgilentG4502A_07_3.Level_3.1.5.0/

unc.edu_OV.AgilentG4502A_07_3.Level_3.2.0.0/

Affymetrix Human Exon 1.0 ST Array

lbl.gov_OV.HuEx-1_0-st-v2.Level_3.11.2.0/

Illumina Infinium HumanMethylation27 BeadChip

jhu-usc.edu_OV.HumanMethylation27.Level_3.1.4.0/

DNA methylation / mRNA Expression Correlation

The TCGA DNA methylation data were generated on the Illumina Infinium HumanMethylation27 BeadChip. A total

of 576 tumors and 14 fallopian tube samples were assayed. The TCGA mRNA data were generated on three

platforms. A total of 592 unique tumor samples were assayed, with 512 assayed on all three platforms, and 80 on

two of the three platforms. Ten normal fallopian tube samples were assayed as well, four on all three platforms and

six on two of the three. Of these, 574 tumor samples and all ten fallopian tube samples had matching DNA methylation

data. Scatterplots were used to examine the association between the mRNA expression data and DNA methylation

data, matched by 16-digit TCGA ID. The DNA methylation probe cg02335804 was used to measure the HNF1B promoter

DNA methylation level, as it was throughout the paper for samples assayed on the HumanMethylation27

platform. The evaluation was done both for the integrated mRNA expression data and for each expression platform

separately.

To integrate data from the three platforms, we median-centered [41] the Level 3 data (log intensity for the

one-color-channel platforms and log ratio for the two-color-channel platform) for HNF1B expression for each platform.

We then took the median of the log ratio estimates from the three platforms as the relative HNF1B expression level

for each sample. Spearman correlation was used to assess the correlation between gene expression and DNA

methylation.
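
The integration step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample IDs and expression values are hypothetical, each platform is centered on the median across all of its samples, and a sample assayed on only two platforms receives the median (that is, the mean) of its two centered values.

```python
from statistics import median


def median_center(values_by_sample):
    """Subtract the platform-wide median from every sample's value."""
    center = median(values_by_sample.values())
    return {sample: value - center for sample, value in values_by_sample.items()}


def integrate_platforms(platforms):
    """Per sample, take the median of the centered values across platforms.

    `platforms` is a list of dicts mapping sample ID to the Level 3 HNF1B
    value on one platform; samples may be missing from some platforms.
    """
    centered = [median_center(p) for p in platforms]
    samples = set().union(*(c.keys() for c in centered))
    return {
        s: median(c[s] for c in centered if s in c)
        for s in sorted(samples)
    }


# Hypothetical HNF1B values on three platforms (T3 is missing from the third).
huex = {"T1": 6.0, "T2": 8.0, "T3": 10.0}      # log intensity
u133a = {"T1": 5.0, "T2": 7.0, "T3": 9.0}      # log intensity
agilent = {"T1": -1.5, "T2": 0.5}              # log ratio

integrated = integrate_platforms([huex, u133a, agilent])
```

Centering each platform first puts log intensities and log ratios on a comparable relative scale, so the per-sample median is not dominated by whichever platform has the largest raw values.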

Quality Control of The Infinium HumanMethylation450 BeadChip Assay

The quality of the bisulfite-converted DNA and the performance of the CpG probes were assessed using internal

placental positive control, whole genome amplified (WGA) negative control, and CEPH control samples. The mean

intra-class correlations for these controls across the two batches of samples were 0·90, 0·96 and 0·99, respectively.

Intra-class correlation for ovarian duplicate samples was > 0·99.

SNP/DNA Methylation Association Validation With TCGA Data

Validation was done using TCGA data from 519 tumors. Four of the nine SNPs were available on the TCGA

platform. For promoter DNA methylation, cg02335804 was used as a surrogate because cg14487292 (the two CpGs

are 278 bp apart) was not present on the HumanMethylation27 platform. The p-values are from one-sided tests for

linear trend in the DNA methylation beta value across the three genotypes at each locus. The nominal

Bonferroni-adjusted p-value cutoff would be 0·013 (0·05/4).
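
The Bonferroni cutoff quoted above is a single division; a short sketch makes the arithmetic explicit. The helper names and the p-values used to exercise it are hypothetical and are not results from this study.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    return alpha / n_tests


def passes_bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant or not after Bonferroni adjustment."""
    cutoff = bonferroni_cutoff(alpha, len(p_values))
    return [p < cutoff for p in p_values]


# Four SNPs were tested, so the cutoff is 0.05 / 4 = 0.0125,
# which is quoted in the text to two significant figures.
cutoff = bonferroni_cutoff(0.05, 4)

# Hypothetical one-sided trend-test p-values for four SNPs.
flags = passes_bonferroni([0.001, 0.012, 0.013, 0.2])
```

With hypothetical p-values on either side of the threshold, 0·012 passes and 0·013 does not, which shows why the unrounded value 0·0125 is the one to compare against.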

Genomic/Epigenomic Data Analysis and Visualization

The statistical analyses were done in R (version 2.15.0). Mapping and characterization of the HumanMethylation450

probes were done with the R package IlluminaHumanMethylation450k.db. The UCSC tracks were downloaded with

rtracklayer [41]. The PRC1 (Ring1b) and PRC2 mark (H3K27me3) ChIP-seq data [32] and the chromatin state data

(ChromHMM) [33] were from previous work in embryonic stem cells. The genomic and epigenomic data were mapped

to the genome with Build 37 (hg19) coordinates and visualized using the R packages GenomicRanges [42] and

Gviz [43].

In vitro model of HNF1B overexpression

Lentiviral plasmids encoding TERT (Addgene plasmid 12245), HNF1B-GFP or GFP (GeneCopoeia) were

co-transfected using Lipofectamine™ (Invitrogen) with pMD2.G and p8.91 plasmids into HEK293T virus producer

cells. Cells were refed the following day, and virus was harvested, filtered through a 0·45 μm filter and snap-frozen

48 hours later. Viral titres were analyzed and target cells were transduced overnight in the presence of 8 μg/ml

polybrene (Sigma).

An immortalized endometriosis epithelial cell (EEC) line was generated by lentiviral transduction of hTERT into

primary EECs. Extended in vitro lifespan was confirmed by growth curve analysis (Supplementary Figure S10).

TERT-immortalized EECs were transduced with lentiviral HNF1B-GFP or GFP supernatants and positive cells were

selected with 400 ng/ml puromycin (Sigma). GFP expression was confirmed by fluorescence microscopy; HNF1B

expression was confirmed by real-time PCR (Supplementary Figure S10).

Gene expression analysis

RNA was harvested from cells using the QIAGEN RNeasy kit with on-column DNase I digestion. 1 μg RNA was

reverse transcribed using an MMLV reverse transcriptase enzyme (Promega). Gene expression analyses were

performed using TaqMan PCR probes (HNF1B, Hs01001602_m1; DPP4, Hs00175210; ACE2, Hs01085333_m1;

SPP1, Hs00959010_m1; β-actin, Hs00357333_g1; GAPDH, Hs02758991_g1; Applied Biosystems) and analyzed

using the ABI 7900HT FAST Real-Time PCR system. Relative expression of each gene of interest was calculated

using the delta-delta Ct method; Ct values for each gene were normalized to mean Ct values for β-actin and

GAPDH. Statistical analyses were performed using Prism software. Two-tailed paired t-tests with a significance

cutoff of 0·05 were used.
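
The delta-delta Ct calculation described above can be sketched as follows. The Ct values are hypothetical, the function names are illustrative, and the sketch assumes the usual idealization that each PCR cycle doubles the product (amplification efficiency of 2), which the text does not state explicitly.

```python
from statistics import mean


def delta_ct(ct_target, ct_references):
    """Normalize a target gene's Ct to the mean Ct of the reference genes."""
    return ct_target - mean(ct_references)


def relative_expression(ct_target_test, ct_refs_test, ct_target_ctrl, ct_refs_ctrl):
    """Fold change of test versus control by the delta-delta Ct method.

    Assumes perfect doubling per cycle, so fold change = 2 ** (-ddCt).
    """
    ddct = delta_ct(ct_target_test, ct_refs_test) - delta_ct(ct_target_ctrl, ct_refs_ctrl)
    return 2.0 ** (-ddct)


# Hypothetical Ct values: target gene, then [beta-actin, GAPDH] as references.
fold_change = relative_expression(
    ct_target_test=22.0, ct_refs_test=[18.0, 20.0],   # e.g. HNF1B-GFP cells
    ct_target_ctrl=27.0, ct_refs_ctrl=[18.0, 20.0],   # e.g. GFP control cells
)
```

In this made-up example the target gene crosses threshold five cycles earlier in the test cells after normalization, corresponding to a 32-fold higher relative expression.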


References

34. Edlund CK, Lee WH, Li D, Van Den Berg DJ, Conti DV. Snagger: a user-friendly program for incorporating

additional information for tagSNP selection. BMC Bioinformatics 2008;9:174.

35. Kote-Jarai Z, Easton DF, Stanford JL, Ostrander EA, Schleutker J, Ingles SA, et al. Multiple novel prostate

cancer predisposition loci confirmed by an international study: the PRACTICAL Consortium. Cancer

Epidemiol Biomarkers Prev 2008;17:2052-61.

36. Kote-Jarai Z, Olama AA, Giles GG, Severi G, Schleutker J, Weischer M, et al. Seven prostate cancer

susceptibility loci identified by a multi-stage genome-wide association study. Nat Genet 2011;43:785-91.

37. Eeles RA, Kote-Jarai Z, Al Olama AA, Giles GG, Guy M, Severi G, et al. Identification of seven new prostate

cancer susceptibility loci through a genome-wide association study. Nat Genet 2009;41:1116-21.

38. Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP, et al. A genotype calling algorithm

for the Illumina BeadArray platform. Bioinformatics 2007;23:2741-6.

39. Giannoulatou E, Yau C, Colella S, Ragoussis J, Holmes CC. GenoSNP: a variational Bayes within-sample SNP

genotyping algorithm that does not require a reference population. Bioinformatics 2008;24:2209-14.

40. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis

corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904-9.

41. Lawrence M, Gentleman R, Carey V. rtracklayer: an R package for interfacing with genome browsers.

Bioinformatics 2009;25:1841-2.

42. Aboyoun P, Pages H, Lawrence M. GenomicRanges: Representation and manipulation of genomic intervals. R

package version 1.8.3 edn.

43. Hahne F, Durinck S, Ivanek R, Mueller A. Gviz: Plotting data and annotation information along genomic

coordinates. R package version 1.0.0 edn.
