supporting information - pnas...2012/01/19  · supporting information volkmer et al....

15
Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data points from 25,237 microarrays in human Affymetrix U133 Plus 2.0; 16,357 microarrays in human Affymetrix U133A; and 3,969 microarrays in human Affymetrix U133A 2.0 platforms from the National Center for Biotechnology Information (NCBI)s Gene Expression Omnibus (GEO) database (1) and normalized these datasets using the robust multichip average (RMA) algorithm (2). We also downloaded and normalized 10,843 Mouse_430_2.0 microarrays and 3,079 Rat_430_2.0 microarrays from GEO. We computed the thresholds for each gene using the StepMiner al- gorithm (3) and built a complete database of Boolean relation- ships between pairs of genes using BooleanNet (4). We identied and manually veried a set of 138 bladder transitional cell carcinoma arrays by automatically searching through the GEO description pages of all downloaded microarrays. These 138 bladder cancer (BC) arrays were distributed in the human Affymetrix platforms described above. We normalized all of the microarrays in all three human Affymetrix platforms (n = 45,563) using a modied chip description le (CDF) with RMA. From this normalized gene expression dataset, we selected our 138 identied BC arrays for further analysis (AffyBC dataset). Additionally, we downloaded three independent BC datasets with survival data from NCBI GEO public database: 403 samples from Dyrskjøt et al. (European dataset, GSE5479) (5), 256 sam- ples including 165 primary BC samples from Kim et al. (Chung- buk dataset, GSE13507) (6), and 89 samples from Lindgren et al. (Lindgren dataset, GSE19915) (7). We also downloaded another dataset from the Journal of Clinical Oncology (JCO) website by Sanchez-Carbaryo et al. (SanchezC dataset, survival follow-up times were not available) to analyze gene expression from normal bladder tissue (8). These datasets were already normalized and log2 transformed. We renormalized the Lindgren dataset because the KRT14 gene was missing in the original le from GEO. This renormalization was performed using background correction, print tip loess within each microarray, and quantile normalization between microarrays using limma package in R (Bioconductor) (9, 10). The clinical information for the European dataset was downloaded from Oncomine, the Chungbuk dataset was down- loaded from GEO, and the Lindgren dataset was downloaded from the Cancer Research website. For the European dataset, overall survival status, progression status, relapse status, age, sex, stage, grade, and treatments were available. For the Chungbuk dataset, overall survival status, cancer specic survival status, age, sex, stage, and grade were available. Similarly, for the Lindgren dataset, cancer-specic survival status, stage, and grade were available. In all datasets, largest follow-up times were at least 5 y. Statistical Analysis and Software Used. StepMiner software is used to compute thresholds for high and low expression levels (3, 4). For this application, the expression values for each gene were ordered from low to high, and StepMiner was used to t a rising step function to the data that minimizes the differences (mean squared errors) between the tted and measured values. This approach places the step at the largest jump from low values to high values (but only if there are sufciently many expression values on each side of the jump to provide evidence that the jump is not due to noise) and sets the threshold at the point where the step crosses that original data. A noise margin of twofold change (±0.5) was considered around the StepMiner threshold and used as alternative thresholds when a stringent high or low condition is required. BooleanNet statistics was used to infer Boolean relationships between two genes (4). The mining developmentally regulated genes (MiDReG) approach is used to predict developmentally regulated genes in BC (11). We also developed a new web-based software called hierarchical exploration of gene expression microarrays online (Hegemon) that explores gene expression data and their clinical information using a scatterplot of log2 reduced gene expression values of two genes. This software provides a simple framework for automatic selection of patient groups on the basis of gene expression val- ues, as well as other clinical parameters, and performs auto- mated survival analysis. We used this software to test our hypothesis that patients with a basal tumor phenotype have worse survival outcomes compared with a mature tumor phe- notype. This algorithm was also used to identify cell-surface markers corresponding to this differentiation state. KaplanMeier estimates, univariate and multivariate (using discrete and continuous KRT14) Cox regression analyses, were performed using the R software (R version 2.12.1 201012-16 available at http://www.r-project.org). The P values were calculated with the use of the log-rank test (indicated in all Kaplan-Meier plots). The ratio on the right of KaplanMeier plots in the main and Supporting Information gures shows the incidence proportions (1 survival proportions). Boxplot with mean and condence intervals are prepared with plotCI function of gplots package in the R software. Identication of Differentiation States in Bladder Cancer Using MiDReG. We used MiDReG to predict genes developmentally regulated in BC as shown in Fig. 2 AD and Fig. S1 BE. We focused specically on keratins for possible candidates because it is well known that gene expression of keratins changes system- atically during epithelial tissue development. The seed genes used for this analysis were KRT5 and KRT20. On the basis of our previous results (12), CD44 + tumor cells are upstream of CD44 during development and CD44 + cells are mostly KRT5 + , and CD44 cells are mostly KRT20 + (Fig. S1A). Therefore, for our analysis we assume that KRT5 + tumor cells are upstream of KRT20 + tumor cells. In this case, we set off to predict genes upstream of KRT5 + tumor cells to further subdivide the original tumor-initiating populations. We formulated this problem in the context of MiDReG as follows. On the basis of previously known biology, KRT5 turns off during development, KRT20 turns on during development, and KRT5 is mutually exclusive with KRT20 (KRT5 high KRT20 low ) (Fig. 2A and Fig. S1B). Our goal is to predict gene X that turns off during development but earlier than KRT5. Therefore, we searched for Boolean rela- tionships X high KRT5 high and X high KRT20 low in our BooleanNet database (on 25,237 Affymetrix U133 Plus 2.0 mi- croarrays) (Fig. 2 BD and Fig. S1 BE). This analysis identied 137 genes, which were ltered using the AffyBC dataset (n = 138, BC dataset) by good dynamic range of gene expression values (>5), diversity (stddev > 1.5), and keratins. The ltered list of genes contained seven keratins (KRT4, KRT13, KRT14, KRT16, KRT6B, KRT6A, and KRT6C). We examined the heatmaps of these keratinsgene expression in both the AffyBC dataset and the Chungbuk dataset (Fig. S1F) and found four good candidate keratins (KRT14, KRT16, KRT6B, and KRT6A) (Fig. S1G). Additionally, we used the Chungbuk da- taset to check whether the predicted markers are associated with patient survival (by computing hazard ratio (HR) using Cox re- gression analysis of the high and low expression values as iden- Volkmer et al. www.pnas.org/cgi/content/short/1120605109 1 of 15

Upload: others

Post on 05-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

Supporting InformationVolkmer et al. 10.1073/pnas.1120605109SI MethodsData Collection and Processing. We downloaded 1.5 billion datapoints from 25,237 microarrays in human Affymetrix U133 Plus2.0; 16,357 microarrays in human Affymetrix U133A; and 3,969microarrays in human Affymetrix U133A 2.0 platforms from theNational Center for Biotechnology Information (NCBI)’s GeneExpression Omnibus (GEO) database (1) and normalized thesedatasets using the robust multichip average (RMA) algorithm(2). We also downloaded and normalized 10,843 Mouse_430_2.0microarrays and 3,079 Rat_430_2.0 microarrays from GEO. Wecomputed the thresholds for each gene using the StepMiner al-gorithm (3) and built a complete database of Boolean relation-ships between pairs of genes using BooleanNet (4). Weidentified and manually verified a set of 138 bladder transitionalcell carcinoma arrays by automatically searching through theGEO description pages of all downloaded microarrays. These138 bladder cancer (BC) arrays were distributed in the humanAffymetrix platforms described above. We normalized all of themicroarrays in all three human Affymetrix platforms (n =45,563) using a modified chip description file (CDF) with RMA.From this normalized gene expression dataset, we selected our138 identified BC arrays for further analysis (AffyBC dataset).Additionally, we downloaded three independent BC datasets

with survival data from NCBI GEO public database: 403 samplesfrom Dyrskjøt et al. (European dataset, GSE5479) (5), 256 sam-ples including 165 primary BC samples from Kim et al. (Chung-buk dataset, GSE13507) (6), and 89 samples from Lindgren et al.(Lindgren dataset, GSE19915) (7). We also downloaded anotherdataset from the Journal of Clinical Oncology (JCO) website bySanchez-Carbaryo et al. (SanchezC dataset, survival follow-uptimes were not available) to analyze gene expression from normalbladder tissue (8). These datasets were already normalized andlog2 transformed. We renormalized the Lindgren dataset becausethe KRT14 gene was missing in the original file from GEO. Thisrenormalization was performed using background correction,print tip loess within each microarray, and quantile normalizationbetween microarrays using limma package in R (Bioconductor)(9, 10). The clinical information for the European dataset wasdownloaded from Oncomine, the Chungbuk dataset was down-loaded from GEO, and the Lindgren dataset was downloadedfrom the Cancer Research website. For the European dataset,overall survival status, progression status, relapse status, age, sex,stage, grade, and treatments were available. For the Chungbukdataset, overall survival status, cancer specific survival status,age, sex, stage, and grade were available. Similarly, for theLindgren dataset, cancer-specific survival status, stage, and gradewere available. In all datasets, largest follow-up times were atleast 5 y.

Statistical Analysis and Software Used. StepMiner software is usedto compute thresholds for high and low expression levels (3, 4).For this application, the expression values for each gene wereordered from low to high, and StepMiner was used to fit a risingstep function to the data that minimizes the differences (meansquared errors) between the fitted and measured values. Thisapproach places the step at the largest jump from low values tohigh values (but only if there are sufficiently many expressionvalues on each side of the jump to provide evidence that thejump is not due to noise) and sets the threshold at the pointwhere the step crosses that original data. A noise margin oftwofold change (±0.5) was considered around the StepMinerthreshold and used as alternative thresholds when a stringent

high or low condition is required. BooleanNet statistics was usedto infer Boolean relationships between two genes (4). Themining developmentally regulated genes (MiDReG) approach isused to predict developmentally regulated genes in BC (11). Wealso developed a new web-based software called hierarchicalexploration of gene expression microarrays online (Hegemon)that explores gene expression data and their clinical informationusing a scatterplot of log2 reduced gene expression values of twogenes. This software provides a simple framework for automaticselection of patient groups on the basis of gene expression val-ues, as well as other clinical parameters, and performs auto-mated survival analysis. We used this software to test ourhypothesis that patients with a basal tumor phenotype haveworse survival outcomes compared with a mature tumor phe-notype. This algorithm was also used to identify cell-surfacemarkers corresponding to this differentiation state. Kaplan–Meier estimates, univariate and multivariate (using discrete andcontinuous KRT14) Cox regression analyses, were performedusing the R software (R version 2.12.1 2010–12-16 available athttp://www.r-project.org). The P values were calculated with theuse of the log-rank test (indicated in all Kaplan-Meier plots).The ratio on the right of Kaplan–Meier plots in the main andSupporting Information figures shows the incidence proportions(1 – survival proportions). Boxplot with mean and confidenceintervals are prepared with plotCI function of gplots package inthe R software.

Identification of Differentiation States in Bladder Cancer UsingMiDReG. We used MiDReG to predict genes developmentallyregulated in BC as shown in Fig. 2 A–D and Fig. S1 B–E. Wefocused specifically on keratins for possible candidates because itis well known that gene expression of keratins changes system-atically during epithelial tissue development. The seed genesused for this analysis were KRT5 and KRT20. On the basis ofour previous results (12), CD44+ tumor cells are upstream ofCD44− during development and CD44+ cells are mostly KRT5+,and CD44− cells are mostly KRT20+ (Fig. S1A). Therefore, forour analysis we assume that KRT5+ tumor cells are upstream ofKRT20+ tumor cells. In this case, we set off to predict genesupstream of KRT5+ tumor cells to further subdivide the originaltumor-initiating populations. We formulated this problem in thecontext of MiDReG as follows. On the basis of previously knownbiology, KRT5 turns off during development, KRT20 turns onduring development, and KRT5 is mutually exclusive withKRT20 (KRT5high → KRT20low) (Fig. 2A and Fig. S1B). Ourgoal is to predict gene X that turns off during development butearlier than KRT5. Therefore, we searched for Boolean rela-tionships Xhigh → KRT5high and Xhigh → KRT20low in ourBooleanNet database (on 25,237 Affymetrix U133 Plus 2.0 mi-croarrays) (Fig. 2 B–D and Fig. S1 B–E). This analysis identified137 genes, which were filtered using the AffyBC dataset (n =138, BC dataset) by good dynamic range of gene expressionvalues (>5), diversity (stddev > 1.5), and keratins. The filteredlist of genes contained seven keratins (KRT4, KRT13, KRT14,KRT16, KRT6B, KRT6A, and KRT6C). We examined theheatmaps of these keratins’ gene expression in both the AffyBCdataset and the Chungbuk dataset (Fig. S1F) and found fourgood candidate keratins (KRT14, KRT16, KRT6B, andKRT6A) (Fig. S1G). Additionally, we used the Chungbuk da-taset to check whether the predicted markers are associated withpatient survival (by computing hazard ratio (HR) using Cox re-gression analysis of the high and low expression values as iden-

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 1 of 15

Page 2: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

tified by StepMiner) (Fig. S1F). We used the Chungbuk datasetbecause of the high quality of gene expression values from Illu-mina platform for many known genes. Two of the identified ker-atins have strong associations with patient survival (KRT14, HR=2.75, P< 0.05; KRT6B, HR= 3.48, P< 0.05) (Fig. S1F).We choseKRT14 as a potential candidate for experimental validation be-cause large cohort of patients had high KRT14 gene-expressionlevels as opposed to other predicted keratins in both the AffyBCand the Chungbuk dataset. KRT14 has strong Boolean relation-ships: KRT14high → KRT5high” and “KRT14high → KRT20low.KRT14 is strongly associated with patient survival in the Chung-buk dataset (HR= 2.75, P < 0.05). Therefore, we hypothesize thatKRT14 turns off during bladder cancer development earlier thanKRT5. On the basis of this hypothesis, BC development is dividedinto three different states: KRT14+KRT5+KRT20− (basal),KRT14−KRT5+KRT20− (intermediate), and KRT14−KRT5−

KRT20+ (differentiated) (Fig. 2F).

Identification of Corresponding Surface Markers to Keratins UsingHegemon. To experimentally validate the differentiation statesin BC defined by keratins, we predicted the corresponding surfacemarkers using two large BC datasets with high quality gene ex-pression values (Fig. S3): the AffyBC dataset, 138 samples inhuman Affymetrix platforms; and the Chungbuk dataset, with all256 samples in Illumina Human-6 BeadChip (48K) platform. Forthe AffyBC dataset, StepMiner threshold for each probeset wascomputed using all 45,563 publicly available microarray datasetincluding 25,237 microarrays in human Affymetrix U133 Plus 2.0;16,357 microarrays in human Affymetrix U133A; and 3,969microarrays in human Affymetrix U133A 2.0 platforms that werenormalized together (Fig. S3C). For the Chungbuk dataset,StepMiner threshold was computed using 256 samples (Fig.S3C). Within these datasets, we identified BC samples with allthree phenotypes as described above: basal (KRT14+KRT5+

KRT20−), intermediate (KRT14−KRT5+KRT20−), and differ-entiated (KRT14−KRT5−KRT20+) as shown in Fig. S3E. Wesearched for two different groups of surface genes. First, wesearched for those whose expression values in the basal pheno-types are high in average, but low in differentiated phenotypes,and down-regulated in intermediate phenotypes (Fig. S3F andDataset S1); these genes are strongly down-regulated with dif-ferentiation. The high and low gene expression values are com-puted using StepMiner as described above (Fig. S3C). Thesecond group of surface genes is similar to the first except that indifferentiated phenotypes the average gene expression values arelower than in the basal phenotypes, but still high (Fig. S3G andDataset S1); these genes are slightly down-regulated with dif-ferentiation. Subsequently, we ranked both groups of surfacegenes on the basis of their association with patient survival in theChungbuk dataset (by computing hazard ratio using Cox re-gression analysis of the high and low expression values as iden-tified by StepMiner). Genes from the first group, strongly down-regulated from basal to differentiated tumors contain CD44, thepreviously identified marker for tumor-initiating cells in BC.However, CD44 was ranked low because of lower hazard ratioand surprisingly it was less than 1. To identify an upstreammarker, four top genes based on expression levels and hazardratios are considered from the first group including CD248,S100A8, COL1A1, and CD90 (THY1). As FACS-compatibleantibodies to CD248, S100A8, and COL1A1 are not commer-cially available, we focused on CD90 first. CD90 and CD248were highly correlated (Chungbuk dataset, coefficient = 0.67;Affy dataset, coefficient = 0.63) with each other. To identifya downstream marker, we focused on CD49f (ITGA6) from thesecond group as this marker enriches basal cell populations, iscoexpressed with KRT14 in normal and cancer epithelial tissue,and is functionally associated with stem and tumor initiating cellsin other epithelial tissues.

Patient Classification for Outcome Analysis According to BladderCancer Differentiation Status. For survival analysis, we dividedthe BC patients using gene expression values of keratins andsurface markers separately. We performed all of the followingsteps using our newly developed software Hegemon.To determine the clinical relevance of our predicted keratin

differentiation states first, we divided the patients into basal,intermediate, and differentiated phenotype whenever possible(all genes present in the datasets). Thus, the European datasetwas not used for this analysis because gene expression for KRT5,KRT20 and CD90, CD49f was not measured in that microarrayplatform. For the Lindgren dataset, we used a stringent low (t −0.5) for KRT14 and KRT5 and a stringent high (t + 0.5) forKRT20 to collect more patients with basal and intermediatephenotypes (Fig. S4A). The basal phenotype in the Lindgrendataset was defined as KRT14+KRT5+KRT20− (KRT14+,ID:21409 ≥ 1.2–0.5; KRT5+, ID:4006 ≥ 1.92–0.5; KRT20−,ID:16873 < -0.05+0.5), the intermediate phenotype was definedas KRT14−KRT5+KRT20− (KRT14−, ID:21409 < 1.2–0.5;KRT5+, ID:4006 ≥ 1.92–0.5; KRT20−, ID:16873 < -0.05+0.5),and the differentiated phenotype was defined as KRT14−

KRT5−KRT20+ (KRT14−, ID:21409 < 1.2–0.5; KRT5−, ID:4006 < 1.92–0.5; KRT20+, ID:16873 ≥ -0.05+0.5). To evaluatethe clinical relevance of our predicted corresponding surfacemarker differentiation states, we used gene expression values ofthese surface markers (CD90, CD44, and CD49f) in the Lindg-ren datasets (Fig. S4B). Due to the lack of gene expressionvalues of these surface markers, the European dataset could notbe used for analysis. StepMiner threshold (t) was used to de-termine the high and low gene expression levels of each surfacemarker. For the Lindgren dataset, the basal phenotype was de-fined as CD90+CD44+CD49f+ (CD90+, ID:2627 ≥ −0.59;CD44+, ID:6797 ≥ −1.38; CD49f+, ID:21822 ≥ 0.47), the in-termediate phenotype was as CD90−CD44+CD49f+ (CD90−,ID:2627 < −0.59; CD44+, ID:6797 ≥ −1.38; CD49f+, ID:21822 ≥0.47), and the differentiated phenotype was defined as CD90−

CD44−CD49f+ (CD90−, ID:2627 < −0.59; CD44−, ID:6797 <−1.38; CD49f+, ID:21822 ≥ 0.47).Because a subset of patient samples in our analysis does not fit

easily into the three BC subtypes described above (others, gray;Fig. S4 A and B), we analyzed additionally all possible combi-nations of keratins and cell surface markers (±) (Fig. S4 C–E),which reveals additionally heterogeneity and might be correlatedwith BC dedifferentiation.

Immunofluorescence Staining. Optimal cutting temperature (OCT)compound-embedded frozen tissue was sectioned into 5-μmthick sections and fixed with ethanol. Slides were then blockedwith 10% goat serum and probed with anti-CD44 (Calbiochem;217594), anti-KRT5 (Abcam; ab53121), anti-KRT20 (Abcam;ab962), and anti-KRT14 (Abcam; ab53115) antibodies for 2 h.Samples were stained with goat antimouse, -rabbit, and -ratsecondary antibodies conjugated with Alexa 488/594 (Invitrogen)and nuclear counterstained with Hoechst (Invitrogen). Slideswere imaged on a Leica fluorescent microscope.

Bladder Tumor Tissue Dissociation. The Stanford University and theBaylor College of Medicine institutional review boards (IRBs)approved the enrollment of human subjects under protocols 1512and H-26809, respectively. Tumor tissues were mechanicallydissociated in Medium 199 containing Liberase TM and THenzymes (Roche), DNase (Worthington) and Pluronic-F68(Sigma) at 37 °C until single-cell suspension was achieved (3–6 h).Cells were then washed twice with PBS and filtered through a70-μm filter.

Flow Cytometry Analysis and Cell Sorting. Tumor cell suspensionswere stained with phycoerythrin (PE)-conjugated anti-CD44 (BD

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 2 of 15

Page 3: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

PharMingen; 550989), Alexa 700-conjugated anti-CD90 (Biol-egend; 328120), APC-conjugated anti-CD49f (Biolegend; 17–0495), PerCP/Cy5.5-conjugated anti-ESA (Biolegend; 324214),and lineage mixture containing Pacific-blue–conjugated anti-CD45 (Biolegend; human 304022 and mouse 103125), anti-CD31 (Biolegend; human 303114 and mouse 102422), and H-2Kd (Biolegend; 116616) antibodies. Flow cytometry analysis andcell sorting was performed on a BD FACSAria (Becton Dick-inson) cell sorting system under 20 psi with a 100-μm nozzle.

Real-Time PCR. Total RNA was isolated from sorted cell pop-ulations using the mirVana RNA isolation kit (Ambion)according to the manufacturer’s protocol. RNA was then sub-jected to cDNA synthesis using superscript III (Invitrogen) ac-cording to the manufacturer’s protocol. The cDNA was thenprocessed through a preamplification step with final Taqmanprobe PCR reactions run on an ABI 7900 machine (AppliedBiosystems) according to the manufacturer’s protocol. TheqPCR results were normalized using β-actin as endogenouscontrol. (β-actin, Hs00357333_g1; KRT14, Hs00559328_m1;KRT5, Hs00361185_m1; KRT20, Hs00300643_m1; AppliedBiosystems).

Xenotransplantation of Patient Cancer Cells into ImmunocompromisedMice (Non-obese diabetic scid gamma). The Stanford AdministrativePanel on Laboratory Animal Care approved the mouse studiesunder protocol 10725:4. FACS-purified human patient cancer cellswere suspended and mixed with 25% Matrix-Matrigel (BectonDickinson; 354248) and injected (103–104 cells), as indicated,intradermal into the dorsal skin of 4- to 8-wk-old NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice, as described previously (12).

KRT14 Immunohistochemistry. A total of 158 (Stanford) and 117(Baylor) formalin-fixed paraffin-embedded (FFPE) BC tissues(from1994 to2006)withat least a5-y follow-upwere collectedusingHealth Insurance Portability and Accountability Act (HIPAA)compliant Stanford (IRB 1512) and Baylor (IRB H-26809) in-stitutional review board approval. FFPE sections (6 μm) weredeparaffinized and hydrated with graded ethanol. Antigen re-trieval was performed with 10 mM citrate buffer at pH 6.0, fol-lowed by primary antibody for KRT14 (Covance; PRB-155P,rabbit polyclonal 1:3,000) incubation. Antibody specificity wasdetermined byWestern blot analysis (Fig. S5C). Immunoreactivitywas visualized using Vector’s Vectastain Elite ABC kit (rabbit)following the manufacturer’s protocol. A trained pathologist an-alyzed the staining. Nonepithelial cells (stroma cells) used as in-ternal negative control showed no reactivity. BC specimens withsquamous differentiation were used as positive control. Stainingwas scored according to the extent of KRT14+ epithelial cancercells, with higher scores indicating a greater proportion of positivecells: 0 (no positive cells), 1 (<5%), 2 (5–50%), and 3 (>50%).Tissue without epithelial cells was excluded. Patients were strati-fied as follows: negative (score 0–1); positive (score 2–3). Analysiswas performed without knowledge of tumor stage, grade, or clin-ical follow-up. Images and scores are accessible online: http://genepyramid.stanford.edu/microarray/BladderTMA/.

KRT5 and KRT20 Immunohistochemistry. We have approval fromStanford University (IRB 1512) and Baylor College of Medicine(IRB H-26809) to use FFPE bladder cancer tissues from 1994 to2006 with at least 5-y complete clinical follow-up with survivaloutcome data. A total of 158 (Stanford) and 117 (Baylor) patientsfit with our selection criteria. PPFE sections (6 μm) were de-paraffinized and hydrated with graded ethanol. Antigen retrievalwas performed with 10 mM citrate buffer at pH 6.0, followed byprimary antibodies for KRT5 (Abcam; ab52635, mouse mono-clonal 1:100), and KRT20 (Abcam; ab76126, mouse monoclonal1:100) incubation, washing steps, and secondary antibody in-

cubation. Staining was performed using the Dako EnVision kit(K4006 and K4011) following the manufacturer’s protocol. Thesections were analyzed and scored by a trained pathologist. Thescoring criteria were determined as followed: 0, all negative; 1,<5% of cells positive; 2, 5–50% cells positive; 3, >50% of cellspositive. For analysis, 0 and 1 were stratified as negative and 2and 3 as positive.

KRT14 Western Blot Analysis. Tumor cell pellets were lysed in radioimmunoprecipitation assay buffer containing protease andphosphatase inhibitors for 10 min on ice, followed by centrifu-gation; supernatant was collected for protein quantification. Atotal of 40 μg of protein was running on a 4–12% gel and probedwith KRT14 (Covance; PRB-155P) and GAPDH antibodies aspreviously described.

Patient Classification for Outcome Analysis for KRT14 as a SingleMarker. We tested the hypothesis that KRT14 as a singlemarker is associated with patient survival using our Hegemonsoftware (Fig. 5). To divide the BC patients according to KRT14gene expression, we use K mean (k = 3) in two independent geneexpression datasets including the Lindgren and the Europeandatasets. This approach divides the BC patients into three dif-ferent groups: high, intermediate, and low. For the Europeandataset of 404 samples, the thresholds to define three differentexpression levels of KRT14 (ID: 209) are: high (KRT14 ≥ 1.37),intermediate (KRT14 ≥ 0.012; KRT14 < 1.37), and low (KRT14 <0.012). For the Lindgren dataset of 89 samples, the three differentlevels of KRT14 (ID: ILMN_1665035) are: high (KRT14 ≥ 0.98),intermediate (KRT14 ≥ 0.1; KRT14 < 0.98), and low (KRT14 <0.1). Multivariate analysis in the European dataset (largest BCdataset) was performed using both discrete (Fig. 5) and continuous(Table S1) gene-expression values for KRT14.We performed analysis on all muscle invasive tumors (Clinical

stage ≥pT2) in both datasets (Fig. S6 A and B): the Lindgren andthe European datasets. To divide all muscle invasive tumors(≥pT2) into two groups, we computed StepMiner threshold onKRT14 expression in tumors greater than pT2 stage includingpT2 stage. In both datasets, we used a stringent low (t − 0.5)threshold which is 0.5 lower that the StepMiner threshold. Allmuscle invasive BC patients (pT2+) were divided into twogroups using this threshold: KRT14 high (Lindgren, ID:21409 ≥1.24–0.5; European, ID:209 ≥ 1.5–0.5), and KRT14 low(Lindgren, ID:21409 < 1.24–0.5; European, ID:209 < 1.5–0.5).Kaplan–Meier, uni-, and multivariate analyses were performedon the Lindgren and the European datasets as shown in Fig. S6A and B.To evaluate KRT14 association with overall survival of patients

that have undergone bladder removal (cystectomy), we used theEuropean dataset (Fig. S6D). We divided all cystectomy patientsinto two groups: KRT14 high (ID:209 ≥ 0.726) and KRT14 low(ID:209 < 0.726). Kaplan–Meier, uni-, and multivariate analyseswere performed on this dataset as shown in Fig. S6D.Similar to the above late-stage tumors, we also performed

analysis on only pT2 muscle invasive tumors in both datasets (Fig.S6 E and F): the Lindgren and the European datasets. To divideall muscle invasive tumors (pT2) into two groups, we computedStepMiner threshold on KRT14 expression in only pT2 tumors.For both the Lindgren and the European datasets, we used thethreshold computed by StepMiner on only the pT2 patients. ThepT2 BC patients were divided into two groups: KRT14 high(Lindgren, ID:21409≥ 0.64; European, ID:209≥ 1.56), and KRT14low (Lindgren, ID:21409 < 0.64; European, ID:209 < 1.56).To evaluate KRT14 association with overall survival, pro-

gression-free survival, and recurrence-free survival in the earlystage tumors (pTa), we used European datasets (Fig. S7). Wedivided all early stage tumors (pTa) into two groups: KRT14 high(ID:209 ≥ 0.726–0.5) and KRT14 low (ID:209 < 0.726–0.5). In

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 3 of 15

Page 4: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

this case, KRT14 threshold (0.726–0.5) was computed as 0.5lower than the actual StepMiner threshold 0.726 on all 404 pa-tients to get more highly expressing KRT14 pTa tumors.To test the association of KRT14 protein expression with BC

patient outcome, two independent patient tissue datasets, theStanford and the Baylor datasets, were used (described above).We evaluated KRT14 association with overall survival of allpatients (Fig. 6) and patients with muscle-invasive BC (≥pT2)(Fig. S6C). Kaplan–Meier, uni-, and multivariate analyses wereperformed on this dataset as shown in Fig. 6. In addition, analysisfor the combination of KRT14, KRT5, and KRT20 was per-formed (Fig. S5).

Differentiation States Within Normal Urothelium. Using MiDReGanalysis that is based on the Boolean implication relationships asshown in Fig. S2A, we hypothesized the expression patterns ofthe markers of differentiation within normal urothelium asshown in Fig. S2B. Heatmaps of the upstream keratins (KRT14,KRT16, and KRT6) are shown in Fig. S2C using the normalbladder tissue (n = 48) expression patterns from the Sanchez-Carbayo dataset. KRT14 immunohistochemistry was performedon FFPE normal bladder urothelium (Fig. S2D) and immuno-fluorescence was performed on the fresh frozen OCT-embeddedfetal bladder tissue (Fig. S2E).

1. Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI geneexpression and hybridization array data repository. Nucleic Acids Res 30:207–210.

2. Irizarry RA, et al. (2003) Summaries of Affymetrix GeneChip probe level data. NucleicAcids Res 31:e15.

3. Sahoo D, Dill DL, Tibshirani R, Plevritis SK (2007) Extracting binary signals frommicroarray time-course data. Nucleic Acids Res 35:3705–3712.

4. Sahoo D, Dill DL, Gentles AJ, Tibshirani R, Plevritis SK (2008) Boolean implicationnetworks derived from large scale, whole genome microarray datasets. Genome Biol9:R157.

5. Dyrskjøt L, et al. (2007) Gene expression signatures predict outcome in non-muscle-invasive bladder carcinoma: A multicenter validation study. Clin Cancer Res 13:3545–3551.

6. Kim WJ, et al. (2010) Predictive value of progression-related gene classifier in primarynon-muscle invasive bladder cancer. Mol Cancer 9:3.

7. Lindgren D, et al. (2010) Combined gene expression and genomic profiling define twointrinsic molecular subtypes of urothelial carcinoma and gene signatures formolecular grading and outcome. Cancer Res 70:3463–3472.

8. Sanchez-Carbayo M, Socci ND, Lozano J, Saint F, Cordon-Cardo C (2006) Definingmolecular profiles of poor outcome in patients with invasive bladder cancer usingoligonucleotide microarrays. J Clin Oncol 24:778–789.

9. Smyth GK, Speed T (2003) Normalization of cDNA microarray data. Methods 31:265–273.

10. Ritchie ME, et al. (2007) A comparison of background correction methods for two-colour microarrays. Bioinformatics 23:2700–2707.

11. Sahoo D, et al. (2010) MiDReG: A method of mining developmentally regulated genesusing Boolean implications. Proc Natl Acad Sci USA 107:5732–5737.

12. Chan KS, et al. (2009) Identification, molecular characterization, clinical prognosis,and therapeutic targeting of human bladder tumor-initiating cells. Proc Natl Acad SciUSA 106:14016–14021.

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 4 of 15

Page 5: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

K5

K14

K20

K14

K5

K16

K20

K16

K14 K16

B

K14 high => K5 high K14 high => K20 low K16 high => K5 high K16 high => K20 low

KX high => K20 low

Gen

e ex

pres

sion

Differentiation

K5 high => K20 lowKX high => K5 high

K5 K20KX

K5 high => K20 low

K5

K20

HumanC

K5

K20

K5 high => K20 lowMouse

D

K5

K20

K5 high => K20 lowRat

E

137 Genes 16 Keratins In BC: Dynamic range > 5/Stddev. >1.5 7 KeratinsF

Correlation of candidates for KX with patient survival (HR)

Name Affy ID Chungbuk ID HR p value

KRT14 209351_at ILMN_1665035 2.752625 0.01833 *

KRT6B 209126_x_at ILMN_1721354 3.481557 0.010818 *

KRT16 209800_at ILMN_1736760 2.176623 0.057495 .

KRT6A 214580_x_at ILMN_1754576 1.780939 0.429693

c

4 6 8 1 2

Value

020

0

Color Keyand Histogram

Cou

nt

K5K20K14K16K6AK6BK13K4

8 1 0 1 4

Value

080

0

Color Keyand Histogram

Cou

ntA

ffyB

C d

atas

etC

hung

buk

data

set

K5K20K14K16K6AK6BK6CK13K4

G

K5

K6A

K20

K6A

K6AK6A high => K5 ligh K6A high => K20 low

K5

K6B

K20

K6B

K6BK6B high => K5 high K6B high => K20 low

4 Keratins

Expression pattern & HR

A CD44 - K5 CD44 - K20 K5 - K20

Fig. S1. Computational prediction of changes in keratin expression and their relationship to differentiation states in bladder cancer. Keratin gene names areabbreviated as K. (B) K5 expression is mutually exclusive with K20 in normal urothelium and bladder cancer (BC), as revealed by our previously published data usingimmunofluorescence staining in tissue sections (A). (C–E) This relationship is consistent with the Boolean relationship K5high → K20low across multiple species(human, mouse, rat), and diverse tissue samples including 75,000 data points. (A) Also, on the basis of previously published BC biology, CD44 enriches for tumor-initiating cells in BC and CD44+ BC cells (Alexa 594/red) express KRT5 (Alexa 488/green), but not KRT20 (Alexa 488/green) (CD44+ cells are highlighted with white

Legend continued on following page

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 5 of 15

Page 6: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

dashed line), whereas downstream CD44− cells express KRT20, but not KRT5 (KRT5+ cells are highlighted with white dashed line), we hypothesize that K5+ cells areearly progenitors upstream of K20+ cells. (B) Here, we propose to predict upstream progenitor keratin KX using an algorithm, known as MiDReG (mining de-velopmentally regulated genes). This algorithm searches for genes that satisfy KXhigh→ K5high and KXhigh → K20low Boolean implication relationships. (F) A total of137 genes (16 keratins) satisfy these Boolean implication criteria that narrowed down to 7 keratins on the basis of dynamic range (>5) and SD (>1.5) in the AffyBCdataset. Heatmaps of these 7 keratins are shown in the AffyBC and 6 keratins in the Chungbuk dataset (keratin 6C is not present in the dataset). BC patient survivalassociation of the predicted upstream keratins (hazard ratio and P value) was computed using survival analysis of the KXhigh and KXlow patient groups in theChungbuk dataset. K13 and K4 have noisy gene expression patterns in the heatmap. (G) K14, K16, K6B, and K6A satisfy the required Boolean implication criteria.This analysis revealed that K14 is the best and the most consistent marker of differentiation according to the Boolean implication relationship (G), the expressionpatterns in the heatmaps, and the patient outcome (F).

Ge

ne

exp

ress

ion K5 K20K14K6/K16

Differentiation

K14

K16

K6A

K6B

K6C

2 6 10 14

Value

0

Color Keyand Histogram

Cou

nt

B C

DFetal urothelium

E

KRT14 immunohistochemistry KRT14 Hoechst

K5 high => K20 low

K5

K20

K5

K14

K20

K14

K14 high => K5 high K14 high => K20 low

K16

K14

K16 high => K14 low

K6

K14

A

Normal urothelium 200X100X200X 400X

K6 high => K14 high

Fig. S2. Differentiation states in normal urothelium. Keratin gene names are abbreviated as K. (A) Boolean relationship indicates that K5 expression is mutuallyexclusive with K20. K14 is coexpressed with K5, whereas K5 can be expressed without K14, and K14 expression is mutually exclusive to K20 expression. These geneexpression patterns indicate that K14 is developmentally upstream of K5. K16 and K6 are coexpressed with K14, whereas K14 can be expressed without K16 andK6. This indicates that K16 and K6 are developmentally upstream of K14. (B) Schematic illustrating of predicted keratin differentiation states in normal urothelium.(C) Heatmap showing the expression patterns of K14, K16, K6A, K6B, and K6C in normal urothelium in the Sanchez-Carbayo dataset (n = 41), and the distribution isconsistent with predicted differentiation states. (D) Immunohistochemistry shows basal localization of KRT14+ cells in adult urothelium (paraffin embedded). (E)Immunofluorescence staining shows intensive, predominantly basal staining for KRT14 in fetal urothelium (OCT-embedded frozen sections).

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 6 of 15

Page 7: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

201820_at: KRT5

2139

53_a

t: K

RT

20

209351_at: KRT14

2018

20_a

t: K

RT

5

ILMN_1665035: KRT14

ILM

N_1

8016

32: K

RT

5

ILMN_1801632: KRT5

ILM

N_1

7947

29: K

RT

20

Chungbuk dataset(Illumina platform / n=256)

AffyBC dataset(U133A platform / n=138)

1. Use StepMiner to compute threshold for K14, K5, K20

201820_at: KRT5

2139

53_a

t: K

RT

20

209351_at: KRT14

2018

20_a

t: K

RT

5

ILMN_1665035: KRT14

ILM

N_1

8016

32: K

RT

5

ILMN_1801632: KRT5

ILM

N_1

7947

29: K

RT

20

2. Identify Patient Groups +/- for K14, K5, K20

3. Intersect patient groups: K14+K5+K20- = Intersect (K14+, K5+, K20-) K14-K5+K20- = Intersect (K14-, K5+, K20-) K14-K5-K20+ = Intersect (K14-, K5-, K20+)

1. Use StepMiner to compute threshold 2. foreach gene X (K14, K5, K20) { Identify patient groups X+ and X- }3. Intersect patient groups: K14+K5+K20- = Intersect(K14+, K5+, K20-) K14-K5+K20- = Intersect(K14-, K5+, K20-) K14-K5-K20+ = Intersect(K14-, K5-, K20+)4. l1 = High(K14+K5+K20-)5. l2 = Down(K14+K5+K20-, K14-K5+K20-)6. l3 = Low(K14-K5-K20+)

7. l4 = Down(K14+K5+K20-, K14-K5-K20+)8. list1 = Intersect(l1, l2, l3)9. list2 = Intersect(l1, l2, l4) - list110. select cell surface markers from list1 and list211. rank genes by hazard ratios

High(group) { return genes (average in group > threshold) }Low(group) { return genes (average in group <= threshold) }Down(group 1 and 2) { return genes (average in group 1 > 2) }

Pseudocode to identify cell-surface markers using Hegemon (Hierarchical exploration of gene expression microarrays online)Input: developmental stages: basal K14+K5+K20-, intermediate K14-K5+K20-, differentiated K14-K5-K20+

7.5

8.0

8.5

CD90

208850_s_at

8.0

8.5

9.0

9.5

10.0

201656_at

CD49f

AffyBC datasets

9.8

10.2

10.6

11.0

CD90

ILMN_1779875

7.25

7.30

7.35

7.40

7.45

7.50

CD49f

ILMN_1687974

7.0

7.5

8.0

8.5

9.0

9.5 CD44

204490_s_at

11.0

11.5

12.0

12.5 CD44

ILMN_1732193

4-8, 10-11. Cell surface genes: Expression high in K14+K5+K20- & higher in K14+K5+K20- than K14-K5+K20- & low in K14-K5-K20+, ranked by Hazard Ratio (Chungbuk Dataset)

4-7,9,10-11. Cell surface genes: Expression high in K14+K5+K20- & higher in K14+K5+K20- than both K14-K5+K20- & K14-K5-K20+, ranked by Hazard Ratio (Chungbuk Dataset)

A

B

C

D

E

F

G

AffyBC datasets AffyBC datasetsChungbuk datasets Chungbuk datasets

Chungbuk datasets

K14+K5+K20-

K14-K5+K20-

K14-K5-K20+

Other

K14+K5+K20-

K14-K5+K20-

K14-K5-K20+

Other

K14 K14K20 K20K5 K5

K14+K5+K20- (n=17)

K14-K5+K20- (n=16)

K14-K5-K20+ (n=34)

Other (n=71)

K14+K5+K20- (n=8)

K14-K5+K20- (n=64)

K14-K5-K20+ (n=66)

Other (n=118)

Exp

ress

ion

gene

X

K14+K5+K20-

K14-K5+K20-

K14-K5-K20+

Differentiation

Threshold

Exp

ress

ion

gene

X

K14+K5+K20-

K14-K5+K20-

K14-K5-K20+

Differentiation

Threshold

Fig. S3. Identification of surface markers corresponding to the keratin differentiation states using Hegemon. (A) Pseudocode of the surface marker iden-tification that takes the input as the keratin differentiation states and return two lists (list 1 and list 2) of genes differentially expressed between the dif-ferentiation states. (B) The two different datasets used are the AffyBC dataset and the Chungbuk dataset in Affymetrix Human U133A and Illumina microarrayplatforms, respectively. (C) StepMiner is used to compute threshold [above threshold (red): high, below threshold (green): low] for each gene (probeset) foreach microarray platform. For Affymetrix Human U133A platform 45,563 publicly available microarrays are used to compute threshold for each probeset. ForIllumina platform only Chungbuk dataset (256 microarrays) is used to compute threshold. (D) Scatterplots of log2 normalized gene expression between KRT14,

Legend continued on following page

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 7 of 15

Page 8: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

KRT5, and KRT20 with the StepMiner thresholds in the AffyBC (n = 138) and the Chungbuk (n = 256) bladder cancer datasets. (E) K14+K5+K20− (red, AffyBC n =17, Chungbuk n = 8), K14−K5+K20− (blue, AffyBC n = 16, Chungbuk n = 64), K14−K5−K20+ (green, AffyBC n = 34, Chungbuk n = 66) patient groups selected andhighlighted in respective colors in the scatterplots. (F) Schematics of gene expression patterns (corresponds to list 1 of the pseudocode) that are high inK14+K5+K20− (red), down-regulated in K14−K5+K20− (blue), and low in K14−K5−K20+ (green) patient groups. (G) Schematics of gene expression patterns(corresponds to list 2 of the pseudocode) that are high in K14+K5+K20− (red), down-regulated in both K14−K5+K20− (blue) and K14−K5−K20+ (green) patientgroups but not all of the way to low in K14−K5−K20+ as in F. Boxplots with mean and confidence interval are shown for CD90, CD44, and CD49f for differentpatient groups in their respective colors (gray color indicates “other” patients). The detailed list of cell surface markers (both list 1 and list 2) are presented inDataset S1.

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 8 of 15

Page 9: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

Differentiation state of bladder cancer defined by keratins (K14+K5+K20- > K14-K5+K20- > K14-K5-K20+) isassociated with patient survival

Sorted Array Index

2140

9: K

RT

14

Sorted Array Index

1687

3: K

RT

20

Sorted Array Index

4006

: KR

T5

21409: KRT14

4006

: KR

T5

4006: KRT5

1687

3: K

RT

20

0.0

0.2

0.4

0.6

0.8

1.0

p=0.015

0 20 40 60Months

Can

cer

spec

ific

surv

ival

(1/10)

(2/12)

(3/4)

Others (20/63)

p=0.028 including Others

No. at risk

K14-K5-K20+K14-K5+K20-K14+K5+K20-Others

10124

63

891

49

341

30

1117

GroupsOther

Total 10 12 4 63

Tx 0 0 0 2

Ta 0 1 0 10

T1 6 6 0 19

T2 0 3 3 14

T3 4 1 1 16

T4 0 1 0 2

G2 0 2 0 17

G3 10 10 4 46

K14-K5-K20+

K14-K5+K20-

K14+K5+K20-

thr = 0.7

thr = 1.42

thr = 0.45

K5

high

K5

low

K20

hig

hK

20 lo

wK

14 h

igh

K14

low

Differentiation state of bladder cancer defined by surface markers (CD90+CD44+CD49f+ > CD90-CD44+CD49f+ >CD90-CD44-CD49f+) is associated with patient survival

Sorted Array Index

2182

2: C

D49

f

Sorted Array Index

2627

: CD

90

Sorted Array Index

6797

: CD

44

2627: CD90

6797

: CD

44

6797: CD44

2182

2: C

D49

f

No. at risk

CD90-CD44-CD49f-CD90-CD44-CD49f+CD90-CD44+CD49f+CD90+CD44+CD49f+Others

171017

738

158

132

29

73

112

15

12214

0.0

0.2

0.4

0.6

0.8

1.0

p=0.043

0 20 40 60

Months

Can

cer

spec

ific

surv

ival

(3/10)

(4/7)

(3/17)

Others (12/38)

(4/17)

p=0.063 including Others

GroupsOther

Total 17 10 17 7 38

Tx 1 0 0 0 1

Ta 1 2 6 0 2

T1 8 5 7 2 9

T2 3 2 1 4 10

T3 4 1 2 1 14

T4 0 0 1 0 2

G2 1 2 10 1 5

G3 16 8 7 6 33

CD90-CD44-CD49f-

CD90-CD44-CD49+

CD90-CD44+CD49f+

CD90+CD44+CD49f+thr = -0.59

thr = -1.38

thr = 0.47

CD

90 h

igh

CD

90 lo

wC

D44

hig

hC

D44

low

CD

49f h

igh

CD

49f l

ow

Groups1234567

K14+K5+K20+K14+K5+K20− K14+K5−K20−K14−K5+K20+K14−K5+K20−K14−K5−K20+K14−K5−K20−

(1/2)(3/4)(2/4)(1/1)

(2/12)(1/10)

(16/56)

0.0

0.2

0.4

0.6

0.8

1.0

p=0.010

0 20 40 60Months

Can

cer

spec

ific

surv

ival

Dead/Total

1

2

3

7

65

4

Association of differentiation states and block of differentiation defined by keratins with bladder cancerpatient survival

K14+K5+K20- K14-K5+K20- K14-K5-K20+

Groups12345678

CD90+CD44+CD49f+CD90+CD44+CD49f−CD90+CD44−CD49f−CD90+CD44−CD49f+CD90−CD44+CD49f+CD90−CD44+CD49f−CD90−CD44−CD49f+CD90−CD44−CD49f−

(4/7)(2/11)(5/15)(3/8)

(4/17)(2/4)

(3/10)(3/17)

0.0

0.2

0.4

0.6

0.8

1.0

p=0.153

0 20 40 60Months

Can

cer

spec

ific

surv

ival

Dead/Total

1

2

34

5

6

78

Association of differentiation states and block of differentiation defined by surface markers with bladdercancer patient survival

A

B

C

D

E

Fig. S4. Lindgren dataset (SWEGENE H_v3.0.1 35K platform). Differentiation state of BC defined by keratins (K14+K5+K20− → K14−K5+K20− → K14−K5−K20+)surface markers (CD90+CD44+CD49f+ → CD90−CD44+CD49f+ → CD90−CD44−CD49f+) is associated with patient survival. (A, Left) Identification of high and lowexpression values for each gene using StepMiner. Scatterplots of the selected patient groups (K14+K5+K20−, red; K14−K5+K20−, blue; K14−K5−K20+, green; andother, gray) according to the differentiation states. (Center) Kaplan–Meier survival curves of all patient groups. The ratio on the Right of Kaplan–Meier showsthe incidence proportions (1 – survival proportions). P values are computed on the basis of the log-rank test both including and excluding the “other” patientgroup. (Right) Clinical and pathological annotations for each patient group. The noise margin is 0.5 above and below (cyan lines) the StepMiner threshold (red

Legend continued on following page

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 9 of 15

Page 10: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

line), which corresponds to a twofold change in gene expression. For KRT14 and KRT5, a stringent low threshold (cyan line, 0.5 below red line) and for KRT20a stringent high threshold (cyan line, 0.5 above red line) is used to collect more patients in the basal (K14+K5+K20−, red) and the intermediate (K14−K5+K20−,blue) BC patient groups. (B, Left) Identification of high and low expression values for each gene using StepMiner. Scatterplots of the selected patient groups(CD90+CD44+CD49f+, red; CD90−CD44+CD49f+, blue; CD90−CD44−CD49f+, green; CD90−CD44−CD49f−, cyan; and other, gray) according to the differentiationstates. (Center) Kaplan–Meier survival curves of all patient groups. The ratio on the right of Kaplan–Meier shows the incidence proportions (1 – survivalproportions). P values are computed on the basis of the log-rank test both including and excluding the “other” patient group. (Right) Clinical and pathologicalannotations for each patient group. (C) Association of differentiation states and block of differentiation defined by keratins with bladder cancer patientsurvival: Kaplan–Meier survival curves of all patient groups defined by KRT14, KRT5, and KRT20 (including others, Fig. S3A). The ratio on the Right of Kaplan–Meier shows the incidence proportions (1 – survival proportions). P values are computed on the basis of the log-rank test. (D) Schematics of differentiationstates and block of differentiation in BC. (E) Association of differentiation states and block of differentiation defined by surface markers with bladder cancerpatient survival: Kaplan–Meier survival curves of all patient groups defined by CD90, CD44, and CD49f. Kaplan–Meier survival curves of all patient groups(including others, Fig. S3B). The ratio on the Right of Kaplan–Meier shows the incidence proportions (1 – survival proportions). P values are computed on thebasis of the log-rank test.

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 10 of 15

Page 11: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

Stanford Dataset

K14-K5-K20+ K14-K5+K20- K14+K5+K20-

Total 36 29

Non-invasive 13 16

Invasive 23 12

LG 9 14

HG 27 14

M 29 21

F 7 7

Avg Age 70.8 67.8

53

11

42

6

47

36

17

73.8

No. at risk

K14-K5-K20+K14-K5+K20-K14+K5+K20-

362953

282731

172123

171819

0.0

0.2

0.4

0.6

0.8

1.0

p=0.002

0 20 40 60Months

Ove

rall

surv

ival (8/36)

(26/53)

(6/29)

Baylor Dataset

No. at risk

K14-K5-K20+K14-K5+K20-K14+K5+K20-

161128

149

18

126

13

769

0.0

0.2

0.4

0.6

0.8

1.0

p=0.122

0 20 40 60Months

Ove

rall

surv

ival

(4/16)

(16/28)

(4/11)

A

B

Total 16 11 28

Ta 9 9 5

Tis 0 0 0

T1 6 0 2

T2 1 1 3

T3 0 1 12

T4 0 0 6

LG 9 9 3

HG 7 2 25

M 9 8 23

F 7 3 5

Avg Age 67.1 66.5

TUR-BT 15 10 8

Cystectomy 1 1 20

70.8

K14+K5+K20-K14-K5+K20-K14-K5-K20+

C

55kDA

BC specimen

K14 IHC negative

BC specimen

K14 IHC positive

BC specimen

K14 IHC positive

Fig. S5. Stanford and Baylor BC tissue dataset (KRT14/KRT5/KRT20 immunohistochemistry). Association of differentiation states (basal, intermediate, anddifferentiated) with patient survival in bladder cancer. In (A) Stanford and (B) Baylor tissue databases, patient tumors were stratified by immunohistochemicalanalysis of KRT14, KRT5, and KRT20 protein expression (positive/negative) into three groups: basal (K14+K5+K20−, red), intermediate (K14−K5+K20−, blue), anddifferentiated (K14−K5−K20+, green). Kaplan–Meier survival curves of the three patient groups and the P values based on the log-rank test are shown. The ratioon the Right of Kaplan–Meier shows the incidence proportions (1 – survival proportions). Clinical and pathological annotations for each patient group areshown on the Right. (C) Western blot analysis of keratin 14 (K14) protein expression in BC specimens, which are positive/negative for K14 in IHC analysis.

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 11 of 15

Page 12: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

European dataset (MDL Human 3K platform) Keratin 14 is associated withworse survival for patients with muscle invasive bladder cancer (>pT2)

K14 - K14 +

Total 32 21

T2 13 3

T3 16 12

T4 3 6

LG 0 1

HG 32 20

M 27 18

F 5 3

HR(lo-hi) p c HR(lo-hi) p c

K14 2.58(1.08-6.12) 0.032 * 2.35(0.85-6.50) 0.099

stage 1.09(0.59-2.02) 0.77 0.88(0.42-1.82) 0.73

grade 0.08(0.01-0.72) 0.024 0.10(0.01-1.32) 0.08

age 1.02(0.97-1.08) 0.37

*

1.02(0.96-1.08) 0.48

.

sex 1.13(0.66-1.96) 0.65 1.08(0.61-1.91) 0.80

.

Univariate Multivariate

Baylor dataset (K14 immunohistochemistry) Keratin 14 is associated with worsesurvival for patients with muscle invasive bladder cancer (>pT2)

No. at risk

K14 negativeK14 positive

3221

2914

2310

158

0.0

0.2

0.4

0.6

0.8

1.0

p=0.0267

0 20 40 60Months

Ove

rall

surv

ival

(9/32)

(12/21)

No. at risk

K14 lowK14 high

387

252

132

31

0.0

0.2

0.4

0.6

0.8

1.0

p=0.034

0 20 40 60Months

Can

cer

spec

ific

sur

viva

l

(15/38)

(5/7)

HR(lo-hi) p c HR(lo-hi) p c

K14 2.87(1.04-7.94) 0.043 * 3.12(1.11-8.79) 0.031 *stage 1.61(0.73-3.55) 0.23 2.05(0.85-4.98) 0.11

grade 0.49(0.23-1.02) 0.058 0.34(0.15-0.78) 0.011 . *

Univariate Multivariate

K14 low K14 high

Total 38 7

T2 16 4

T3 19 3

T4 3 0

G2 2 0

G3 36 7

Lindgren dataset (SWEGENE H_v3.0.1 35K platform) Keratin 14 is associatedwith worse survival for patients with muscle invasive bladder cancer (>pT2)

K14

hig

hK

14 lo

w

Sorted Array Index

2140

9: K

RT

14 thr = 0.74

Groups K14 low K14 high

Total 41 9

T2 37 7

T3 3 1

T4 1 1

G1-2 1 0

G3-4 40 7

M 35 7

F 4 2

Age 20 - 29 Years 0 0

Age 30 - 39 Years 0 0

Age 40 - 49 Years 2 0

Age 50 - 59 Years 8 1

Age 60 - 69 Years 8 2

Age 70 - 79 Years 15 4

Age 80 - 89 Years 5 2

Age 90+ Years 0 0

K14

hig

hK

14 lo

w

HR(lo-hi) p c HR(lo-hi) p cK14 3.21(1.37-7.51) 0.0072 ** 3.04(1.24-7.49) 0.015 *stage 1.49(0.89-2.50) 0.13 1.63(0.96-2.76) 0.07 .grade 1.12(0.38-3.29) 0.83 1.43(0.41-4.99) 0.57age 1.27(0.87-1.86) 0.22 1.28(0.86-1.89) 0.23sex 1.28(0.78-2.10) 0.33 1.25(0.74-2.12) 0.4

Univariate Multivariate

No. at risk

K14 lowK14 high

419

231

181

110

0.0

0.2

0.4

0.6

0.8

1.0

p=0.005

0 20 40 60

Months

Ove

rall

surv

ival

(20/41)

(8/9)

Sorted Array Index

209:

KR

T14 thr = 1.00

HR(lo-hi) p c HR(lo-hi) p c

K14 4.36(1.40-13.64) 0.011 * 4.09(1.12-14.95) 0.033 *

grade 1.43(0.34-6.05) 0.63 1.29(0.30-5.51) 0.73

age 1.12(0.75-1.68) 0.59 0.97(0.63 - 1.49) 0.89

sex 1.32(0.80-2.19) 0.28 1.17(0.69 - 1.98) 0.57

No. at risk

K14 lowK14 high

404

220

170

100

0.0

0.2

0.4

0.6

0.8

1.0

p=0.008

0 20 40 60Months

Ove

ral s

urvi

val

(20/40)

(4/4)

Groups K14 low K14 high

Total 40 4

G1-2 1 0

G3-4 38 4

Male 34 2

Female 4 2

20-29 years 0 0

30-39 years 0 0

40-49 years 2 0

50-59 years 7 0

60-69 years 9 0

70-79 years 14 2

89-89 years 5 2

90+ years 0 0

European dataset (MDL Human 3K platform) Keratin 14 is associated withworse survival for bladder cancer patients with pathological stage 2 (pT2)

Univariate Multivariate

K14

hig

hK

14 lo

w

Sorted Array Index

209:

KR

T14

thr = 1.56

HR(lo-hi) p c HR(lo-hi) p c

K14 8.98(1.93-41.66) 0.005 ** 16.63(2.69 - 102.79) 0.002 **grade 0.49(0.22-1.1) 0.086 . 0.30(0.11-0.81) 0.017 *

0.0

0.2

0.4

0.6

0.8

1.0

p=0.0008

0 20 40 60Months

Can

cer

spec

ific

surv

ival

(4/16)

(4/4)

No. at risk

K14 lowK14 high

164

120

80

10

Groups K14 low K14 high

Total 17 3

G2 2 0

G3 14 4

Lindgren dataset (SWEGENE H_v3.0.1 35K platform) Keratin 14 is associatedwith worse survival for bladder cancer patients with pathological stage 2 (pT2)

Univariate Multivariate

K14

hig

hK

14 lo

w

Sorted Array Index

2140

9: K

RT

14

thr = 0.64

European dataset (MDL Human 3K platform) Keratin 14 is associated withworse survival for bladder cancer patients undergone cystectomy

No. at risk

K14 lowK14 high

575

431

341

190

0.0

0.2

0.4

0.6

0.8

1.0

p=0.034

0 20 40 60Months

Ove

ral s

urvi

val

(19/57)

(3/5)

Groups K14 low K14 highTotal 57 5Ta 11 1T1 30 1Tis 0 0T2 14 2T3 2 1T4 0 0G1-2 5 0G3-4 51 3Male 46 3Female 10 220-29 years 1 030-39 years 1 040-49 years 5 050-59 years 15 260-69 years 12 270-79 years 17 189-89 years 1 090+ years 0 0

Sorted Array Index

209:

KR

T14

thr = 0.726

K14

hig

hK

14 lo

w

HR(lo-hi) p c HR(lo-hi) p cK14 3.52(1.01-12.28) 0.048 * 3.65(1.00-13.34) 0.05 .stage 1.10(0.89-1.37) 0.37 1.04(0.82-1.30) 0.76age 1.07(0.73-1.56) 0.73 1.11(0.74-1.64) 0.62sex 0.96(0.55-1.66) 0.87 0.90(0.51-1.58) 0.71grade 1.87(0.64-5.43) 0.25 1.93(0.63-5.93) 0.25

Univariate Multivariate

A

B

C

D

E

F

Fig. S6. Keratin 14 is associated with worse survival for bladder cancer patients with muscle-invasive bladder cancer and patients who have undergonecystectomy. (A–C) Patients with muscle-invasive BC stage ≥pT2: (A) KRT14 gene expression levels for bladder cancer patients are divided into two groups (high,red and low, green) in the European dataset. The threshold (red line) was computed by applying StepMiner on all 403 BC data. Kaplan–Meier survival curves ofboth patient groups and the P values based on the log-rank test are shown in the Center. Clinical and pathological annotations for each patient group areshown on the Right. The table at the Bottom shows uni- (Left) and multivariate (Right) results using Cox regression analysis (HR, hazard ratio; lo-hi, confidenceinterval; P, P value; C, significance status by number of asterisks). (B) KRT14 gene expression levels for bladder cancer patients are divided into two groups(high, red and low, green) in the Lindgren dataset. The threshold (red line) was computed by applying StepMiner on all 89 BC data. Kaplan–Meier survivalcurves of both patient groups and the P values based on the log-rank test are shown in the Center. Clinical and pathological annotations for each patient group

Legend continued on following page

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 12 of 15

Page 13: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

are shown on the Right. The table at the Bottom shows uni- (Left) and multivariate (Right) results using Cox regression analysis (same abbreviations as above.).The P values shown are based on the log-rank test. The ratio on the Right of Kaplan–Meier shows the incidence proportions (1 – survival proportions). (C) K14protein expression levels for BC patients are divided into two groups (positive, red and negative, green) in the Baylor tissue dataset. Kaplan–Meier survivalcurves of all of the patient groups and the P values based on the log-rank test are shown in the Center. Clinical and pathological annotations for each patientgroup are shown on the Right. The table at the Bottom shows uni- (Left) and multivariate (Right) results using Cox regression analysis (same abbreviations asabove). (D) Patients who have undergone cystectomy: KRT14 gene expression levels for bladder cancer patients who have undergone bladder removal (cys-tectomy) are divided into two groups (high, red and low, green) in the European dataset. The threshold (red line) was computed by applying StepMiner on all403 BC data. Kaplan–Meier survival curves of both patient groups and the P values based on the log-rank test are shown in the Center. Clinical and pathologicalannotations for each patient group are shown on the Right. The Bottom Right table shows uni- (Left) and multivariate (Right) results using Cox regressionanalysis (same abbreviations as above). The P values shown are based on the log-rank test. The ratio on the Right of Kaplan–Meier shows the incidenceproportions (1 – survival proportions). (E and F) Patients with muscle-invasive BC with stage pT2: KRT14 expression levels are divided into two groups (high, redand low, green) in two independent datasets (E, Lindgren; F, European) in two different microarray platforms (E, SWEGENE H_v3.0.1 35K; F, MDL Human 3K).The threshold (red line) was computed by applying StepMiner on only the muscle-invasive (pT2) BC patient data. Kaplan–Meier survival curves of the twopatient groups and the P values based on the log-rank test are shown in the Center. The ratio on the Right of Kaplan–Meier shows the incidence proportions(1 – survival proportions). Clinical and pathological annotations for each patient group are shown on the Right. The table at the Bottom shows uni- (Left) andmultivariate (Right) results using Cox regression analysis (same abbreviations as above).

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 13 of 15

Page 14: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

Sorted Array Index

209:

KR

T14 thr = 0.226

0.0

0.2

0.4

0.6

0.8

1.0

p=0.034

0 20 40 60Months

Pro

gres

sion

-fre

e su

rviv

al

(21/156)

(8/31)

No. at risk

K14 lowK14 high

15631

12819

10916

599

Groups K14 low K14 high

Total 156 31

G1-2 77 15

G3-4 52 12

Male 130 22

Female 26 9

20-29 years 2 0

30-39 years 4 0

40-49 years 9 2

50-59 years 36 4

60-69 years 38 12

70-79 years 48 12

89-89 years 17 1

90+ years 0 0

Progression-free survivalB

K14

hig

hK

14 lo

w

Sorted Array Index

209:

KR

T14

thr = 0.226

0.0

0.2

0.4

0.6

0.8

1.0

p=0.019

0 20 40 60Months

Rec

urre

nce-

free

sur

viva

l

(57/66)

(18/18)

No. at risk

K14 lowK14 high

6618

182

90

40

Groups K14 low K14 high

Total 66 18

G1-2 36 9

G3-4 23 6

Male 54 12

Female 12 6

20-29 years 1 0

30-39 years 2 0

40-49 years 5 1

50-59 years 12 1

60-69 years 20 8

70-79 years 22 7

89-89 years 4 1

90+ years 0 0

Recurrence-free survivalC

K14

hig

hK

14 lo

w

Sorted Array Index20

9: K

RT

14 thr = 0.226

HR(lo-hi) p c HR(lo-hi) p c K14 2.08(1.04-4.16) 0.039 * 2.03(1.01-4.07) 0.047 *grade 1.48(1.04-2.10) 0.028 * 1.48(1.04-2.10) 0.028 *age 1.38(1.03-1.84) 0.03 * 1.38(1.03-1.85) 0.033 *sex 0.89(0.58-1.38) 0.61 0.84(0.54-1.31) 0.44

0.0

0.2

0.4

0.6

0.8

1.0

p=0.035

0 20 40 60Months

Ove

ral s

urvi

val

(11/32)

(29/157)

No. at risk

K14 lowK14 high

15732

14227

12521

8013

Groups K14 low K14 high

Total 157 32

G1-2 77 15

G3-4 52 12

Male 131 23

Female 26 9

20-29 years 2 0

30-39 years 4 0

40-49 years 9 2

50-59 years 36 4

60-69 years 38 12

70-79 years 48 13

89-89 years 18 1

90+ years 0 0

Overall survivalA

K14

hig

hK

14 lo

w

Univariate Multivariate

Fig. S7. European dataset (MDL human 3K platform). Keratin 14 is associated with worse overall, progression-free, and recurrence-free survival for patientswith nonmuscle-invasive bladder cancer (pTa). KRT14 expression levels for the early stage (pTa) BC patients are divided into two groups in the Europeandataset: high (red) and low (green). (A, overall survival; B, progression-free survival; and C, recurrence-free survival) Kaplan–Meier survival curves of bothpatient groups and the P values based on the log-rank test are shown in the Center. The ratio on the Right of Kaplan–Meier shows the incidence proportions(1 – survival proportions). Clinical and pathological annotations for each patient group are shown on the Right. (A, overall survival) Uni- (Left) and multivariate(Right) results using Cox regression analysis is shown for the overall survival (see Fig. S4 legend for abbreviations). (A–C) Stringent low threshold (0.226, cyanline, 0.5 below red line) is used, which is 0.5 below the StepMiner threshold (red line, computed using all 403 microarrays).

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 14 of 15

Page 15: Supporting Information - PNAS...2012/01/19  · Supporting Information Volkmer et al. 10.1073/pnas.1120605109 SI Methods Data Collection and Processing. We downloaded 1.5 billion data

Table S1. European dataset (MDL human 3K platform)

Univariate Multivariate

HR (lo-hi) P C HR (lo-hi) P C

Analysis exclusive of intravesical treatment (mitomycin/bacillus Calmette–Guérin)KRT14 1.52 (1.18–1.98) 0.0015 ** 1.37 (1.07–1.76) 0.013 *Stage 1.37 (1.26–1.48) <0.0001 *** 1.3 (1.18–1.43) <0.0001 ***Grade 1.84 (1.42–2.39) <0.0001 *** 1.33 (1.01–1.75) 0.043 *Age 1.47 (1.24–1.73) <0.0001 *** 1.45 (1.23–1.71) <0.0001 ***Sex 1.01 (0.81–1.25) 0.96 1.01 (0.81–1.26) 0.94

Analysis inclusive of intravesical treatment (mitomycin/bacillus Calmette–Guérin)KRT14 1.64 (0.99–2.71) 0.056 . 1.75 (1.09–2.81) 0.02 *Stage 1.28 (1.11–1.480 0.00076 *** 1.05 (0.87–1.27) 0.59Grade 2.69 (1.38–5.24) 0.0037 ** 2.27 (1.14–4.50) 0.019 *Age 1.26 (0.93–1.70) 0.13 1.33 (0.99–1.79) 0.059 .Sex 0.94 (0.61–1.47) 0.79 0.78 (0.49–1.23) 0.28Intrav. mitomycin 1.07 (26–4.47) 0.93 0.78 (0.17–3.49) 0.74Intrav. bacillus

Calmette–Guérin3.01 (1.58–5.76) 0.00085 *** 2.59 (1.20–5.58) 0.015 *

Keratin 14 gene expression analyzed as continuous variable is associated with significantly worse patient overall survival. Multivariate analysis of KRT14 usingcontinuous KRT14 gene-expression values. Uni- (Left) and multivariate (Right) results using Cox regression analysis. HR, hazard ratio; lo-hi, confidence interval;P, P value; C, significance status by number of asterisks. (0 if P value = 0; *** if P > 0 & P ≤ 0.001; ** if P > 0.001 & P ≤ 0.01; * if P > 0.01 & P ≤ 0.05; . [period] if P >0.05 & P ≤ 0.1; [blank entry] if P > 0.1).

Dataset S1. Computationally predicted cell surface markers

Dataset S1 (XLS)

(A) Cell surface genes. Expression high in K14+K5+K20− and higher in K14+K5+K20− than K14−K5+K20− and low in K14−K5−K20+, ranked by hazard ratio(Chungbuk dataset). A list of cell surface genes (corresponding to the list 1 of the pseudocode in Fig. S2) whose gene expressions are high in basal BC(K14+K5+K20−), higher in the basal BC than the intermediate BC (K14−K5+K20−), and low in the differentiated BC (K14−K5−K20+), ranked by hazard ratio(Chungbuk dataset). The columns in the excel sheet are: HR, hazard ratio; lo-hi, confidence interval; P, P value; C, significance status by number of asterisks;name, gene name; AffyID, Affymetrix probeset ID; E1, gene expression difference between the basal BC and the differentiated BC in the AffyBC dataset; E2gene expression difference between the basal BC and the differentiated BC in the Chungbuk dataset; KimID, Illumina probesets in the Chungbuk dataset; Desc,description of the gene name. (B) Cell surface genes. Expression high in K14+K5+K20− and higher in K14+K5+K20− than both K14−K5+K20− and K14−K5−K20+,ranked by hazard ratio (Chungbuk dataset). A list of cell surface genes (corresponding to the list 2 of the pseudocode in Fig. S2) that are not present in DatasetS1, whose gene expression are high in basal BC (K14+K5+K20−), higher in the basal BC than both the intermediate BC (K14−K5+K20−), and the differentiated BC(K14−K5−K20+), ranked by hazard ratio (Chungbuk dataset). The columns in the excel sheet are: HR, hazard ratio; lo-hi, confidence interval; P, P value; C,significance status by number of asterisks; name, gene name; AffyID, Affymetrix probeset ID; E1, gene expression difference between the basal BC and thedifferentiated BC in the AffyBC dataset; E2 gene expression difference between the basal BC and the differentiated BC in the Chungbuk dataset; KimID,Illumina probesets in the Chungbuk dataset; Desc, description of the gene name.

Volkmer et al. www.pnas.org/cgi/content/short/1120605109 15 of 15