TITLE:
Diversity Across the Pancreatic Ductal Adenocarcinoma Disease Spectrum Revealed by Network-
Anchored Functional Genomics
AUTHORS:
Johnathon L. Rose1,2,*, Sanjana Srinivasan1,2,*, Wantong Yao3, Sahil Seth2,4, Michael Peoples4, Annette
Machado4, Chieh-Yuan Li1,2, I-Lin Ho1,2, Jaewon J. Lee3,5,6, Paola A. Guerrero3,5, Eiru Kim7, Mustafa
Syed8, Joseph R. Daniele4, Angela Deem9, Michael Kim6, Christopher A. Bristow4, Eugene J. Koay8,
Giannicola Genovese10, Andrea Viale1, Timothy P. Heffernan4, Anirban Maitra3,5, Traver Hart7,11,
Alessandro Carugo4,†, and Giulio F. Draetta1,†
AFFILIATIONS:
1. Department of Genomic Medicine, UT MD Anderson Cancer Center, Houston, TX 77030, USA
2. The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences; The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
3. Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
4. Translational Research to AdvanCe Therapeutics and Innovation in ONcology (TRACTION), University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
5. Sheikh Ahmed Center for Pancreatic Cancer Research, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
6. Department of Surgical Oncology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
7. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
8. Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
9. Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
10. Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
11. Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
* These authors contributed equally to this work
† Correspondence: [email protected], [email protected]
ABSTRACT:
Cancers are highly complex ecosystems composed of molecularly distinct sub-populations of tumor cells,
each exhibiting a unique spectrum of genetic features and phenotypes, and embedded within a complex
organ context. To substantially improve clinical outcomes, there is a need to comprehensively define
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.17.302034; this version posted September 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
inter- and intra-tumor phenotypic diversity, as well as to understand the genetic dependencies that
underlie discrete molecular subpopulations. To this end, we integrated CRISPR-based co-dependency
annotations with a tissue-specific co-expression network developed from patient-derived models to
establish CoDEX, a framework to quantitatively associate gene-cluster patterns with genetic
vulnerabilities in pancreatic ductal adenocarcinoma (PDAC). Using CoDEX, we defined multiple
prominent anticorrelated gene-cluster signatures and specific pathway dependencies, both across
genetically distinct PDAC models and intratumorally at the single-cell level. Of these, one differential
signature recapitulated the characteristics of classical and basal-like PDAC molecular subtypes on a
continuous scale. Anchoring genetic dependencies identified through functional genomics within the
gene-cluster signature defined fundamental vulnerabilities associated with transcriptomic signatures of
PDAC subtypes. Subtype-associated dependencies were validated by feature-barcoded CRISPR
knockout of prioritized basal-like-associated genetic vulnerabilities (SMAD4, ILK, and ZEB1) followed by
scRNAseq in multiple PDAC models. Silencing of these genes resulted in a significant and directional
clonal shift toward the classical-like signature of more indolent tumors. These results validate CoDEX as a
novel, quantitative approach to identify specific genetic dependencies within defined molecular contexts
that may guide clinical positioning of targeted therapeutics. (Word count: 231)
INTRODUCTION:
Pancreatic ductal adenocarcinoma (PDAC) is the third leading cause of cancer-related death in the
United States, with a 5-year survival rate of 8% and a median survival of <11 months1,2. Despite continual
attempts to manage this disease with targeted drugs and immunotherapy, a vast majority of PDAC tumors
are recalcitrant to therapeutic interventions. This is partially due to the dominance of KRAS gain-of-
function mutations (90% of tumors), and frequent loss-of-function genetic alterations in the well-known
epithelial tumor suppressors, TP53 (64%), SMAD4 (23%), and CDKN2A (17%), which make this disease
resistant to RTK pathway inhibitors, inducers of apoptosis, and other drugs. Beyond these dominant
genetic alterations, mutation-based diversity in PDAC consists of other mutations present at a
significantly lower frequency (<10%)3,4. This low mutation load is likely one factor that limits the efficacy of
immunotherapy, as checkpoint inhibitor monotherapies have shown little success in unselected patients5.
The fundamental functional characteristics, or “hallmarks”, of tumor cells predict that interfering with
functions such as cell division, cell motility, cell energetics, etc., should profoundly inhibit tumor growth
and progression. Yet, redundancy in essential pathways and adaptation often result in selective
enrichment of drug-resistant tumor cells and disease progression and relapse. Recent efforts in PDAC
molecular subtyping have better characterized disease heterogeneity using transcriptomic signatures
associated with clinical features, which can be used to define multiple PDAC subtypes6-9. Of these, the
most widely accepted classification distinguishes between “classical” and “basal-like” tumor subtypes7.
While these subtypes possess prognostic relevance, with basal-like tumors exhibiting the poorer
prognosis, their utility to predict patient response to specific targeted therapeutics has not yet been
realized. Moreover, clonal and sub-clonal evolution as well as therapeutic intervention can result in
molecular signatures that change the original subtype classification, highlighting the relevance of
intratumoral molecular and functional heterogeneity that underlies high-level subtype groupings10,11.
We have developed a systematic approach to correlate the status of pathway activation in a tumor with its
response to genetic suppression of individual gene functions. To associate genetic drivers with clinically
predictive transcriptomic signatures, we developed CoDEX, a dedicated PDAC co-expression network to
anchor dependencies within annotated cluster signatures. CoDEX represents the integration of CRISPR-
based co-dependency annotation with a tumor-specific co-expression network that we derived from a
curated cohort of patient-derived xenografts (PDXs), and it allows us to quantitatively delineate context-
specific vulnerabilities within high-resolution maps of transcriptomic diversity.
To develop CoDEX, we first refined and annotated the PDX PDAC co-expression network into 31
biologically defined gene clusters. Then, using prominent anticorrelated cluster signatures derived from
the co-expression network, we defined PDAC classical and basal-like subtypes on a continuous scale
and quantified the transitory nature of the transcriptional signature underlying these molecular subtypes,
which defined a group that exhibited a “quasi-basal” signature. We captured the subtype-associated
signatures, observed in PDX models, across TCGA tumor samples, and also at the single-cell level in
sequenced PDX-derived cell lines and human PDAC primary tumor samples. Finally, leveraging our
previously optimized in vivo genetic interference screening platform12,13, we used a customized CRISPR
library to expand on interacting nodes identified through a protein-protein interaction network, which
identified a spectrum of interconnected dependencies characteristic of each PDX model tested.
CoDEX is a novel approach to establish context specificity of individual gene targets within PDAC
molecular subtypes. This bears translational significance for several reasons. First, gene centrality serves
as a quantitative method to prioritize cluster-representative genes, with centrality and direct co-expression
jointly defining ideal biomarkers for subtype-specific dependencies. Second, by anchoring dependencies
identified through CRISPR screening within the co-expression network, we are able to uncover anti-
correlative cluster signatures that inform on the molecular context underlying each molecular subtype.
Fundamental dependencies associated with the basal-like subtype were validated intratumorally using
feature barcoding and single-cell RNA sequencing (scRNAseq), which directly supported the functional
relevance of specific genetic perturbations both at the level of sub-clonal composition and network-
defined transcriptomic signatures. Third, CoDEX annotation of bulk tumors uncovers an opportunity to
selectively eradicate dominant sub-populations and explore transcriptomic heterogeneity as a metric to
characterize tumor response to perturbation, as opposed to tumor size alone.
In sum, this work describes the application of a novel, quantitative methodology to characterize and
stratify genetic targets on transcriptomic signature patterns, which represents an important advancement
toward the development of subtype-specific targeted therapies in PDAC.
RESULTS:
Defining Transcriptomic Diversity in PDAC through the Construction and Annotation of a PDX-
based Co-Expression Network
To assess the diversity of transcriptomic signatures among PDAC tumors, we selected early passage
patient-derived PDAC xenografts, which maintain the cellular heterogeneity of tumor lesions while
reducing the contribution of the stromal components prevalent in these tumors14. We performed whole-
transcriptome sequencing on a set of 48 PDAC PDX tumors curated at MD Anderson (Figure 1A). Within
this cohort, we recognized a level of transcriptional diversity consistent with the previously defined Moffitt
classification status and distribution of classical and basal-like pancreatic adenocarcinoma (Extended
Data Figure 1A). In addition, we conducted whole-exome sequencing, which confirmed mutation
frequencies comparable to those reported by the TCGA4 (Figure 1B).
To characterize our cohort, we calculated global pairwise Pearson correlations of genes with variable
expression to quantify concordant patterns of transcriptomic diversity across our models. These
correlations were further pruned to prioritize biologically relevant gene pairs15, and from the resulting
103,000 correlations of 7,828 genes, we established a PDAC co-expression network (Figure 1A and
Extended Data Figure 1B). Using InfoMap16, a community detection tool, we divided the co-expression
network into 31 clusters, which were then annotated with Gene Ontology (GO) terms (Figure 1C). We measured
PDAC diversity by applying a dimensional reduction approach on our 31 defined clusters. Specifically, we
quantified the mean expression of each cluster, or centroid score, on a tumor-by-tumor basis, for all 48
PDX models in the PDAC cohort (Figure 1D). To determine if the mutational background of a model was
significantly associated with cluster enrichment across the PDX cohort, we applied UNCOVER17, a
method to identify complementary patterns of mutation enrichment across groups.
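The two computations described above, pairwise Pearson correlation between genes and a per-tumor centroid score for each cluster, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names, the correlation threshold, the gene labels, and the expression values are all hypothetical, and the pruning step cited above is replaced here by a simple cutoff.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold):
    """Return (gene_a, gene_b, r) for every gene pair with r >= threshold.

    The fixed threshold is an assumption standing in for the pruning
    procedure referenced in the text.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


def centroid_scores(expr, clusters):
    """Mean expression of each cluster's genes, computed tumor by tumor."""
    n_tumors = len(next(iter(expr.values())))
    return {
        name: [sum(expr[g][t] for g in genes) / len(genes) for t in range(n_tumors)]
        for name, genes in clusters.items()
    }


# Toy data: 4 hypothetical genes measured in 4 tumors (values are made up).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],
}
edges = coexpression_edges(expr, threshold=0.9)
clusters = {"cluster_x": ["geneA", "geneB"], "cluster_y": ["geneC", "geneD"]}
scores = centroid_scores(expr, clusters)
```

In this toy case the two clusters have perfectly anti-correlated centroid scores, which is the kind of pattern the analysis looks for across the cohort.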
We observed general patterns of anti-correlative clusters across the PDAC PDX cohort that were
reflected in cluster positioning and subsequent cross-cluster connectivity. The most significant
anticorrelated signatures were identified in clusters predominantly localized to adjacent ends of the force-
directed layout (Figure 1D - F), and the non-overlap of these adjacent anti-correlated cluster trends
implicates multiple distinct molecular signaling contexts that represent the immense diversity across the
PDAC disease spectrum. The top anti-correlative signatures were quantified between two pairs of opposing
clusters: Cluster 1 vs. Cluster 23, respectively enriched for lipid metabolism vs. cell development; and
Cluster 2 vs. Cluster 13, respectively enriched for Golgi-vesicle transport vs. nuclear-transcribed mRNA
catabolism (Figure 1E - F). The PDAC co-expression network also highlighted GO-annotated
clusters critical for proliferating tumor cells; specifically, Cluster 15, cell cycle, and Cluster 16,
mitochondrial transport and organization (Figure 1C and Extended Data Figure 1D). These findings reveal
that, along with 31 unique gene clusters, two distinct anti-correlated cluster signatures contribute to PDAC
tumor diversity and have the potential to provide context for tumor cell-intrinsic vulnerabilities.
Interestingly, we identified a limited cluster-mutation association between ACVR2A, RREB1, and MARK2
mutations and tumors with significant enrichment in Cluster 21 (Extended Data Figure 1C). PDAC-
associated loss-of-function mutations in RREB1, which encodes a zinc finger transcription factor that
binds to RAS-responsive elements, have been reported in the TCGA PDAC cohort4. RREB1 is a positive
regulator of the zinc transporter, ZIP3, with loss-of-function playing a potential role in limiting zinc uptake
and shielding developing tumors from the cytotoxic effects of high cellular zinc concentrations18. RREB1
has also been described as a KRAS-regulated SMAD co-factor involved in driving the expression of
epithelial-to-mesenchymal (EMT) transcription factors19. While the significance of mutations in the serine-
threonine kinases, MARK2 and ACVR2A, is relatively poorly understood in PDAC, dysregulation in these
genes could implicate the regulation of epithelial polarity and downstream SMAD-associated signaling,
respectively20-22. Associating co-expressed and annotated gene clusters with these less frequent
mutations in PDAC could aid in illuminating the molecular signaling, and potential therapeutic avenues,
underlying these genomic alterations.
In vitro and in vivo screening of essential signaling pathways in PDAC
We selected four PDXs out of those used to establish the co-expression network based on the presence
of mutations representative of PDAC (Extended Data Figure 2B and 2C), and we employed early passage cell
lines (PDX lines) of these models for genetic screening. We applied a stepwise custom functional
genomics platform for screening Moffitt-defined classical (PATC69) and basal-like (PATC124, PATC53,
and PATC153) PDX lines in parallel in vivo and in vitro (Figure 2A and Extended Data Figure 2A). Small,
customized lentiviral libraries were used to ensure maintenance of library complexity in vivo, based on
tumor-initiating cell frequency assessment12.
To perform in vivo and in vitro RNAi screens, we designed a surface protein-targeting library (Figure 2A
and Extended Data Table 1A) based on evidence of differential expression in pancreatic tumors compared to
matched-normal tissue, copy number versus RNA expression correlation trends from the TCGA PDAC
dataset, and SILAC screening for mutant Kras dependency (Extended Data Table 1A)4,23-27. We used
redundant shRNA activity (RSA) to statistically score and identify candidate protein hits whose targeting
shRNA were selectively depleted in the screens (Extended Data Figure 2D - G). Through these initial
RNAi screens, we functionally annotated a wide range of PDAC-associated dependencies at the cell
surface and uncovered oncogenic signaling diversity across the PDX lines (Extended Data Figure 2).
We subsequently built a custom sgRNA library that re-annotated and expanded upon the RNAi-identified
protein surface dependencies (See Methods) across all of our PDX models (Figure 2A, Extended Data
Figure 2H). Using a CRISPR-based approach, the library incorporated sgRNAs targeting highly
connected proteins (see Methods) defined by a stringent STRING protein-protein interaction (PPI)
network (Figure 2A and Extended Data Table 1B)28 used to evaluate oncogenic signaling redundancies
and interpret co-dependencies. Three out of four available PDX lines (PATC69, PATC124 and PATC53)
were compatible with the CRISPR-based screening technology in vivo (Extended Data Figure 2B). We
conducted each in vitro CRISPR screen as a time course, with a matched in vivo endpoint (Extended
Data Figure 2A).
Data from the sgRNA screens were not amenable to analysis using current analytical frameworks (e.g.
MaGeCK29, JACKS30, CERES31, and BAGEL32,33), which are designed for genome-wide screens and are
not tailored for the small training sets of our custom library. To address this, we adapted the BAGEL
framework32,33 to create Low-Fat BAGEL, which is optimized to analyze small targeted sgRNA libraries
that are needed for in vivo screening or in other experimental settings where complexity may be a
limitation. The difference in performance between BAGEL and Low-Fat BAGEL in dealing with outliers
(Extended Data Figure 3A) is exemplified in the Bayes Factors (BF) (Extended Data Figure 3B and
Extended Data Figure 3C) and Precision-Recall curves for a particular screen in PATC69 PDX lines
(Extended Data Figure 3D). In comparison to other genome-wide analytical methods, Low-Fat BAGEL
analyses demonstrate better performance and a more accurate classification of essential and
nonessential genes in our screens (Extended Data Figure 3E). To ensure quality control, complexity
coverage for RNAi and CRISPR libraries was confirmed for each in vitro and in vivo screen (Extended
Data Figure 4A - C). Control separation was confirmed in RNAi screens, and fold change separation and
precision-recall of the 50 essential and 50 non-essential control populations were confirmed for each
CRISPR screen using Low-Fat BAGEL (See Methods, Extended Data Figure 4D - I).
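The precision-recall check on the essential and non-essential control sets can be illustrated with a generic calculation: rank the control genes by Bayes Factor and, at each rank, record recall of essential controls and precision among the controls called so far. This is a sketch under stated assumptions and not the Low-Fat BAGEL implementation; the function name, gene labels, and BF values are hypothetical.

```python
def precision_recall(bf, essential, nonessential):
    """Return (BF, recall, precision) at each rank of the control genes.

    Only genes in the two control sets are scored; library targets that
    belong to neither set are ignored.
    """
    controls = [g for g in bf if g in essential or g in nonessential]
    ranked = sorted(controls, key=lambda g: bf[g], reverse=True)
    n_essential = sum(1 for g in ranked if g in essential)
    true_pos = false_pos = 0
    curve = []
    for gene in ranked:
        if gene in essential:
            true_pos += 1
        else:
            false_pos += 1
        recall = true_pos / n_essential
        precision = true_pos / (true_pos + false_pos)
        curve.append((bf[gene], recall, precision))
    return curve


# Hypothetical Bayes Factors for a handful of control and target genes.
bf = {"ess1": 5.0, "ess2": 3.0, "non1": 2.0, "ess3": 1.0, "non2": -4.0, "target": 2.5}
essential = {"ess1", "ess2", "ess3"}
nonessential = {"non1", "non2"}
curve = precision_recall(bf, essential, nonessential)
```

A screen with good control separation keeps precision near 1 until most essential controls have been recalled; here the misranked control `non1` lowers precision before full recall is reached.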
Quantile-normalized BFs (BF > 1) were directly compared to uncover essential genes32 represented in the
three models in vitro and in vivo (Figure 2B and Extended Data Figure 5A). Dependencies among the
three PDX lines, in both in vitro and in vivo conditions, were highly diverse. Ribosome-associated RACK1,
RPL30, and the MYC proto-oncogene were the only shared vulnerabilities identified in all in vivo and in
vitro screening contexts (Figure 2B and Extended Data Figure 5A). Quantile-normalized BFs also
highlighted varying degrees of essentiality for KRAS, with PATC124 growth exhibiting less dependence
on KRAS compared to PATC69 and PATC53 PDX lines (Figure 2C - E and Extended Data Figure 5B -
D), which suggests that oncogenic RTK buffering may contribute to signaling complexity even in the
context of mutated KRAS.
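Quantile normalization puts Bayes Factors from different screens on a common distribution so that a single threshold (BF > 1) can be compared across models. The sketch below shows the generic technique, assuming every screen scores the same genes and ignoring ties; the function name, model labels, gene labels, and values are hypothetical, and the exact procedure used for these screens may differ.

```python
def quantile_normalize(screens):
    """Quantile-normalize BFs across screens.

    screens maps screen name -> {gene: BF}. Each value is replaced by the
    mean, across screens, of the values holding the same rank.
    """
    names = list(screens)
    genes = sorted(screens[names[0]])
    sorted_columns = [sorted(screens[s][g] for g in genes) for s in names]
    # Reference distribution: mean of the k-th smallest value across screens.
    reference = [
        sum(column[k] for column in sorted_columns) / len(names)
        for k in range(len(genes))
    ]
    normalized = {}
    for s in names:
        order = sorted(genes, key=lambda g: screens[s][g])
        normalized[s] = {g: reference[rank] for rank, g in enumerate(order)}
    return normalized


# Hypothetical BFs for three genes in two screens.
screens = {
    "model_1": {"g1": 4.0, "g2": -2.0, "g3": 1.0},
    "model_2": {"g1": 10.0, "g2": 0.0, "g3": -6.0},
}
normalized = quantile_normalize(screens)

# Genes above the essentiality threshold in every screen.
shared = set.intersection(
    *({g for g, v in scores.items() if v > 1.0} for scores in normalized.values())
)
```

After normalization both screens share the same set of values, so differences between models reflect gene rank rather than screen-wide shifts in BF scale.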
CoDEX platform identifies EMT-associated dependencies along prominent PDAC molecular
signatures
Next, we integrated quantile-normalized BFs with the STRING PPI network used to build the sgRNA
library to generate dependency networks for each model. Dependency networks were merged across all
three PDX lines based on overlapping essentiality thresholds (BF >1) for in vivo and in vitro contexts and
displayed using a force-directed layout to visualize connectivity between established and novel gene
targets (Figure 2C - E and Extended Data Figure 5B - D). The in vivo PATC53 dependency network
highlighted multiple unique groups of interconnected vulnerabilities when compared to PATC69 and
PATC124 dependency networks, despite PATC53 and PATC124 both being classified as basal-like,
based on the Moffitt signature (Extended Data Figure 2B). Notably, interconnected vulnerabilities in
PATC53 were associated with epithelial-to-mesenchymal transition (EMT) (e.g. SNAI1, ZEB1, SMAD4,
MAPK11 and MAPK14), integrin signaling (e.g. ILK, ITGB1, ITGB2 and NPNT), heparan sulfate
proteoglycan regulation (e.g. SDC3 and EXT1), cell junction regulation (e.g. CTTN, TJP1 and TJP2), and
intracellular signaling kinases (e.g. BCAR1, NCK1, CRKL and CRK) (Figure 2F - H, Extended Data Figure
5E - G).
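One simple way to picture a dependency network is as the subgraph of the PPI network whose nodes pass the essentiality threshold, with per-model networks then merged by recording which models support each edge. The sketch below follows that reading, which is an assumption about the construction rather than a description of it; the function names, node labels, edges, and BF values are hypothetical.

```python
def dependency_network(ppi_edges, bf, threshold=1.0):
    """Keep PPI edges whose two endpoints both exceed the BF threshold."""
    hits = {gene for gene, value in bf.items() if value > threshold}
    return {frozenset((a, b)) for a, b in ppi_edges if a in hits and b in hits}


def merge_networks(networks):
    """Map each edge to the sorted list of models in which it appears."""
    merged = {}
    for model, edges in networks.items():
        for edge in edges:
            merged.setdefault(edge, []).append(model)
    return {edge: sorted(models) for edge, models in merged.items()}


# Hypothetical PPI edges and quantile-normalized BFs for two models.
ppi_edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]
bf_by_model = {
    "model_a": {"n1": 3.0, "n2": 2.0, "n3": 0.5, "n4": 4.0},
    "model_b": {"n1": 2.5, "n2": 1.5, "n3": 6.0, "n4": -1.0},
}
networks = {m: dependency_network(ppi_edges, bf) for m, bf in bf_by_model.items()}
merged = merge_networks(networks)
```

Edges supported by a single model, such as the `n2`/`n3` edge here, correspond to the model-specific groups of interconnected vulnerabilities discussed above.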
To anchor these vulnerability trends within the landscape of PDAC diversity, we then integrated the
CRISPR screen-defined dependencies within our co-expression network. The integration displayed a
clear shift in the PATC53-associated dependency spectrum along the Cluster 1 -to- Cluster 23 axis, with
many of the dependencies identified in the CRISPR screen localized within, or adjacent to, Cluster 23
(Figure 1D, Figure 2F - H, Extended Data Figure 5E - G). Specifically, MAPK11 and FAM171A2 were
localized within Cluster 23 itself; NKAIN4 in Cluster 25; and SMAD4, ZEB1 and SDC3 in Cluster 31
(Figure 1D, Figure 2F - H, Extended Data Figure 5E - G). The network localization of functionally
annotated and PATC53-specific ZEB1, an EMT-associated transcription factor, and SMAD4, an EMT
facilitator and oncogenic driver in advanced PDAC, implicated a connection between the Cluster 1-to-23
(C1vC23) axis, and its associated dependencies, with PDAC epithelial-to-mesenchymal
transdifferentiation34,35.
Prominent anticorrelated cluster signatures outline a continuous classical-to-basal differential
signature
The distinct gene clusters we identified in the PDAC co-expression network provide a means to deeply
characterize the diversity of any PDAC model by considering correlations in disease-specific cluster
enrichment patterns, a parallel strategy to current approaches that use consensus clustering to assign a
tumor subtype based on refined PDAC-specific gene sets. Thus, this refined co-expression network can
comprehensively quantify global transcriptomic signaling trends on a tumor-by-tumor basis. Upon
observing the clear dependency shift along Cluster 23 in the basal-like PATC53 model (Figure 2H), we
noted a strong anti-correlative trend between Cluster 23 and Cluster 1 centroid scores among the entire
PDAC PDX cohort (Figure 1E). The Cluster 1 centroid scores across our PDX cohort showed low
variance among Moffitt-defined classical models, but a wider range and significant depletion of Cluster 1
gene expression (p = 3.67 x 10^-6) was observed in basal-like models (Figure 3A). Interestingly, Cluster 1
unbiasedly localized the entire set of 21 classical signature genes from the Moffitt classification. Together,
these findings suggested that we could leverage the anti-correlated C1vC23 axis as a classical-to-basal-like
differential signature. Indeed, by applying k-means clustering on only the Cluster 1 and Cluster 23
centroid scores, we separated the PDX cohort into three groups (Figure 3B): 1) enrichment in Cluster 1
represented Moffitt-defined classical models, 2) enrichment in Cluster 23 represented Moffitt-defined
basal-like models, and 3) models with partial enrichment of both Clusters 1 and 23 (Figures 3C - D),
which we termed “quasi-basal”. Thus, our C1vC23 signature provides a continuous transcriptomic
signature that expands on the original binary Moffitt classification and uncovers a transitionary quasi-
basal phenotype of PDAC with molecular signatures falling along a continuum.
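The grouping step can be illustrated with a small k-means run on two-dimensional points, one (Cluster 1, Cluster 23) centroid-score pair per model. This is a generic Lloyd's-algorithm sketch with fixed starting centers so that the result is reproducible; the function names, scores, and initialization are hypothetical and do not reproduce the cohort analysis.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, n_iter=50):
    """Lloyd's algorithm; returns (labels, centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical (Cluster 1, Cluster 23) centroid scores for nine models.
points = [
    (2.0, -1.8), (1.8, -2.0), (2.2, -1.6),   # high Cluster 1, low Cluster 23
    (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2),    # partial enrichment of both
    (-2.0, 1.9), (-1.7, 2.1), (-2.2, 1.8),   # low Cluster 1, high Cluster 23
]
labels, centers = kmeans(points, centers=[points[0], points[3], points[6]])
```

With k = 3 the middle group emerges as its own cluster instead of being forced into one of two classes, which mirrors how the quasi-basal group is separated from the classical and basal-like groups.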
To gain further insight into the molecular signatures driving PDAC subtypes, we conducted gene-set
enrichment analysis to compare CoDEX-defined classical vs. basal-like models within our PDX cohort.
This highlighted that EMT signaling was significantly enriched among PDX models that fell into the
CoDEX definition of basal-like, whereas this gene set was depleted in classical models (Figure 3E).
Enrichment of an EMT signature in basal-like PDX models supports the hypothesis that the basal-like
subtype is strongly associated with tumors where a majority of tumor cells have at least partly undergone
transdifferentiation towards a mesenchymal phenotype. Consistently, analysis of clinical and histological
data (See Methods) for the PDX cohort demonstrated that basal-like tumors were uniformly associated
with poor differentiation status and distant metastatic recurrence (Figure 3B). Next, using high-epithelial-
content PDAC patient data from TCGA (30% - 80% epithelial cells), we confirmed the presence of the
network-derived subtyping, again identifying a quasi-basal continuum. Further, we recapitulated the
association between tumor histology and the network-derived subtype in the TCGA dataset, wherein
more poorly differentiated tumors were classified as basal-like, with strong enrichment in Cluster 23 gene
expression (Figure 3F). By quantitatively characterizing a quasi-basal population, the CoDEX platform
enables a more precise definition of the clinically relevant basal-like tumor cohort while also expanding on
associated molecular dependencies that may be considered for therapeutic intervention7. Moreover,
CoDEX defines a broader, cluster-level characterization of the range of molecular signaling contributing to
diversity in PDAC, which can be used to granularly ascertain pathways that may be therapeutically
targeted to exert anti-tumor effects across this classical, quasi-basal and basal-like tumor spectrum.
Single-cell transcriptomic profiles of tumors define a cell-intrinsic clonal signature of PDAC
subtypes
To investigate whether the quasi-basal signature identified in bulk tumor populations represents a
quantifiable cell state or a mean signature derived from competing subclonal populations, we conducted
single-cell RNAseq (scRNAseq) on the PATC69 (quasi-basal), PATC124 (quasi-basal), and PATC53
(basal-like) PDX lines. The co-expression network was used to identify cells expressing more than 30% of
any cluster, and a centroid score for that cluster was calculated using genes with expression highly
correlated to the cluster enrichment (r > 0.4). By analyzing scRNAseq data in the context of the co-
expression network, we circumvented the technical issue of signal dropout by prioritizing large gene
clusters to represent transcriptional diversity rather than single gene expression. Thus, the co-expression
network serves as an additional resource for disease-specific single cell analysis. This approach
successfully confirmed the presence of a quasi-basal C1vC23 signature as a quantifiable state in
individual cells within each PDX model (Figure 4A - D), indicating that the bulk readout represented an
average of the intratumoral spectrum of classical, quasi-basal, and basal-like subclonal populations,
rather than competing classical and basal-like signatures.
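The per-cell scoring described above can be sketched as a two-step filter: keep a cluster for a cell only if the cell expresses more than 30% of that cluster's genes, then average expression over the cluster. This is a simplified illustration; it omits the additional restriction to genes correlated with cluster enrichment (r > 0.4), and the function names, cluster labels, gene labels, and counts are hypothetical.

```python
def detected_fraction(cell, genes):
    """Fraction of a cluster's genes with non-zero expression in one cell."""
    return sum(1 for g in genes if cell.get(g, 0) > 0) / len(genes)


def cell_centroids(cells, clusters, min_fraction=0.3):
    """Per-cell centroid scores for clusters that pass the detection cutoff.

    Scoring whole clusters rather than single genes makes the result less
    sensitive to dropout of any individual transcript.
    """
    results = {}
    for cell_id, cell in cells.items():
        scores = {}
        for name, genes in clusters.items():
            if detected_fraction(cell, genes) > min_fraction:
                scores[name] = sum(cell.get(g, 0) for g in genes) / len(genes)
        results[cell_id] = scores
    return results


# Hypothetical sparse expression values; missing genes count as dropout (0).
clusters = {"c_a": ["a1", "a2", "a3", "a4"], "c_b": ["b1", "b2", "b3", "b4"]}
cells = {
    "cell_1": {"a1": 2, "a2": 4, "b1": 1},
    "cell_2": {"b1": 3, "b2": 1, "b3": 4},
    "cell_3": {"a1": 1},
}
results = cell_centroids(cells, clusters)
```

Cells such as `cell_3`, which pass the cutoff for no cluster, receive no score and would be excluded from downstream signature assignment.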
To evaluate the utility of the network-defined tumor subtypes in a clinical setting, in which small numbers
of cells and stromal components may influence molecular analyses, we evaluated seven patient core
needle biopsies (CNBs; four primary tumors, as well as one each of liver, lung, and vaginal metastases)
using the network-aided scRNAseq analysis36 (Figure 4E, Extended Data Figure 6A). We first defined the
distribution of cell types amongst our samples (Figure 4F) using previously annotated marker genes
(Extended Data Figure 6B) and characterized cluster representation across the tumor microenvironment
by quantifying the mean percentage of genes of each cluster for each cell type. This identified the
epithelial component of the tumor as the primary contributor to the co-expression network cluster
signatures (Figure 4G). Applying the same cluster centroid normalization method described above, the
majority of cells that met our quality control cutoff were epithelial cells, with very little representation from
two primary tumor samples (Primary 1 and Primary 2) that contained little to no epithelial content
(Extended Data Figure 6A). Finally, to determine where each cell type was represented on the classical to
basal-like continuum, we assessed the C1vC23 cluster differential. We found that the C1vC23
classification is largely present within epithelial cells, with Uniform Manifold Approximation and Projection
(UMAP) clusters of fibroblasts and endothelial cells only representing a minority of misclassified C1vC23
signatures within the representative multiregional tumor microenvironment (Figure 4H). This finding
confirms a lack of sample purity bias in the characterization of C1vC23 signatures in the TCGA PDAC
samples (outlined in Figure 3F) and supports the feasibility of applying network-based cluster
characterization to bulk clinical samples, including CNBs.
CoDEX platform prioritizes subtype-associated biomarkers
Our PDAC-specific co-expression network allows us to leverage cluster centrality to associate cluster-
representative biomarkers with signatures, and the CRISPR screens uncovered first-degree nodes of
specific dependencies that can also serve as significantly co-expressed biomarkers. By jointly applying
centrality and defining hit-specific nodes to prioritize biomarkers, we capture both broader, disease-
specific cluster patterns as well as the expression of individual genetic vulnerabilities of potential interest.
Accordingly, we used multiplex immunofluorescence (IF) to independently recapitulate the subtype-
defining C1vC23 transcriptomic signature at the protein level. This method provides an avenue to
quantitatively prioritize potentially useful target-associated biomarkers while also serving as a protein-
level validation of this transcriptomic signature. As a proof of concept, we first determined closeness
centrality for all genes in both Cluster 1 and Cluster 23 (Figure 5A). We then defined the first-degree
nodes of two in vivo targets, DNM2 and MAPK11, which localized to Cluster 1 and 23, respectively
(Figure 5B - C). Among the first-degree nodes, we then prioritized genes in each gene set based on
closeness centrality and antibody availability, which resulted in selection of VSIG1 and VIM as
representative biomarkers of Cluster 1 and 23, respectively (Figure 5D). IF analysis and image-based
quantification of VSIG1 and VIM, when combined with double staining for HLA, successfully recapitulated
the expected C1vC23 signature at the protein level in PDX models with the representative classical
(VSIG1+), quasi-basal (VIM-/VSIG1-), and basal-like (VIM+) transcriptomic signatures (Figure 5E, F). These
findings demonstrate the utility of jointly leveraging cluster-centrality and defined gene targets to
quantitatively prioritize protein biomarkers associated with relevant transcriptional profiles. Moreover,
these biomarkers provide protein-level validation of transcriptomic cluster trends and serve as accessible
alternatives for clinical characterization.
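The prioritization step described above can be illustrated with a minimal sketch. It assumes an unweighted, undirected co-expression graph stored as an adjacency dictionary. The gene names, the edges, and the `has_antibody` set are hypothetical placeholders, and this is not the authors' code.

```python
from collections import deque

def closeness_centrality(graph, node):
    """Closeness of `node`: reachable nodes divided by the sum of shortest-path lengths."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search over unweighted edges
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def prioritize_biomarkers(graph, target, has_antibody):
    """Rank first-degree neighbors of `target` by closeness, keeping genes with an antibody."""
    candidates = [g for g in graph[target] if g in has_antibody]
    return sorted(candidates, key=lambda g: closeness_centrality(graph, g), reverse=True)

# Toy graph; "TARGET" stands in for a screen hit such as a Cluster 1 dependency.
toy_graph = {
    "TARGET": {"GENE_A", "GENE_B", "GENE_C"},
    "GENE_A": {"TARGET", "GENE_B", "GENE_D"},
    "GENE_B": {"TARGET", "GENE_A"},
    "GENE_C": {"TARGET"},
    "GENE_D": {"GENE_A"},
}
ranked = prioritize_biomarkers(toy_graph, "TARGET", has_antibody={"GENE_A", "GENE_C"})
print(ranked)
```

In this toy graph the neighbor with more short paths to the rest of the network ranks first. A real analysis would restrict the graph to the genes of a single cluster before computing centrality.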
CoDEX-informed genetic dependencies functionally validate intratumoral classical, quasi-basal,
and basal-like molecular signatures
By integrating CRISPR screen-defined co-dependencies with the co-expression network, the CoDEX
platform can identify both common and unique dependencies within the larger context of PDAC diversity.
To determine the functional relevance of this approach, we evaluated the effect of perturbing gene targets
associated with the basal-like subtype that were prioritized through our CoDEX analysis in varied tumor
contexts (Figures 2F - H, Figure 6A). Based on our in vitro CRISPR screening results, we selected
sgRNA sequences targeting SMAD4, ZEB1, and ILK, as well as the non-essential gene, ABCG8, as a
negative control (Figure 6B, Extended Data Figure 6A). CoDEX-informed SMAD4 and ZEB1, both
localized within the network, were targeted for C1vC23 signature validation. ILK, a CRISPR-defined
dependency in PATC53 not present in the co-expression network, was selected to determine whether
knockout of this potential EMT regulator would also have the capacity to influence the C1vC23 signature
differential37-39. In addition, we applied a feature barcoding strategy whereby a complement sequence
was incorporated into the 3’ end of the sgRNA sequences, enabling scRNAseq sample multiplexing and
quantification of the C1vC23 signature shift relative to the sgABCG8 negative control distribution (Figure
6B). Individual sgRNAs derived from the CRISPR library were transduced into quasi-basal PATC69 and
basal-like PATC53 cells (Extended Data Figure 7A). Cells were cultured in vitro and collected at the
earliest point when separation of essential versus non-essential genes was observed in the original
CRISPR screen (Day 20) (Figure 6B and Extended Data Figure 4D - H). Sanger sequencing was used to
analyze the indel frequency of each sgRNA (Extended Data Figure 7B - K), and colony growth was
tracked for each sgRNA to confirm selective growth inhibition in the basal-like PATC53 (Figure 6B).
Multiplexed scRNAseq was conducted on 10,000 cells total (2,500 cells per sgRNA) for each PDX line
model.
PATC53 UMAPs revealed a clear separation between the ABCG8 negative control knockout cells and
populations with perturbations in combined Cluster 23 (SMAD4 and ZEB1) and basal-like (ILK)-
associated genes, whereas no separation was observed between test genes and the negative control in
PATC69 cells (Figure 6C, E). Also, as expected, the PATC69 population transduced with negative control
sgABCG8 contained C1vC23-defined classical and quasi-basal cells, while the basal-like PATC53
population contained quasi-basal and basal-like cells (Extended Data Figure 8C - D). To test the C1vC23
signature distribution shift relative to the reference distribution defined by the ABCG8-null negative
control, C1vC23 density plots were generated for each sgRNA knockout cell line (Figure 6G - H), and
the Kolmogorov-Smirnov D statistic was calculated. Deletion of ILK, SMAD4, and ZEB1 each resulted in a significant
C1vC23 shift towards the Cluster 1-enriched classical signature in both PATC69 and PATC53 (Figure 6G
- H). In parallel with scRNAseq, transduced populations were seeded in vitro immediately following
selection to compare relative growth phenotypes for both PDX lines (Figure 6B). Coinciding with the
signature shift towards Cluster 1 at the single-cell level, perturbation of each of the three basal-like-
associated genes inhibited bulk population growth in PATC53 relative to sgABCG8-knockout controls,
whereas no growth phenotype was observed in the quasi-basal PATC69 (Figure 6D, F and Extended Data
Figure 8A, B).
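As an illustration of the distribution comparison used above, the two-sample Kolmogorov-Smirnov D statistic is the largest vertical gap between the empirical cumulative distributions of two samples. The sketch below is a minimal standard-library version. The per-cell signature scores are invented toy values, and the variable names are assumptions.

```python
def ks_d_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so ties are handled consistently.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical per-cell signature scores for a control and a knockout population.
control_scores = [-1.0, -0.5, 0.0, 0.5]
knockout_scores = [0.0, 0.5, 1.0, 1.5]
d_value = ks_d_statistic(control_scores, knockout_scores)
print(d_value)
```

A D of 0 indicates identical empirical distributions and a D of 1 indicates no overlap. Assessing significance additionally requires the sample sizes.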
In summary, we provide functional validation of the C1vC23 signature based on concordant signature
shifting, in both quasi-basal PATC69 and basal-like PATC53 cell populations, following genetic
perturbation of basal-like associated targets. Importantly, these findings provide evidence for our CoDEX
platform to uncover the intratumoral context of vulnerabilities localized within and adjacent to Cluster 23,
by quantifying the targeted depletion of C1vC23-defined basal-like cell populations. This represents a
novel approach to leverage network-informed signaling perturbation to influence heterogeneity in
pancreatic tumors.
DISCUSSION:
PDAC characterization and subtyping centered on the categorical Moffitt classical and basal-like
molecular subtypes can stratify patients by clinical outcome, but have not yet been able to
guide precision medicine7. We developed CoDEX as an unbiased platform to quantify the molecular
signatures that contribute to PDAC diversity and provide signaling context to functionally annotated
genetic dependencies. By applying the CoDEX platform to our cohort of PDAC PDX models and patient
samples, we identified a differential signature derived from anti-correlated Clusters 1 and 23, C1vC23,
which demonstrated a classical-basal continuum corresponding with EMT. GO annotations underlying the
identified clusters provided novel insight into the oncogenic signaling characteristics of the Moffitt-defined
classical and basal-like subtypes. Cluster 23, which defined basal-like subtypes, was characterized by
gene signatures associated with cell projection organization, cell motility, and general cell development.
Alternatively, Cluster 1, which defined classical tumors, exhibited a transcriptomic bias towards lipid
metabolism, hormone regulation, and other xenobiotic metabolism signaling genes. Lipid metabolism and
lipid droplets have previously been observed to play important roles in generating the fatty acids required
for pancreatic cancer cell proliferation40. Thus, linking de novo lipid synthesis to the classical tumor
spectrum provides key insight into potential context-specific metabolic dependencies in lower histological
grade PDAC40.
Application of the C1vC23 signature yielded a notable refinement in the Cluster 23-enriched basal-like
cohort that was defined by a significant enrichment in mesenchymal-associated clinical features and
genetic dependencies. This basal-like refinement was due to the quantification of a near-zero differential
in the C1vC23 signature at the bulk tumor and single-cell levels, which we defined as quasi-basal.
Importantly, this quasi-basal single-cell population was significantly represented within both bulk classical
and basal-like tumors, expanding on the Toronto classification approach that demonstrated co-
occurrence of classical and basal-like subtype heterogeneity at the single-cell level10,41.
In addition to quantifying the transcriptomic diversity within PDAC, the CoDEX platform integrates
prioritized in vivo functional genomics in our PDX samples, yielding a clinically relevant, context-
forward approach to characterizing genetic dependencies. By using PDXs, we overcome the
lack of Moffitt-defined classical cell lines available for PDAC research7. scRNAseq analysis of
phenotypically and genetically diverse PDX models recapitulated findings from bulk tumors, revealing
vast intratumoral heterogeneity, and a cluster-defined classical to basal-like clonal spectrum with a
discrete quasi-basal signature in individual cells. Furthermore, scRNAseq of patient samples
recapitulated the C1vC23 intratumoral spectrum and revealed that the majority of the co-expression
network signatures were intrinsic to tumor cells.
We validated our C1vC23 signature by directly quantifying the effect of informed target perturbation on
C1vC23 subclonal heterogeneity through feature-barcoded scRNAseq. Multiplexed CRISPR deletion of the
basal-like-associated and Cluster 23-anchored targets ILK, SMAD4, and ZEB1 produced a quantifiable
“subclonal shift” toward the classical signature. This change in molecular dominance,
observed in both basal-like and quasi-basal populations, functionally demonstrated context-specific gene
essentiality as a function of disease evolution. Furthermore, knockout of established mesenchymal
drivers, such as SMAD4 and ZEB1, served to validate the C1vC23 signature. Consistent with previous
studies, our findings support a dual function of SMAD4 in PDAC, and the CoDEX-defined C1vC23
signature puts the tumor suppressor and tumor promoter roles of SMAD4 within distinct molecular
contexts35,42.
Our integrated CoDEX approach introduces multiple avenues for hypothesis generation within the context
of tumor heterogeneity. For example, anchoring of relatively less studied targets (e.g., FAM171A2 and
NKAIN4) to Cluster 23 implicates context-specific roles in mesenchymal cell proliferation and
maintenance. Although our results here focused on selective gene dependencies within highly aggressive
basal-like sub-clones, functional maps developed through the CoDEX pipeline also unraveled Cluster 1-
associated vulnerabilities, such as DNM2, with potential to eradicate classical clonal subpopulations.
Thus, while the quasi-basal population exhibits a near-zero differential in the C1vC23 signature,
associating other network-derived cluster signatures (e.g. C2vC13) may be useful to identify selective
genetic dependencies and further stratify this heterogeneous sub-population. This context-forward
approach to stratify dependencies provides a novel avenue to expand on current PDAC subtypes and to
define gene targets localized within additional cluster patterns underlying PDAC heterogeneity.
Notably, the subclonal shift induced by ILK knockout highlights the fact that additional metrics at the
protein-level are needed to more comprehensively quantify PDAC diversity, and underscores the need to
integrate protein status with network cluster trends to understand how molecular signaling is altered or
repositioned following major transcriptomic rewiring. Future integration of the co-expression network with
partially recapitulated or intact immune and stromal components may also elevate functional
characterization from the cell-intrinsic to tumor-intrinsic setting.
The use of CoDEX as an integrated approach to deconvolute tumor heterogeneity into relatively more
homogenous and approachable cluster-defined subclonal populations may, ideally, help to direct
treatment options to more significantly and comprehensively impact pancreatic tumors. This strategy may
also inform the implementation of novel or existing drugs that may mitigate tumor heterogeneity, but
have only a negligible effect on tumor volume and viability. To our knowledge, CoDEX serves as the first
platform to define and leverage a comprehensive set of transcriptional signatures as a foundation to
inform context-specific dependency.
METHODS:
PDX models and Sequencing (RNA and Whole Exome)
A total of 49 models were utilized in this paper. PDAC PDX models were obtained from the labs of Dr.
Michael Kim (Department of Surgical Oncology, MD Anderson Cancer Center) and Dr. Scott Lowe
(Memorial Sloan Kettering Cancer Center)43,44. PDXs were propagated and maintained in NOD scid
gamma (NSG) mice carrying NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (Jackson Labs).
PDX Sequencing
Whole exome library preparation and sequencing
Whole exome sequencing (WES) libraries were prepared using the Agilent SureSelect XT library
preparation kit in accordance with the manufacturer’s instructions. Briefly, DNA was sheared using a
Covaris LE220. DNA fragments were end-repaired, adenylated, ligated to Illumina sequencing adapters,
and amplified by PCR. Exome capture was performed using the Agilent SureSelect XT v4 51Mb capture
probe set and captured exome libraries were enriched by PCR. Final libraries were quantified using the
KAPA Library Quantification Kit (KAPA Biosystems), Qubit Fluorometer (Life Technologies) and Agilent
2100 BioAnalyzer and were sequenced on an Illumina HiSeq2500 sequencer using 2 x 125bp cycles.
Base calling and filtering were performed using current Illumina software and adapters were trimmed
using Trim Galore [55]. Sequences were aligned to both NCBI genome human build 37 and mouse build
38 using Burrows-Wheeler Aligner (BWA)45; identified mouse reads were removed from the original FASTQs and
the files were then realigned to NCBI build 37 using BWA. Picard was used to remove duplicate
reads (http://picard.sourceforge.net); base quality scores were recalibrated using GATK46. Reads
that did not align fully to the reference genome were assessed and locally realigned around indels to
identify putative insertions or deletions in the region. Variants were called using GATK HaplotypeCaller,
which generates a single-sample GVCF. To improve variant call accuracy, multiple single-sample GVCF
files were jointly genotyped using GATK GenotypeGVCFs, which generates a multi-sample VCF. Variant
Quality Score Recalibration (VQSR) was performed on the multi-sample VCF, which adds quality metrics
to each variant that can be used in downstream variant filtering.
RNA library preparation and sequencing
RNA sequencing libraries were prepared using the Illumina TruSeq Stranded mRNA sample preparation
kit in accordance with the manufacturer’s instructions. Briefly, 100ng of total RNA was used for
purification and fragmentation of mRNA. Following conversion of mRNA to cDNA, DNA was adenylated,
ligated to Illumina sequencing adapters, and amplified by PCR (using 10 cycles). Final libraries were
quantified using the KAPA Library Quantification Kit (KAPA Biosystems), Qubit Fluorometer (Life
Technologies) and Agilent 2100 BioAnalyzer and were sequenced on an Illumina HiSeq2500 sequencer
(v4 chemistry) using 2 x 50bp cycles. Base calling and filtering were performed using current Illumina
software. Reads were aligned to a joint index of NCBI genome human build 37 and mouse build 38 with
STAR aligner47. Reads that map uniquely and unambiguously to the mouse genome were removed from
the FASTQ files and then the files (containing unmapped reads and reads mapped at least once to the
human reference) were remapped to GRCh37 using STAR aligner and Gencode 19 annotation. Gene
expression quantification was performed with featureCounts (http://bioinf.wehi.edu.au/featureCounts/).
Genes with raw read counts present in fewer than 20% of samples were removed from further analysis,
and log-normalized counts were generated for the 14175 filtered genes using DESeq48.
Moffitt Classification
The Moffitt classical and basal classification was assigned to the 49 PDX models by consensus
clustering into two groups, using log-normalized and scaled counts of 21 classical and 25 basal genes
and the R package ConsensusClusterPlus49. The clusters were
manually labeled as classical or basal based on high expression of the corresponding group of genes.
Construction of the PDAC co-expression network
Gene-wise median absolute deviation (MAD) across samples was calculated from the normalized read counts. The 8505 genes with
MAD exceeding the 40th percentile were retained as highly variable genes. A
comprehensive network of all 8505 genes was generated by calculating the Spearman correlation across all gene
pairs. To prune the network and prioritize biologically relevant gene pairs, we applied the Log Likelihood Score (LLS), a Bayesian framework
developed by Yang et al.15. That study curated a “positive gold
standard” list of biologically relevant gene pairs and a “negative gold standard” list of gene pairs with no known
functional annotations. The likelihood that the correlation of any given gene pair is functionally
relevant is calculated by comparison with the negative and positive gold standard gene pairs. A log likelihood
score cutoff of 2.5 was used to select gene pairs for inclusion in the network.
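The first two steps of network construction (variability filtering and pairwise Spearman correlation) can be sketched as follows. This is a minimal standard-library illustration on invented data, not the authors' pipeline. The percentile filter is approximated by dropping the lowest 40% of genes ranked by MAD, and the LLS pruning step is omitted because it requires the curated gold-standard gene pairs.

```python
from itertools import combinations
from statistics import median

def mad(values):
    """Median absolute deviation of one gene across samples."""
    center = median(values)
    return median([abs(v - center) for v in values])

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            result[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def variable_genes(expression, drop_fraction=0.4):
    """Keep genes above the given MAD rank fraction (assumed reading of the percentile cutoff)."""
    by_mad = sorted(expression, key=lambda g: mad(expression[g]))
    return by_mad[int(drop_fraction * len(by_mad)):]

# Toy normalized expression: gene -> values across five samples.
toy_expression = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10],
    "G3": [10, 8, 6, 4, 2],
    "G4": [5, 5, 5, 5, 6],
    "G5": [3, 3, 4, 3, 3],
}
kept = variable_genes(toy_expression)
edges = {(a, b): spearman(toy_expression[a], toy_expression[b])
         for a, b in combinations(kept, 2)}
print(kept, edges)
```

With 14175 input genes, dropping the lowest 40% leaves 8505 genes, which matches the count reported above.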
Clustering and GO pathway annotation
Infomap16, a community detection tool for large networks, was used to cluster the network. All gene pairs
were input into Infomap, and three hierarchical tiers of clusters were produced. To assign
clusters, a third-tier group containing more than 50 genes was assigned to an individual cluster. If
a third-tier group contained fewer than 50 genes, it was folded into its second tier, all of which
formed a cluster. This was the case for Clusters 29-31, which are larger encompassing clusters. For the
resulting 31 clusters, the R package goseq50 was used to conduct hyper-enrichment analysis of Gene
Ontology Biological Processes pathways on each cluster. The R package “revigo”51 was used to prioritize
and visualize GO pathways to represent their hierarchical class.
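The tier-folding rule can be sketched as below, assuming each gene carries a three-level path `(tier1, tier2, tier3)` taken from the hierarchical Infomap output. The path encoding and function name are illustrative assumptions. The text does not say how a group of exactly 50 genes is handled, so this sketch folds it into the second tier.

```python
from collections import Counter, defaultdict

def assign_clusters(gene_paths, min_size=50):
    """Map genes to clusters: large third-tier groups stand alone, small ones fold upward."""
    tier3_sizes = Counter(gene_paths.values())
    clusters = defaultdict(list)
    for gene, (t1, t2, t3) in gene_paths.items():
        if tier3_sizes[(t1, t2, t3)] > min_size:
            key = (t1, t2, t3)   # keep as an individual cluster
        else:
            key = (t1, t2)       # fold into the parent second-tier group
        clusters[key].append(gene)
    return dict(clusters)

# Toy example with a lowered threshold so the behavior is visible on five genes.
toy_paths = {
    "g1": (1, 1, 1), "g2": (1, 1, 1), "g3": (1, 1, 1),
    "g4": (1, 1, 2),
    "g5": (1, 1, 3),
}
toy_clusters = assign_clusters(toy_paths, min_size=2)
print(toy_clusters)
```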
Mutation analysis
The “oncoplot” feature within the R package maftools52 was utilized to visualize the mutational spectrum
across the genes identified as relevantly mutated by the PDAC TCGA paper. To identify complementary
patterns of mutations across clusters, we applied UNCOVER17 to a filtered set of “high
impact” mutations and the cluster enrichment centroid scores per model. The filtered set of mutations was
generated by selecting canonical mutations with a gnomAD53 allele frequency below 1% and “moderate”
or “high” impact, excluding intronic, non-coding, and synonymous mutations.
Centroid scores
To generate a cluster level enrichment score, we calculated a centroid score per cluster by taking the
mean log-normalized expression of all genes in each cluster for each sample. A comprehensive overview
of the 31 centroid scores across the PDX models was generated using the R package
ComplexHeatmap49. We further computed a Pearson correlation across all 31 clusters and identified
a strongly anticorrelated signature between Clusters 1 and 23.
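The centroid score and the cluster-to-cluster correlation can be expressed in a few lines. The sketch below uses invented expression values and hypothetical gene names. The final differential is computed as Cluster 1 minus Cluster 23, which is an assumed sign convention, since the direction of the C1vC23 differential is not specified in this section.

```python
def centroid_scores(expression, clusters):
    """Mean log-normalized expression of each cluster's genes, per sample."""
    n_samples = len(next(iter(expression.values())))
    return {
        name: [sum(expression[g][s] for g in genes) / len(genes) for s in range(n_samples)]
        for name, genes in clusters.items()
    }

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Toy data: four samples, two genes per cluster.
toy_expression = {
    "A": [8, 6, 4, 2], "B": [6, 4, 2, 0],
    "C": [1, 3, 5, 7], "D": [3, 5, 7, 9],
}
toy_clusters = {"C1": ["A", "B"], "C23": ["C", "D"]}

scores = centroid_scores(toy_expression, toy_clusters)
r_c1_c23 = pearson(scores["C1"], scores["C23"])
# Assumed convention: positive values lean toward Cluster 1, negative toward Cluster 23.
differential = [c1 - c23 for c1, c23 in zip(scores["C1"], scores["C23"])]
print(scores, r_c1_c23, differential)
```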
Network Display
Cytoscape (version 3.7.2) was leveraged for visualization of both the PDAC co-expression network and
STRING-anchored co-dependency networks54,55. The PDAC co-expression network was visualized using
a Prefuse Force-Directed Layout, with node color displayed based on Random Walk defined clusters and
node size representative of Betweenness Centrality. For visualization purposes, only nodes with 12 or
more edges are represented, and edges are not displayed in representative co-expression network
figures. For STRING-anchored co-dependency networks, a Prefuse Force-Directed Layout is also
applied, with quantile-normalized Bayes Factors represented as a blue-to-red color distribution. For each
in vivo and in vitro condition, networks were constructed for each independent PDX line and then merged
for comparison based on overlapping nodes. All STRING-defined edges were maintained for in vivo and
in vitro network merging.
Cell Culture
PDX cell lines were seeded in treated tissue-culture plates (Corning) in DMEM/F12 medium (Gibco)
supplemented with 10% Fetal Bovine Serum (Gibco), Penicillin (50 units/mL) and Streptomycin (50
µg/mL) (Thermo Fisher Scientific). Phosphate Buffered Saline (PBS) was utilized prior to trypsinization
and for general cell washing purposes (Thermo Fisher Scientific). Cells were regularly trypsinized
(0.25%, Trypsin-EDTA, Gibco) prior to reaching 70% - 80% confluence, and maintained on 10 cm and 15
cm treated tissue-culture dishes (Corning). Viable cells were counted using a Cellometer mini and 0.2%
Trypan Blue staining (Nexcelom).
Design and Construction of Custom RNAi Library
The original custom RNAi library, comprising 2,653 shRNAs targeting PDAC-prioritized proteins
associated with the extracellular face of the plasma membrane, was constructed using chip-based
oligonucleotide synthesis and cloned into a pRSI17cb-U6-sh-13kCB18-HTS6-UbiC-TagGFP2-2A-Puro
lentiviral vector (Cellecta) as a pool. PDAC-prioritized surface proteins (including proteins with evidence of
mislocalization) were selected based on tumor-specific overexpression compared to matched-normal
tissue, copy number vs. RNA expression correlation, and SILAC screening for KRAS dependency26.
Significant tumor-specific overexpression and copy number vs RNA expression were determined for the
full transcriptome, and subsequently refined using an internally curated list of select extracellular proteins
containing transmembrane domains, GPI-anchored proteins and proteins with evidence of membrane
mislocalization. Tumor-specific overexpression vs. matched normal tissue was defined based on two
matched-normal bulk tumor microarray datasets (FC ≥ 1.5, q < 0.005, ArrayExpress: E-GEOD-15471,
E-GEOD-28735), and further refined for tumor-specific expression (FC ≥ 1.5) by using a microarray
dataset composed of micro-dissected samples (ArrayExpress: E-MEXP-1121/950)23-25,56,57. Positive
Spearman correlations (rho ≥ 0.35, q < 0.005) of copy number vs. RNA expression were calculated based
on available TCGA level 3 data derived from the 07/15/2014 dataset58. False discovery rate for bulk tumor
microarray datasets and for copy number vs. RNA expression correlations was calculated using the
“qvalue” Bioconductor package59. Genes derived from published data characterizing KRAS dependent
surface protein localization, conducted across three iKRAS p53L/+ mouse60 tumor-derived cell lines, were
also incorporated into the library based on previous work26. The shRNAs targeted 241 genes, with 10
shRNAs/gene. Targeting sequences were designed using a proprietary algorithm (Cellecta). The oligo
corresponding to each shRNA was synthesized with a unique molecular barcode (18 nucleotides) for
measuring representation by NGS. Negative controls consisted of shRNAs targeting Luciferase, and
positive controls consisted of shRNAs targeting RPL30 and PSMA1. In addition, we incorporated shRNAs
targeting KRAS into the library to serve as additional controls for the PDAC PDX lines.
RNAi Screening in vivo and in vitro
Using the custom barcoded lentiviral shRNA library, PDX lines (PATC69, PATC124, PATC53 and
PATC153) were transduced in vitro using 8 µg/mL Polybrene (Sigma-Aldrich). Libraries were transduced
at 1000X coverage and multiplicity of infection (MOI) of 0.3. Media was replaced after 14 hours, and at
48 hours, and MOI was confirmed by checking GFP percentage with flow-cytometry 48 hours post-
transduction. Immediately following MOI confirmation with flow cytometry, 2 µg/mL Puromycin (Thermo
Fisher) was added to each transduced cell population for 72 hours. Immediately following Puromycin
selection, a Reference population was collected (1000X) and stored at -80 ºC. Remaining cells were split
and moved into in vitro and in vivo settings. Three independent in vitro screens were seeded (1000X) into
15 cm treated plates (Corning). For in vivo implantation, cells were combined into 1:1 PBS/Growth-Factor
Reduced Matrigel (Corning), and injected orthotopically at 1000X coverage per mouse. Immunodeficient
NOD SCID mice were leveraged for in vivo screening. For in vitro screening populations, cell populations
were collected at 10, 20 and 30 days post Reference collection. For in vivo screening, the entire pancreas
and tumor of each mouse was collected at day 30. For each condition, cells were lysed using SDS and
DNA was sheared using sterile 23 gauge 1 inch needles (Becton Dickinson). Tumor DNA was isolated
using Phenol:Chloroform extraction and Ethanol precipitation. A Nested PCR strategy was utilized to
amplify and prepare barcode populations for next-generation sequencing (Cellecta). Redundant siRNA
activity (RSA) analysis was applied to rank targets for each PDX model and screen setting61.
Design and construction of custom CRISPR Library
The custom CRISPR-Cas9 sgRNA library comprised 3,367 sgRNAs and was designed to both
incorporate and expand upon the original RNAi-defined targets. The library was constructed using chip-
based oligonucleotide synthesis and cloned into a pRSG16-U6-sg-HTS6C-UbiC-TagRFP-2A-Puro
lentiviral vector (Cellecta) as a pool. RSA was used to rank essential genes from shRNA screens. Genes
were curated based on RSA < 0.05 and FC < -2 in at least one model (N=100). Curated gene targets
were further annotated by incorporating neighbors with a PPI score ≥ 0.80 (STRING, version 10), and
TPM (transcripts per million) > 2 in that model47. In addition, 50 non-essential genes and 50 essential
genes were added, for a final set of 654 genes32.
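The curation logic can be sketched as a two-stage filter: seed genes from the shRNA screens, then expression-supported interaction neighbors. The data structures, gene names, and values below are hypothetical. The sketch reads "in that model" as the model in which the seed gene scored, which is an interpretation rather than a documented detail.

```python
def curate_targets(screen_hits, ppi_edges, tpm,
                   rsa_max=0.05, fc_max=-2.0, ppi_min=0.80, tpm_min=2.0):
    """Return seed genes plus their high-confidence, expressed interaction neighbors."""
    seeds = {}  # gene -> models in which it passed the RSA and fold-change cutoffs
    for (gene, model), (rsa, fc) in screen_hits.items():
        if rsa < rsa_max and fc < fc_max:
            seeds.setdefault(gene, set()).add(model)
    library = set(seeds)
    for gene, models in seeds.items():
        for neighbor, score in ppi_edges.get(gene, []):
            expressed = any(tpm.get((neighbor, m), 0.0) > tpm_min for m in models)
            if score >= ppi_min and expressed:
                library.add(neighbor)
    return library

# Toy inputs: (gene, model) -> (RSA score, fold change).
toy_hits = {
    ("SEED1", "PATC53"): (0.01, -3.0),
    ("SEED1", "PATC69"): (0.20, -0.5),
    ("MISS1", "PATC53"): (0.01, -1.0),
}
toy_ppi = {"SEED1": [("NBR_OK", 0.90), ("NBR_LOWPPI", 0.50), ("NBR_LOWTPM", 0.95)]}
toy_tpm = {
    ("NBR_OK", "PATC53"): 10.0,
    ("NBR_LOWPPI", "PATC53"): 10.0,
    ("NBR_LOWTPM", "PATC53"): 1.0,
    ("NBR_LOWTPM", "PATC69"): 50.0,  # expressed only where the seed did not score
}
toy_library = curate_targets(toy_hits, toy_ppi, toy_tpm)
print(sorted(toy_library))
```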
CRISPR Cas9 Screening in vivo and in vitro
Using the custom barcoded lentiviral sgRNA library, PDX lines (PATC69, PATC124, PATC53 and
PATC153) containing the lentiCas9-blast vector (Addgene, plasmid #52962) were transduced in vitro
using 8 µg/mL Polybrene (Sigma-Aldrich). PDX lines transduced with Cas9 were maintained in 10
µg/mL blasticidin (Thermo Fisher Scientific). Libraries were transduced at 1000X coverage and a
multiplicity of infection (MOI) of 0.3. Media was replaced after 14 hours, and at 48 hours, and MOI was
confirmed by checking RFP percentage with flow-cytometry 48 hours post-transduction. Immediately
following flow cytometry, 2 µg/mL puromycin (Thermo Fisher) was added to each transduced cell
population for 72 hours. Immediately following puromycin selection, a Reference population was collected
(1000X) and stored at -80 °C. Cells were cultured for an additional 12 days to allow for
some initial sgRNA-directed cutting prior to starting the in vivo screen, to reduce potential noise derived
from disproportional cell doubling following orthotopic implantation. A secondary Reference (1000X), at
the time of injection, was then collected. Remaining cells were split and moved into in vitro and in vivo
settings. Three independent in vitro screens were seeded (1000X) into 15 cm treated plates (Corning).
For in vivo implantation, cells were combined into 1:1 PBS/Growth-Factor Reduced Matrigel (Corning),
and injected orthotopically at 1000X coverage per mouse. Immunodeficient NSG mice were leveraged for
in vivo screening. For in vitro screening populations, cell populations were collected at 10, 20 and 30 days
post injection and secondary Reference collection. For in vivo screening, the entire pancreas and tumor
of each mouse was collected at day 30 post injection. DNA extraction and barcode library preparation
were conducted in similar fashion to the RNAi screens. For each condition, cells were lysed using SDS
and DNA was sheared using sterile 23 gauge 1 inch needles (Becton Dickinson). DNA was isolated using
Phenol:Chloroform extraction and Ethanol precipitation. Nested PCR was utilized to amplify and prepare
barcode populations for NGS (Cellecta).
Low Fat BAGEL
Log2 fold-change was calculated on a guide level by comparing each time point to the reference time
point for each model. Screen quality and efficient dropout were assessed by comparing the log density
ratio of log2 fold-changes for core essential versus non-essential guides.
Low-Fat BAGEL is an adapted framework of the BAGELv233 algorithm to more accurately analyze small
screens with limited training sets. The BAGEL and BAGELv2 algorithms are previously described
implementations of a Bayesian model selection algorithm, and in short, calculate a “Bayes Factor”, a log
likelihood of gene essentiality trained on gold-standard core essential and non-essential genes32.
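$$
\mathrm{BF} = \frac{\Pr(D \mid \text{essential})}{\Pr(D \mid \text{nonessential})}
= \frac{\int \Pr(D \mid k, \text{essential}) \, \Pr(k \mid \text{essential}) \, dk}{\int \Pr(D \mid k, \text{nonessential}) \, \Pr(k \mid \text{nonessential}) \, dk}
$$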
In Low-Fat BAGEL, a random bootstrap of core essential and non-essential guides is used as the training
set to generate log density curves of the essential and non-essential fold-change distribution. This
distribution is used to generate a linear regression model (k) of the fold-change to control for the influence
of outliers. Over 100 permutations of the training set, Low-Fat BAGEL calculates the BF of all guides in
the library, by comparing the log2 fold-change of each guide (D) to the distribution of fold-changes of the
essential and non-essential genes. The final BF of each individual guide is computed as the mean across
100 iterations, and a gene-level BF is calculated as the sum of the BFs of all guides for that gene.
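The bootstrap-and-average structure of this calculation can be illustrated with a simplified sketch. It is not the Low-Fat BAGEL implementation: the densities are estimated with a hand-rolled Gaussian kernel of assumed bandwidth, the linear regression step is omitted, a small pseudocount guards against division by zero, and all guide names and fold-changes are invented.

```python
import math
import random

def kde(train, x, bandwidth=0.5):
    """Gaussian kernel density estimate of the training fold-changes, evaluated at x."""
    norm = len(train) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train) / norm

def guide_bayes_factors(fold_changes, essential, nonessential, iterations=100, seed=0):
    """Mean log2 density ratio per guide across bootstrap resamples of the training guides."""
    rng = random.Random(seed)
    totals = {guide: 0.0 for guide in fold_changes}
    for _ in range(iterations):
        ess = [fold_changes[g] for g in rng.choices(essential, k=len(essential))]
        non = [fold_changes[g] for g in rng.choices(nonessential, k=len(nonessential))]
        for guide, fc in fold_changes.items():
            totals[guide] += math.log2((kde(ess, fc) + 1e-12) / (kde(non, fc) + 1e-12))
    return {guide: total / iterations for guide, total in totals.items()}

def gene_bayes_factors(guide_bf, guide_to_gene):
    """Gene-level BF as the sum of the guide-level BFs for that gene."""
    gene_bf = {}
    for guide, bf in guide_bf.items():
        gene = guide_to_gene[guide]
        gene_bf[gene] = gene_bf.get(gene, 0.0) + bf
    return gene_bf

# Toy log2 fold-changes: training guides plus two test genes with two guides each.
toy_fc = {
    "ess_1": -3.2, "ess_2": -3.0, "ess_3": -2.9, "ess_4": -3.1,
    "non_1": 0.1, "non_2": -0.1, "non_3": 0.0, "non_4": 0.2,
    "x_1": -2.8, "x_2": -3.3, "y_1": 0.1, "y_2": -0.2,
}
toy_guide_bf = guide_bayes_factors(
    toy_fc,
    essential=["ess_1", "ess_2", "ess_3", "ess_4"],
    nonessential=["non_1", "non_2", "non_3", "non_4"],
)
test_guides = {"x_1": "GENE_X", "x_2": "GENE_X", "y_1": "GENE_Y", "y_2": "GENE_Y"}
toy_gene_bf = gene_bayes_factors({g: toy_guide_bf[g] for g in test_guides}, test_guides)
print(toy_gene_bf)
```

A positive gene-level value means the guides behave more like the essential training set, and a negative value means they behave more like the non-essential set.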
The R package ‘ROCR’ [62] was used to generate precision-recall curves of the essential and non-essential
gene-level BFs. The F measure of each screen under the different analytical methods was calculated as
previously described [32].
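As a rough illustration of how precision, recall and an F measure can be derived from gene-level BFs, the sketch below uses the standard harmonic-mean definition of the F measure. Whether this matches the cited definition exactly is an assumption, as are the function names, the BF threshold and the toy values; the analysis itself was performed with ROCR in R.

```python
def precision_recall(gene_bf, essential, nonessential, threshold):
    """Precision and recall when genes with BF > threshold are called essential."""
    called = {g for g, bf in gene_bf.items() if bf > threshold}
    tp = len(called & set(essential))      # reference essentials that were called
    fp = len(called & set(nonessential))   # reference non-essentials that were called
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 by choice when nothing is called
    recall = tp / len(essential)
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gene-level BFs and reference sets.
toy_bf = {"A": 5.0, "B": 2.0, "C": -1.0, "D": 3.0, "E": -4.0}
p, r = precision_recall(toy_bf, essential=["A", "B", "C"],
                        nonessential=["D", "E"], threshold=0.0)
print(p, r, f_measure(p, r))
```

Sweeping the threshold over the observed BF range and recording each (recall, precision) pair traces out the precision-recall curve.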
Anticorrelated cluster signatures and “Quasi-basal” group
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted September 19, 2020. ; https://doi.org/10.1101/2020.09.17.302034doi: bioRxiv preprint
A test was performed to quantify the difference in Cluster 1 enrichment across Classical and Basal
models. The centroid scores for Clusters 1 and 23 were used to categorize the PDX models into three
groups using K-means clustering. K = 3 was chosen as the optimal number of clusters based on the
majority consensus (two out of three) of the following methods: the elbow criterion for the within-cluster
sum of squares, the BIC calculated by the R package “mclust” [63], and “NbClust” [64], which evaluates the
optimal k across 30 methods. The R package “factoextra” [65] was used to visualize the results. The group of
models with Cluster 1 enrichment and Cluster 23 depletion was termed “classical”, the inverse group was termed
“basal”, and the third group, with roughly equal centroid scores in Clusters 1 and 23, was termed “quasi-basal”.
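The grouping step can be sketched in Python as follows. This is an illustrative stand-in for the R workflow, not the code used in the study. The centroid scores are invented, the farthest-point initialisation is a deterministic substitute for the random starts typically used by K-means, and naming the groups by the sign of (Cluster 1 score minus Cluster 23 score) is an assumption based on the definitions above.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: dist2(p, centres[j])) for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    for _ in range(n_iter):
        labels = assign(points, centres)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centres[j])
        if new == centres:  # converged
            break
        centres = new
    return assign(points, centres), centres


def name_groups(centres):
    """Name the three groups by the cluster-centre value of (C1 score - C23 score)."""
    order = sorted(range(len(centres)), key=lambda j: centres[j][0] - centres[j][1])
    return {order[0]: "basal", order[1]: "quasi-basal", order[2]: "classical"}


# Hypothetical (Cluster 1, Cluster 23) centroid scores for nine models.
scores = [(1.0, -1.0), (0.8, -0.9), (1.2, -1.1),
          (-1.0, 1.0), (-0.9, 1.2), (-1.1, 0.8),
          (0.1, 0.0), (-0.1, 0.1), (0.0, -0.1)]
labels, centres = kmeans(scores, k=3)
names = name_groups(centres)
groups = [names[lab] for lab in labels]
print(groups)
```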
Differential Expression between Classical and Basal Models
After categorizing models, those deemed classical and basal using the CEN methodology were used
to conduct differential expression analysis. DESeq [48] was used to identify significantly differentially
expressed genes, and genes were ranked by t score. GSEA was conducted using the “GSEA pre-ranked”
feature [66]. We compared differences in histology and site of recurrence obtained from the clinical data
associated with each PDX model. Chi-square tests were conducted using the CEN subtypes and Moffitt
subtypes to evaluate whether subtype-level differences were statistically significant.
Clinical correlations with signature
Clinical data from the patients corresponding to the PDAC PDX cohort were provided by the lab of Dr. Eugene
Koay [67]. Differentiation status for 45 models was categorized as “Poor” or “Moderate”, and histology
status for 35 models was categorized as “Locoregional” if the records indicated “regional” or
“locoregional”, and as “Distant” if the site was outside the pancreas. Chi-square tests were used to evaluate
differences between the cluster-defined Classical, Quasi-basal and Basal groups as well as the Moffitt
binary classification.
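For readers unfamiliar with the test, a Chi-square test of independence on a subtype-by-category contingency table can be sketched as below. The counts are hypothetical and do not come from the study, the function names are illustrative, and the closed-form p-value used here is valid only for an even number of degrees of freedom (a three-group by two-category table has two).

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_pvalue_even_dof(stat, dof):
    """Upper-tail probability of the chi-square distribution (even dof only)."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form implemented for positive, even dof only")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))


# Hypothetical counts: rows = Classical, Quasi-basal, Basal; columns = Poor, Moderate.
table = [[2, 8],
         [5, 5],
         [8, 2]]
stat, dof = chi_square(table)
p_value = chi2_pvalue_even_dof(stat, dof)
print(round(stat, 3), dof, round(p_value, 4))
```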
TCGA data
TCGA gene expression data and the corresponding clinical data were downloaded from the GDC and processed
as previously described [4]. Raw read counts were log2 transformed and normalized using DESeq [48],
as for the PDXs described above. We filtered the dataset to the “high purity” samples annotated in the
clinical data, from which we also obtained the Moffitt classification. Cluster centroids were calculated as
described above for the PDXs, and K-means clustering with k = 3 was applied to the Cluster 1 and 23 centroid
scores to categorize tumors into Classical, Quasi-basal and Basal. Tumor grade and differentiation status were
obtained from the clinical data, and Chi-square tests were conducted to assess subtype-level associations
with the C1vC23 signature and Moffitt groups.
Single Cell Data for PDX-derived Cell Lines.
Seurat version 3.1 [68] was used for all single-cell analyses. Each of the lines (PATC124, PATC53
and PATC69) was analyzed separately. PATC53 contained two replicates, which were merged for
analysis. For all PDX cell lines, single cells with a minimum of 350 expressed genes and less than 10%
mitochondrial reads were retained. Genes expressed in fewer than 3 cells and mitochondrial genes were
removed from further analysis. The data were log normalized and transformed using the “vst” function with
the top 2000 variable genes. The total RNA count, cell cycle score and mitochondrial reads were regressed out.
For PATC124, principal-component analysis (PCA) and uniform manifold approximation and projection (UMAP)
with the first 15 dimensions were performed, followed by cluster identification at a resolution of 0.15. For
PATC69, PCA and UMAP with the first 15 dimensions were performed, followed by cluster identification at a
resolution of 0.10. The two PATC53 replicates were integrated as one dataset following normalization and
variance stabilization. Total RNA count, cell cycle score and mitochondrial reads were regressed out of the
integrated dataset, PCA and UMAP on the first 15 dimensions were performed, and clusters were identified
at a resolution of 0.10.
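The cell- and gene-level filters described above were applied in Seurat. The following plain-Python sketch only illustrates their logic on a toy count table. The function names are hypothetical, mitochondrial genes are assumed to carry the human "MT-" prefix, and the toy call lowers the 350-gene cutoff so that the tiny example retains some cells.

```python
def filter_cells(counts, min_genes=350, max_pct_mito=10.0):
    """Keep cells with at least `min_genes` expressed genes and < `max_pct_mito`% mito reads."""
    kept = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        n_expressed = sum(1 for c in genes.values() if c > 0)
        mito = sum(c for g, c in genes.items() if g.startswith("MT-"))
        pct_mito = 100.0 * mito / total if total else 0.0
        if n_expressed >= min_genes and pct_mito < max_pct_mito:
            kept[cell] = genes
    return kept


def filter_genes(counts, min_cells=3):
    """Drop mitochondrial genes and genes expressed in fewer than `min_cells` cells."""
    n_cells = {}
    for genes in counts.values():
        for g, c in genes.items():
            if c > 0:
                n_cells[g] = n_cells.get(g, 0) + 1
    keep = {g for g, n in n_cells.items() if n >= min_cells and not g.startswith("MT-")}
    return {cell: {g: c for g, c in genes.items() if g in keep}
            for cell, genes in counts.items()}


# Hypothetical raw UMI counts (cell -> gene -> count).
raw = {
    "c1": {"KRT19": 5, "VIM": 2, "MT-CO1": 0},
    "c2": {"KRT19": 3, "VIM": 1, "MT-CO1": 1},   # 20% mitochondrial reads: removed
    "c3": {"KRT19": 4, "VIM": 0, "GATA6": 6},
    "c4": {"KRT19": 1},                          # too few expressed genes: removed
    "c5": {"KRT19": 2, "GATA6": 1, "MT-CO1": 0},
}
cells = filter_cells(raw, min_genes=2, max_pct_mito=10.0)  # toy cutoff, not 350
filtered = filter_genes(cells, min_cells=3)
print(sorted(filtered), filtered["c1"])
```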
Core Needle Biopsy Single Cell Analysis
Seven CNB samples were used: four primary tumors and one sample each of liver, lung and vaginal
metastases [36]. In all seven samples, cells with a minimum of 350 expressed genes and less than 25%
mitochondrial reads were retained. As in the PDX analysis, genes expressed in fewer than 3 cells and
mitochondrial genes were removed from further analysis. Each sample was log normalized and transformed
using the “vst” function with the top 2000 variable genes. The seven samples were then integrated, after
which the total RNA count, cell cycle score and mitochondrial reads were regressed out.
Principal-component analysis and UMAP with the first 20 dimensions were performed, followed by cluster
identification at a resolution of 0.20. Cell types associated with clusters were identified using established
stromal markers [10,69] and epithelial cell markers specific to liver and pancreas [36,70,71].
Cluster Signatures for Single Cells
Network-based normalization of cluster signatures among single cells was achieved by first identifying, for
each cluster, the subset of genes whose expression was strongly correlated with their own cluster (r > 0.4).
For the PDX models, using this subset of highly correlated genes per cluster, if more than 30% of the cluster
was captured in a single cell, we calculated a centroid score by taking the mean of the normalized UMI counts.
For patient CNB samples, a centroid score was calculated for each single cell in which more than 20% of the
cluster was captured. The percentage of cluster genes expressed within each cell type was calculated by taking
the mean number of genes with a UMI count above 0 per cluster across all the single cells within each cell
type. Single cells were grouped into classical, quasi-basal and basal based on the C1vC23 differential
signature: cells were classified as basal if the differential signature was < -0.1, quasi-basal if it was
between -0.1 and 0.1, and classical if it was > 0.1.
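The per-cell scoring and thresholding can be sketched as follows. Two points are assumptions rather than details stated in the text: the centroid is taken as the mean over all genes in the correlated subset (not only the detected ones), and the C1vC23 differential signature is taken to be the Cluster 1 centroid minus the Cluster 23 centroid. Gene names and values are placeholders.

```python
def centroid_score(cell_expr, cluster_genes, min_fraction):
    """Mean normalized UMI count over `cluster_genes`, or None if coverage is too low.

    A score is returned only when more than `min_fraction` of the cluster genes
    have a count above zero in this cell.
    """
    detected = sum(1 for g in cluster_genes if cell_expr.get(g, 0.0) > 0)
    if detected / len(cluster_genes) <= min_fraction:
        return None
    return sum(cell_expr.get(g, 0.0) for g in cluster_genes) / len(cluster_genes)


def classify(diff, cutoff=0.1):
    """Map a C1vC23 differential signature to a subtype label."""
    if diff < -cutoff:
        return "basal"
    if diff > cutoff:
        return "classical"
    return "quasi-basal"


def classify_cell(cell_expr, c1_genes, c23_genes, min_fraction=0.3):
    """Label one cell, or return None when either cluster is insufficiently captured."""
    c1 = centroid_score(cell_expr, c1_genes, min_fraction)
    c23 = centroid_score(cell_expr, c23_genes, min_fraction)
    if c1 is None or c23 is None:
        return None
    return classify(c1 - c23)


# Hypothetical gene subsets and normalized counts.
c1_genes = ["A", "B", "C", "D"]
c23_genes = ["E", "F", "G", "H"]
cell_1 = {"A": 1.0, "B": 2.0, "E": 0.5, "F": 0.5}   # C1 = 0.75, C23 = 0.25
cell_2 = {"A": 1.0, "B": 2.0, "E": 1.0}             # only 25% of C23 captured
print(classify_cell(cell_1, c1_genes, c23_genes), classify_cell(cell_2, c1_genes, c23_genes))
```

The 30% coverage default corresponds to the PDX models; passing `min_fraction=0.2` would correspond to the patient CNB samples.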
Biomarker Prioritization
Cluster 1- and Cluster 23-specific networks were generated using Cytoscape 3.7.2 [55], in which we calculated
the closeness centrality of each gene within its respective cluster. Gene-to-cluster correlation was calculated
by computing the Pearson correlation between each gene and its corresponding cluster score across our 48
PDX models. First-degree nodes (FDN) were identified as nodes directly connected to the CRISPR-associated
vulnerabilities DNM2 and MAPK11.
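The three quantities used for prioritization (closeness centrality, gene-to-cluster Pearson correlation and first-degree neighbours of an anchor gene) can be illustrated on a toy network. The study computed closeness in Cytoscape; the sketch below assumes the common definition for an unweighted graph, (number of reachable nodes) divided by (sum of shortest-path distances), and uses placeholder node names rather than real genes.

```python
import math
from collections import deque


def closeness(adj, node):
    """Closeness centrality of `node` in an unweighted, undirected graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search for shortest path lengths
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def first_degree_nodes(adj, anchors):
    """Nodes directly connected to any anchor, excluding the anchors themselves."""
    return sorted({v for a in anchors for v in adj.get(a, ())} - set(anchors))


# Hypothetical star-shaped network centred on GENE_B.
adj = {"GENE_A": ["GENE_B"],
       "GENE_B": ["GENE_A", "GENE_C", "GENE_D"],
       "GENE_C": ["GENE_B"],
       "GENE_D": ["GENE_B"]}
print(closeness(adj, "GENE_B"), closeness(adj, "GENE_A"))
print(first_degree_nodes(adj, ["GENE_A"]))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```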
The R packages ggplot2 [72] and ComplexHeatmap [52] were used to plot the comparison between the
gene-to-cluster correlation and closeness centrality and the gene expression heatmap of first-degree nodes,
respectively.
Multiplex IHC-IF staining and data analysis
Formalin-fixed, paraffin-embedded (FFPE) PATX samples were cut into 3 µm thick sections
and placed on positively charged slides. Sections were deparaffinized by baking at 60 ºC for 1 hour, then
rehydrated by serial passage through xylene and graded alcohol. All sections were subjected to an initial
heat-induced epitope retrieval (HIER) in 10 mM citrate buffer with 0.05% Tween20, pH 6.0, at 95 ºC for 15
minutes using a BioGenex EZ retrieval microwave. Subsequent HIER for Opal development was done
using fresh citrate buffer at 95 ºC for 10 minutes. All sections were initially blocked for endogenous
peroxidase using Bloxall (Vector Labs SP6000). Before and after each primary incubation, sections were
blocked using 2.5% serum (Vector Labs S1012). Opal, indirect, and direct immunofluorescence methods
were used. The first-node protein targets, VSIG1 (Thermofisher MAB 4818, at 1/2000) and Vimentin
(Cell Signaling Technologies 5741, at 1/1600), were developed using Opal methods. Ki67 (Cell Signaling
Technologies 9129, at 1/400) was developed indirectly using an Alexa 680 anti-rabbit secondary antibody
(Thermofisher A32802, 1/500). Lastly, HLA conjugated to Alexa 647 (Abcam 199837, at 1/1000) was
used to aid in tissue segmentation. Sections were then counterstained with DAPI.
Slides were imaged using Vectra 3.0 Automated Quantitative Pathology Imaging System (Akoya
Biosciences). Image processing and analysis were performed using inForm Software v2.4 (Akoya
Biosciences). For a subset of images from each PATX, the following was performed: Images were
unmixed and autofluorescence was removed. Then, tissue was segmented as tumor, stroma or other
based on training regions and pattern recognition of DAPI and HLA stain. This was followed by cell
segmentation using DAPI and HLA to segment nuclei, cytosol, and membrane. Phenotyping was
performed for each marker individually by selecting representative positives for algorithm training and
allowing the software to select the rest. Batch analysis of all images was performed using the
segmentation and phenotyping algorithm described above. At this threshold of detection, VSIG1+/VIM+
cells represented a negligible population.
Spatial and data analysis were performed using phenoptrReports (Akoya Biosciences), an R
package. Briefly, all single-cell phenotype data were merged, aggregated and consolidated for each
marker. Consolidated data were analyzed based on the phenotypes of interest. Using the XY coordinates
of each cell, spatial relationships between cell types were visualized using the phenoptrReports GUI. All
data were graphed using GraphPad Prism version 8.0.0 for Windows (GraphPad Software, San Diego,
California, USA; www.graphpad.com).
Feature Barcoding Vector
The feature barcode vector (LentiCRISPR-E-10xcs1) was built using pLentiCRISPR-v2 (Addgene:
52961) as the base vector. All molecular modifications were performed by Epoch Life Sciences
(Missouri City, TX). pLentiCRISPR-v2 was modified in three ways: the sgRNA scaffold was replaced with an
optimized scaffold [73] that included the 3’ 10x capture sequence 1
(cgtttCagagctaTCGTGgaaaCAGCAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgcGCTTTAAGGCCGGTCCTAGCAAtttttt);
a ccdB bacterial expression cassette was introduced between the BsmBI restriction
sites to reduce background during sgRNA cloning and library generation; and an N-terminal
Flag-SV40 NLS was added to Sp. Cas9-nucleoplasmin NLS-P2A-Puro.
Preparation of Feature Barcoded sgRNA Knockout Populations
Lentiviral transductions of four separate feature barcode sgRNA vectors (targeting ABCG8, ILK, SMAD4
and ZEB1) were conducted on separate cell populations for both the PATC69 and PATC53 PDX lines.
Lentivirus was concentrated by ultracentrifugation, resuspended in 200 µL of PBS, and stored at
-80 °C until use. For each condition, 1 × 10^6 cells were transduced in 10 cm treated plates (Corning) using 8
µg/mL Polybrene (Sigma-Aldrich). Media was replaced after a 16-hour incubation, and each cell
population was washed with PBS and then placed under puromycin selection for 72 hours. Following
selection, all conditions were cultured for 22 days (or 10 days post “CRISPR Screen Injection point”) to
match the in vitro CRISPR screen control-separation profile at day 10 (Extended Data Figure 2A). Prior to
library preparation for scRNAseq, knockout populations were combined in equal proportions for each PDX
line. Library preparation was conducted on a total of 10,000 cells per PDX line, resulting in an
approximate coverage of 2500 cells per condition.
sgRNA Phenotype Confirmation and Confirmation of Site-Specific Cutting
Using the same feature-barcoded populations prepared for scRNAseq, 1500 cells/well were seeded in
triplicate in 12-well tissue culture plates (Corning) immediately following 72 hours of 2 µg/mL puromycin
selection. Cells were then cultured for a minimum of 10 doublings. Individual plates were then stained
with 0.5% crystal violet (in 25% methanol) for 2 hours. Plates were washed in water, dried overnight and
then digitally scanned. After scanning, the crystal violet was dissolved in equal volumes of
1% SDS, and 200 µL of each sample was transferred to 96-well plates to measure absorbance at 570 nm.
Growth was quantified relative to the internal sgABCG8 negative control. All data were graphed
using GraphPad Prism v 8.0.
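The normalization to the negative control can be written out as a short calculation. The absorbance values below are invented for illustration, and averaging the triplicate wells before dividing by the control mean is an assumption about how the ratio was formed.

```python
from statistics import mean


def relative_growth(absorbance, control="sgABCG8"):
    """Mean A570 of each sgRNA's replicate wells divided by the control's mean A570."""
    control_mean = mean(absorbance[control])
    return {sg: mean(wells) / control_mean for sg, wells in absorbance.items()}


# Hypothetical triplicate A570 readings per sgRNA population.
a570 = {"sgABCG8": [1.0, 1.5, 0.5],
        "sgILK": [0.5, 0.75, 0.25]}
growth = relative_growth(a570)
print(growth)
```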
Puromycin-selected cells were collected for Sanger sequencing to confirm sgRNA-induced indel formation
relative to non-infected populations. Cell pellets for each sgRNA across both PATC69 and PATC53, 1 × 10^6
cells each, were isolated at Day 12 and Day 40 for each PDX line (sgRPS27A indels representative at
Day 12, all other sgRNAs at Day 40). Cell pellets were centrifuged, washed once with PBS, and frozen at
-80 °C. All Sanger-sequenced regions were normalized against the respective non-transduced PDX line
populations, 1 × 10^6 cells/pellet. Primers for each cut site were designed to yield 400-800 bp
products, which were run on 2% agarose gels and extracted following amplification. Site-specific
sequencing primers were used for Sanger sequencing, and indel distributions were calculated
using the Synthego ICE Analysis Tool [74].
Feature Barcoding Analysis
Feature-barcoded scRNAseq data were analyzed using Seurat 3.1 [68]. Each cell line was evaluated
individually. Cells expressing more than 350 genes and with less than 25% mitochondrial reads were retained
and subsequently log normalized and variance stabilized, and total RNA count, mitochondrial reads and cell
cycle were regressed out. All guide-level samples for each PDX line were merged using the Seurat
Anchor Cell feature to provide a direct point of comparison for PATC69 and PATC53 perturbations. A
total of 10,113 PATC69 cells (3449 ABCG8, 1962 ILK, 2662 SMAD4, and 2040 ZEB1 knockout cells) and
11,439 PATC53 cells (3414 ABCG8, 2952 ILK, 2819 SMAD4, and 2254 ZEB1 knockout cells) were
retained for analysis. Principal-component analysis and UMAP with the first 20
dimensions were performed, with clustering performed at a resolution of 0.15. Cluster centroids were
calculated using the method described above for the patient CNB samples.
ACKNOWLEDGMENTS
We thank the Advanced Technology Genomics Core (ATGS), the UTMDACC Flow Cytometry and
Cellular Imaging Core Facility, and the UTMDACC Department of Veterinary Medicine, all funded by a Cancer
Center Support Grant (P30 CA016672). We thank David Aten and Jordan Pietz at the UTMDACC
Medical Graphics and Photography Department funded by Cancer Center Support Grant (P30
CA016672). We thank the UTMDACC Science Park NGS Facility, funded by CPRIT Core Facility
Support Grants (RP120348 and RP170002). We thank Sisi Gao for her contributions editing the
manuscript. We thank our colleagues at the Institute of Applied Cancer Science (IACS), Justin Huang,
Stephanie Schmidt, and Vandhana Ramamoorthy, for technical assistance and suggestions. We thank
David Pollock at ATGS for technical assistance and advice. G.F.D. was supported by the Sewell Family
Endowed Chairmanship in Genomic Medicine, NIH/NCI P01 CA117969 12, and the UT MD Anderson
Cancer Moon Shots Program. S.S. was supported by the CPRIT Research Training Grant (RP170067).
J.R. and I.H. received support from the Paula-Altman Goldstein Discovery Fellowship. J.R. also received
support from P01 CA117969 12. W.Y. is supported by the PanCAN-AACR Pathway to Leadership Grant
(16-70-25-YAO) and the Pancreatic Cancer Action Network Translational Research Grant (19-65-YAO).
T.H. is a CPRIT Scholar in Cancer Research and is supported by NIGMS grant R35GM130119 and MD
Anderson Cancer Center Support Grant P30 CA016672. T.H. is a consultant for Repare Therapeutics.
AUTHOR CONTRIBUTIONS
J.L.R., S. Srinivasan, A.C. and G.F.D. conceived and designed the study. J.L.R., W.Y., M.P. and A.
Machado performed the experiments. S. Srinivasan, S. Seth, E.K. and C.A.B. analyzed the data. PDX
generation was guided by the lab of M.K. Single-cell RNAseq data of PDXs were provided by C.Y.L., I.H.
and A.V. Single-cell RNAseq data from patients were provided by J.J.L., P.A.G., and A. Maitra. Clinical
annotations for PDXs were provided by M.S. and E.J.K. Guidance on experimental and computational
aspects of the study was provided by W.Y., J.R.D., C.A.B., G.G., A.V., T.P.H., A. Maitra, and T.H. The
project was led and supervised by A.C. and G.F.D. The manuscript was written by J.L.R., S. Srinivasan,
A.C. and G.F.D. and edited by A.K.D., A. Maitra and T.H. All authors approved the final manuscript.
COMPETING INTERESTS
The authors declare no competing interests.
REFERENCES
1. Siegel, R.L., Miller, K.D. & Jemal, A. Cancer statistics, 2019. CA Cancer J Clin 69, 7-34 (2019).
2. American Cancer Society. Cancer Facts & Figures 2020. Atlanta: American Cancer Society (2020).
3. Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801-6 (2008).
4. Cancer Genome Atlas Research Network. Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 32, 185-203.e13 (2017).
5. Hilmi, M., Bartholin, L. & Neuzillet, C. Immune therapies in pancreatic ductal adenocarcinoma: Where are we now? World J Gastroenterol 24, 2137-2151 (2018).
6. Collisson, E.A. et al. Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nat Med 17, 500-3 (2011).
7. Moffitt, R.A. et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nat Genet 47, 1168-78 (2015).
8. Bailey, P. et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature 531, 47-52 (2016).
9. Puleo, F. et al. Stratification of Pancreatic Ductal Adenocarcinomas Based on Tumor and Microenvironment Features. Gastroenterology 155, 1999-2013.e3 (2018).
10. Chan-Seng-Yue, M. et al. Transcription phenotypes of pancreatic cancer are driven by genomic events during tumor evolution. Nat Genet 52, 231-240 (2020).
11. Simeonov, K.P. et al. Single-cell lineage and transcriptome reconstruction of metastatic cancer reveals selection of aggressive hybrid EMT states. bioRxiv, 2020.08.11.245787 (2020).
12. Carugo, A. et al. In Vivo Functional Platform Targeting Patient-Derived Xenografts Identifies WDR5-Myc Association as a Critical Determinant of Pancreatic Cancer. Cell Rep 16, 133-147 (2016).
13. Manguso, R.T. et al. In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target. Nature 547, 413-418 (2017).
14. Jung, J. et al. Generation and molecular characterization of pancreatic cancer patient-derived xenografts reveals their heterologous nature. Oncotarget 7, 62533-62546 (2016).
15. Yang, S. et al. COEXPEDIA: exploring biomedical hypotheses via co-expressions associated with medical subject headings (MeSH). Nucleic Acids Res 45, D389-D396 (2017).
16. Edler, D., Eriksson, A. & Rosvall, M. The MapEquation software package, available online at http://www.mapequation.org.
17. Sarto Basso, R., Hochbaum, D.S. & Vandin, F. Efficient algorithms to discover alterations with complementary functional association in cancer. PLoS Comput Biol 15, e1006802 (2019).
18. Franklin, R.B., Zou, J. & Costello, L.C. The cytotoxic role of RREB1, ZIP3 zinc transporter, and zinc in human pancreatic adenocarcinoma. Cancer Biol Ther 15, 1431-7 (2014).
19. Su, J. et al. TGF-β orchestrates fibrogenic and developmental EMTs via the RAS effector RREB1. Nature 577, 566-571 (2020).
20. Deacu, E. et al. Activin type II receptor restoration in ACVR2-deficient colon cancer cells induces transforming growth factor-beta response pathway genes. Cancer Res 64, 7690-6 (2004).
21. Biernat, J. et al. Protein kinase MARK/PAR-1 is required for neurite outgrowth and establishment of neuronal polarity. Mol Biol Cell 13, 4013-28 (2002).
22. Cohen, D., Rodriguez-Boulan, E. & Müsch, A. Par-1 promotes a hepatic mode of apical protein trafficking in MDCK cells. Proceedings of the National Academy of Sciences of the United States of America 101, 13792-13797 (2004).
23. Grützmann, R. et al. Gene expression profiling of microdissected pancreatic ductal carcinomas using high-density DNA microarrays. Neoplasia 6, 611-22 (2004).
24. Zhang, G. et al. DPEP1 inhibits tumor cell invasiveness, enhances chemosensitivity and predicts clinical outcome in pancreatic ductal adenocarcinoma. PLoS One 7, e31507 (2012).
25. Badea, L., Herlea, V., Dima, S.O., Dumitrascu, T. & Popescu, I. Combined gene expression analysis of whole-tissue and microdissected pancreatic ductal adenocarcinoma identifies genes specifically overexpressed in tumor epithelia. Hepato-gastroenterology 55, 2016-2027 (2008).
26. Yao, W. et al. Syndecan 1 is a critical mediator of macropinocytosis in pancreatic cancer. Nature 568, 410-414 (2019).
27. Commisso, C. et al. Macropinocytosis of protein is an amino acid supply route in Ras-transformed cells. Nature 497, 633-637 (2013).
28. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res 45, D362-D368 (2017).
29. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15, 554 (2014).
30. Allen, F. et al. JACKS: joint analysis of CRISPR/Cas9 knockout screens. Genome Res 29, 464-471 (2019).
31. Meyers, R.M. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet 49, 1779-1784 (2017).
32. Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).
33. Kim, E. & Hart, T. Improved analysis of CRISPR fitness screens and reduced off-target effects with the BAGEL2 gene essentiality classifier. bioRxiv, 2020.05.30.125526 (2020).
34. Aiello, N.M. et al. EMT Subtype Influences Epithelial Plasticity and Mode of Cell Migration. Dev Cell 45, 681-695.e4 (2018).
35. Bardeesy, N. et al. Smad4 is dispensable for normal pancreas development yet critical in progression and tumor biology of pancreas cancer. Genes Dev 20, 3130-46 (2006).
36. Lee, J.J. et al. Elucidation of tumor-stromal heterogeneity and the ligand-receptor interactome by single cell transcriptomics in real-world pancreatic cancer biopsies. bioRxiv, 2020.07.28.225813 (2020).
37. Li, Y. et al. Inhibition of integrin-linked kinase attenuates renal interstitial fibrosis. J Am Soc Nephrol 20, 1907-18 (2009).
38. Sawai, H. et al. Integrin-linked kinase activity is associated with interleukin-1 alpha-induced progressive behavior of pancreatic cancer and poor patient survival. Oncogene 25, 3237-46 (2006).
39. Serrano, I., McDonald, P.C., Lock, F.E. & Dedhar, S. Role of the integrin-linked kinase (ILK)/Rictor complex in TGFβ-1-induced epithelial-mesenchymal transition (EMT). Oncogene 32, 50-60 (2013).
40. Sunami, Y., Rebelo, A. & Kleeff, J. Lipid Metabolism and Lipid Droplets in Pancreatic Cancer and Stellate Cells. Cancers (Basel) 10 (2017).
41. Hayashi, A. et al. A unifying paradigm for transcriptional heterogeneity and squamous features in pancreatic ductal adenocarcinoma. Nature Cancer 1, 59-74 (2020).
42. Malkoski, S.P. & Wang, X.J. Two sides of the story? Smad4 loss in pancreatic cancer versus head-and-neck cancer. FEBS Lett 586, 1984-92 (2012).
43. Kim, M.P. et al. Generation of orthotopic and heterotopic human pancreatic cancer xenografts in immunodeficient mice. Nat Protoc 4, 1670-80 (2009).
44. Ivanics, T. et al. Patient-derived xenograft cryopreservation and reanimation outcomes are dependent on cryoprotectant type. Lab Invest 98, 947-956 (2018).
45. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009).
46. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491-8 (2011).
47. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
48. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol 11, R106 (2010).
49. Wilkerson, M.D. & Hayes, D.N. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26, 1572-3 (2010).
50. Young, M.D., Wakefield, M.J., Smyth, G.K. & Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol 11, R14 (2010).
51. Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6, e21800 (2011).
52. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847-9 (2016).
53. Karczewski, K.J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443 (2020).
54. Szklarczyk, D. et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43, D447-52 (2015).
55. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504 (2003).
56. Rückert, F. et al. Co-expression of KLK6 and KLK10 as prognostic factors for survival in pancreatic ductal adenocarcinoma. British journal of cancer 99, 1484-1492 (2008).
57. Pilarsky, C. et al. Activation of Wnt signalling in stroma from pancreatic cancer identified by gene expression profiling. Journal of cellular and molecular medicine 12, 2823-2835 (2008).
58. Broad Institute of MIT and Harvard. Broad Institute TCGA Genome Data Analysis Center (2014): Firehose VERSION run.
59. Storey, J.D., Bass, A.J., Dabney, A. & Robinson, D. qvalue: Q-value estimation for false discovery rate control. R package version 2.20.0, http://github.com/jdstorey/qvalue. (2020).
60. Ying, H. et al. Oncogenic Kras maintains pancreatic tumors through regulation of anabolic glucose metabolism. Cell 149, 656-70 (2012).
61. König, R. et al. A probability-based approach for the analysis of large-scale RNAi screens. Nat Methods 4, 847-9 (2007).
62. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940-3941 (2005).
63. Scrucca, L., Fop, M., Murphy, T.B. & Raftery, A.E. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal 8, 289-317 (2016).
64. Charrad, M., Ghazzali, N., Boiteau, V. & Niknafs, A. NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. Journal of Statistical Software 61, 1-36 (2014).
65. Kassambara, A. & Mundt, F. Package “factoextra.” R Top. Doc. (2017).
66. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-50 (2005).
67. Koay, E.J. et al. A Visually Apparent and Quantifiable CT Imaging Feature Identifies Biophysical Subtypes of Pancreatic Ductal Adenocarcinoma. Clinical Cancer Research 24, 5883-5894 (2018).
68. Satija, R., Farrell, J.A., Gennert, D., Schier, A.F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33, 495-502 (2015).
69. Muraro, M.J. et al. A Single-Cell Transcriptome Atlas of the Human Pancreas. Cell Systems 3, 385-394.e3 (2016).
70. Peng, J. et al. Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma. Cell Res 29, 725-738 (2019).
71. MacParland, S.A. et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nature Communications 9, 4383 (2018).
72. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, New York, 2016).
73. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency. Genome Biology 16, 280 (2015).
74. Hsiau, T. et al. Inference of CRISPR Edits from Sanger Trace Data. bioRxiv, 251082 (2018).
FIGURE LEGENDS
Figure 1: Defining signaling diversity in PDAC through the construction and annotation of a PDX-
based co-expression network. (a) Overview outlining the development of the PDAC co-expression
network from a diverse and representative PDAC patient-derived xenograft (PDX) cohort of 48 models,
and subsequent integration of mutation data for cluster enrichment analysis. (b) Oncoplot of common
PDAC-associated mutations in the 48-model PDX cohort displays mutation frequencies similar to
previous publications. (c) Force-directed layout of the PCEN. Gene clusters, defined by Gene Ontology
(GO), visually highlight nodes with a minimum of 12 edges. (d) Heat map depicting Pearson correlations
between cluster centroid scores across 48 PDX models. Starred correlations represent an adjusted P
value < 0.05. (e) Force-directed layout of the PCEN (top) and matching GO hierarchical treemaps
(bottom) of prominent anti-correlative cluster centroid trends across the PDAC co-expression network
highlighting Cluster 1 (Lipid metabolism) vs. Cluster 23 (Cell development). (f) Force-directed layout of
the PCEN (top) and matching GO hierarchical treemaps (bottom) of prominent anti-correlative cluster
centroid trends across PDAC co-expression network highlighting Cluster 2 (Golgi vesicle transport) vs.
Cluster 13 (mRNA catabolism).
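The cluster-to-cluster correlations in panel (d) can be illustrated with a small sketch. It assumes, purely for illustration, that a cluster centroid score is the mean of the z-scored expression of the cluster's genes in each model; the legend does not define the score, and the gene names and values below are hypothetical. Multiple-testing adjustment of the P values is not shown.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression across models."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def centroid_scores(expr, genes):
    """Per-model mean of the z-scored expression of a cluster's genes.

    This definition of a centroid score is an assumption for illustration.
    """
    z = [zscore(expr[g]) for g in genes]
    return [mean(col) for col in zip(*z)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values: gene -> expression across four models.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],
}
cluster_x = centroid_scores(expr, ["geneA", "geneB"])
cluster_y = centroid_scores(expr, ["geneC", "geneD"])
r = pearson(cluster_x, cluster_y)  # perfectly anti-correlated toy clusters
```

In this toy input the two clusters move in opposite directions across models, so their centroid scores are anti-correlated, which is the kind of relationship highlighted in panels (e) and (f).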
Figure 2. Informed CRISPR characterization of PDX models identifies co-expression network
cluster-associated functional diversity within the PDAC cohort. (a) Overview of our prioritized in vivo
functional genomics platform, whereby a custom PDAC-prioritized RNAi library was used to screen a
functional set of proteins largely localized at the cell surface. Functionalized surface proteins were then
expanded upon using a protein-protein interaction (PPI) network, which was further characterized using a
custom CRISPR library and integrated into the PCEN. (b) Venn diagram of functional targets derived from
in vivo orthotopic CRISPR screening of PDX lines using a quantile-normalized Bayes factor (BF) > 1. (c-e)
CRISPR screening results from PATC69 (c), PATC124 (d), and PATC53 (e) PDX lines overlaid onto a
merged PPI force-directed diagram. The BF of each gene indicates the degree of vulnerability of the
gene, with BF > 1 indicating an essential gene. (f - h) PCEN anchoring of in vivo functional
dependencies, represented as quantile-normalized Bayes Factors, to inform on cluster-associated
vulnerability context in PATC69 (f), PATC124 (g), and PATC53 (h) PDX lines.
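The hit-calling step behind panel (b) can be sketched as follows. This is a minimal illustration that assumes a standard rank-based quantile normalization across models followed by the BF > 1 cutoff named in the legend; the exact normalization procedure is not stated here, and the model names, gene names, and Bayes factors are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (dict: name -> list of values).

    Each value is replaced by the mean, across columns, of the values that
    share its rank. Ties are broken by input order, which is a simplification.
    """
    names = list(columns)
    n = len(columns[names[0]])
    sorted_cols = [sorted(columns[name]) for name in names]
    rank_means = [sum(col[i] for col in sorted_cols) / len(names) for i in range(n)]
    normalized = {}
    for name in names:
        order = sorted(range(n), key=lambda i: columns[name][i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized[name] = out
    return normalized


def hits(genes, scores, threshold=1.0):
    """Genes whose normalized score exceeds the threshold."""
    return {g for g, s in zip(genes, scores) if s > threshold}


# Hypothetical raw Bayes factors for four genes in two models.
genes = ["g1", "g2", "g3", "g4"]
raw_bf = {
    "model_A": [12.0, -3.0, 4.0, 0.0],
    "model_B": [6.0, 2.0, -5.0, -1.0],
}
norm = quantile_normalize(raw_bf)
hits_a = hits(genes, norm["model_A"])
hits_b = hits(genes, norm["model_B"])
shared = hits_a & hits_b       # overlap region of a two-set Venn diagram
only_a = hits_a - hits_b       # model-specific dependencies
```

After normalization both models share the same score distribution, so a single cutoff means the same thing in each model; the set operations then give the shared and model-specific regions of the Venn diagram.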
Figure 3: Prominent anti-correlated CoDEX clusters recapitulate a granular spectrum of
“classical”, “quasi-basal”, and “basal-like” cells inter-tumorally. (a) [left panel] Overlay of the
normalized expression score of each gene over the PDAC co-expression network of CoDEX-defined
classical, quasi-basal, and basal-like models. [middle panel] Pie chart comparison of clinical histology and
recurrence of patients associated with PDX models across subtypes. P values are derived from the
chi-square test. [right panel] K-means clustering reveals an optimal k=3 using the Cluster 1 and Cluster 23
centroid scores across the PDX models, paired with Consensus Clustering (k=2) into Moffitt’s classical
and basal-like subtypes. (b) Box plot of average Cluster 1 centroid score across Moffitt-defined classical and
basal-like models. Box-whisker plots show median ± first and third quartiles. P values are derived from a
t-test (n = 48 PDX tumors). (c) PCA demonstrating the subtypes derived from the PDAC co-
expression network with the quasi-basal group driving the PCs compared to the binary Moffitt
Classification. (d) GSEA conducted on PDAC co-expression network-defined classical and basal-like
models identifies EMT as a top pathway enriched in basal-like models (FDR = 0.000). (e) [left panel] Pie
chart comparison of tumor differentiation status in high-epithelial-content TCGA models comparing PDAC
co-expression network-derived classifications and Moffitt classifications (n=76 tumors). [right panel] Heat
map of C1vC23 anti-correlated cluster signature differential, matching Moffitt classification and tumor
grade on TCGA tumor samples (samples with ≥ 30% epithelial content).
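The K-means step in panel (a) can be illustrated with a small, dependency-free sketch of Lloyd's algorithm on two-dimensional points (Cluster 1 score, Cluster 23 score). The data are hypothetical, the deterministic initialization from the first k points is a simplification, and how the optimal k was chosen is not specified in the legend, so that step is not shown.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm; centers start at the first k points (a simplification)."""
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Hypothetical (Cluster 1, Cluster 23) centroid scores for six models.
points = [
    (2.0, -2.0), (0.1, 0.0), (-2.0, 2.0),
    (1.8, -1.9), (-0.1, 0.2), (-1.9, 2.1),
]
labels, centers = kmeans(points, k=3)
```

With these toy scores the three groups correspond to high Cluster 1, intermediate, and high Cluster 23 models. In practice a library implementation with multiple random restarts would be the more usual choice.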
Figure 4: Intra-tumoral characterization of anti-correlated CoDEX clusters recapitulates a granular
and tumor-localized “classical”, “quasi-basal”, and “basal-like” spectrum. (a - c) UMAP of single-
cell sequenced low-passage (a) PATC69 (7,857 cells), (b) PATC124 (9,482 cells) and (c) PATC53
(14,791 cells) cell lines with an overlay of a PDAC co-expression network-normalized C1vC23 signature
differential, with respective percentages of cells corresponding to each subtype. [right panels] Pie charts
with the distribution of the three classifications (percentage). (d) Density histogram of the C1vC23 signature
differential distributions of PATC124, PATC53, and PATC69 PDX lines, with more positive cluster
differential indicating enrichment in Cluster 1 and more negative cluster differential indicating Cluster 23
enrichment. (e) Diagram of the locations and numbers of isolated patient core needle biopsy (CNB)
samples used for single-cell RNA sequencing analysis. (f) Combined UMAP outlining the tumor
microenvironment components of multiple single-cell sequenced primary (n = 4) and metastatic (n = 3)
CNB samples from PDAC patients. (g) The mean percentage of cluster representation for each cluster in
the PDAC co-expression network from the cellular constituents of the tumor microenvironment. Each dot
per cell type represents individual clusters. Box plot center: mean; box: quartiles 1–3; whiskers: quartiles
1–3 ± 1.5 × IQR. (h) Combined CNB UMAP with an overlay of a PDAC co-expression network-normalized
C1vC23 signature represented as the difference between Cluster 1 and Cluster 23 expressions.
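The C1vC23 signature, described above as the difference between Cluster 1 and Cluster 23 expression, can be sketched per cell as follows. The cutoffs used to bin cells into three groups are hypothetical, and labeling positive differentials "classical" and negative ones "basal-like" is an assumption made for illustration; the legend only states that positive values indicate Cluster 1 enrichment and negative values Cluster 23 enrichment.

```python
def c1vc23_differential(c1_scores, c23_scores):
    """Per-cell difference between Cluster 1 and Cluster 23 scores."""
    return [a - b for a, b in zip(c1_scores, c23_scores)]


def classify(diff, low=-0.5, high=0.5):
    """Label a cell from its differential; the cutoffs are hypothetical."""
    if diff > high:
        return "classical"
    if diff < low:
        return "basal-like"
    return "quasi-basal"


def subtype_percentages(diffs):
    """Percentage of cells in each of the three groups."""
    labels = [classify(d) for d in diffs]
    return {
        name: 100.0 * labels.count(name) / len(labels)
        for name in ("classical", "quasi-basal", "basal-like")
    }


# Hypothetical per-cell cluster scores for four cells.
c1 = [1.2, 0.3, -0.8, 0.9]
c23 = [-0.4, 0.1, 0.7, 0.2]
diffs = c1vc23_differential(c1, c23)
pct = subtype_percentages(diffs)
```

The resulting percentages are the kind of per-model summary shown in the pie charts of panels (a) to (c).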
Figure 5: CoDEX platform prioritizes potential biomarkers by integrating first-degree nodes of
anchored dependencies and cluster centrality. (a) Force-directed layout of prominent anti-correlative
cluster centroid trends across the PDAC co-expression network highlighting Cluster 1 (Lipid metabolism)
vs. Cluster 23 (Cell development) along with PATC69 in vivo specific vulnerability, DNM2, and PATC53
specific in vivo vulnerability, MAPK11, from our CRISPR screen analysis. Force-directed network outlining
first-degree nodes of DNM2. (b) Dot plot of closeness centrality of each gene within Cluster 1 and Cluster
23 calculated using Cytoscape compared with gene-to-cluster correlation calculated as the Pearson
correlation between single-gene expression and Cluster 1 and 23 centroids using the PDX cohort (n=48).
(c) Force-directed layout of first-degree nodes of Cluster 23-localized MAPK11 generated from
Cytoscape, or direct gene-wise correlations within the co-expression network with color of each gene
representing its respective cluster assignment. (d) Force-directed layout of first-degree nodes of Cluster 1-
localized DNM2 generated from Cytoscape. (e) Heat map of the normalized expression of the first-degree
nodes of DNM2 and MAPK11 in the PDX models (n=48), displayed along the C1vC23 signature-
determined classification on the left and gene-wise closeness centrality on the top annotation. (f) IHC-IF
for VSIG1 vs. VIM, in the context of HLA, DAPI and Ki67, in representative slices of classical (PATX90),
quasi-basal (PATX124) and basal-like (PATX53) PDXs. (g) Quantification of VIM-positive, VSIG1-positive,
and double-negative signatures in HLA-positive tumor cells.
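Two network quantities used in this figure, first-degree nodes and closeness centrality, can be sketched on a toy graph. The sketch uses one common definition of closeness (number of reachable nodes divided by the sum of shortest-path distances to them); whether this matches the Cytoscape calculation exactly is an assumption, and the node names are hypothetical.

```python
from collections import deque


def first_degree_nodes(graph, gene):
    """Direct neighbors of a gene in an undirected co-expression graph."""
    return sorted(graph[gene])


def closeness_centrality(graph, source):
    """Reachable nodes divided by the summed shortest-path distances (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical undirected graph given as an adjacency list.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
neighbors = first_degree_nodes(graph, "hub")
hub_closeness = closeness_centrality(graph, "hub")
leaf_closeness = closeness_centrality(graph, "d")
```

A centrally placed node reaches the rest of the graph in fewer steps and therefore scores higher than a peripheral one, which is why closeness is a natural ranking for candidate markers within a cluster.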
Figure 6. Functional validation of CoDEX-informed intra-tumoral context dependency for C1vC23-
associated genetic targets. (a) Force-directed layout of prominent anti-correlative cluster centroid trends
across the PDAC co-expression network highlighting Cluster 1 (Lipid metabolism) vs. Cluster 23 (Cell
development) along with PATC53 in vivo specific vulnerabilities from our CRISPR screen analysis with
size of the gene representing the quantile-normalized BF. (b) Overview of the feature barcoding strategy
for intratumoral tracking of PDAC co-expression network signatures following sgRNA knockouts.
Targeting sgRNAs for ABCG8 (as a negative control), ILK, SMAD4, and ZEB1 were selected based
on CRISPR screening results. PATC69 and PATC53 cell populations were individually transduced in
vitro and selected with puromycin; parallel assays were also prepared to confirm knockout phenotypes
through colony growth and sgRNA cutting through Sanger sequencing. Each knockout cell population
was cultured in vitro for 22 days (CRISPR screening time-point day 10) and then combined for scRNAseq
library preparation. A total of 10,000 cells per PDX line were sequenced, 2,500 per condition. (c) UMAP
of the PATC69 PDX line with defined ABCG8, ILK, SMAD4, and ZEB1 knockout populations (10,113 total
cells). (d) Normalized viability of PATC69 cells following knockout of ABCG8, SMAD4, ZEB1, ILK, or
RPS27A with sgRNA. **p < 0.05. (e) UMAP of the PATC53 PDX line with defined ABCG8, ILK, SMAD4,
and ZEB1 knockout populations (11,439 total cells). (f) Normalized viability of PATC53 cells following
knockout of ABCG8, SMAD4, ZEB1, ILK, or RPS27A with sgRNA. *p < 0.05; **p < 0.01. (g) Comparative
PATC69 density plots of C1vC23 signature following knockout with sgILK, sgSMAD4, sgZEB1 and
sgABCG8 negative control calculated as the differential between the Cluster 1 and 23 centroid score, with
more positive differential indicating Cluster 1 enrichment and vice versa. P values and D statistics are derived
from the Kolmogorov-Smirnov test. (h) Comparative density plots of the C1vC23 signature in PATC53
cells following knockout of ILK, SMAD4, ZEB1, or ABCG8 (negative control).
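The D statistic reported for panels (g) and (h) is the largest vertical gap between two empirical cumulative distribution functions. A minimal sketch of the two-sample statistic follows; the signature values are hypothetical, and the P value calculation is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a at or below x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b at or below x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical C1vC23 differentials for control and knockout cells.
control = [0.1, 0.2, 0.3, 0.4]
knockout = [0.3, 0.4, 0.5, 0.6]
d_stat = ks_statistic(control, knockout)
d_same = ks_statistic(control, control)
```

D ranges from 0 (identical distributions) to 1 (no overlap), so a larger D for a knockout relative to the negative control indicates a larger shift of the C1vC23 signature distribution.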
[Figure 1 image: Defining Signaling Diversity in PDAC through the Construction and Annotation of a PDX-based Co-Expression Network. Panels A-F. Includes four REVIGO Gene Ontology treemaps, with terms including lipid metabolic process, digestion, and xenobiotic metabolism; cell development, cell motility, cell adhesion, and extracellular matrix organization; Golgi vesicle transport; and nuclear-transcribed mRNA catabolism (nonsense-mediated decay), ribosome biogenesis, and protein localization to endoplasmic reticulum. Color scales: Pearson correlation, -1 to 1.]
[Figure 2 image: Informed CRISPR characterization of PDX models identifies co-expression network cluster-associated functional diversity within the PDAC cohort. Panels A-H. Panel titles: "PPI-based functional PDAC CRISPR Library"; "In vivo orthotopic CRISPR screens in cluster-informed PDXs"; "Cluster-defined PDAC subtype associated dependencies" (Clusters #1, #2, #13, #23). Models: PATC69, PATC124, PATC53. Labeled genes: RACK1, RPS27A, MYC, RPL30, GRB2, MTOR, YKT6, CDC27, KRAS, MAD2L2, CCND1, DNM2, SKP2, ATP1A1, PRKDC, SNAP23, CDC42, CTNNB1, EZR, DOK1, MAPK11, DVL1, PRKCZ, ATP1A3, TLE2, PXN, TLN1, CRKL, ZEB1, ITGB1, NKAIN4, NPNT, CCL5, FAM171A2, SHH, LRP1, CYBA, SDC3, SMAD4, SNAI1, NCK1, CA2, ITGB2, MFNG. Node-size legend: quantile-normalized Bayes factors binned as 1-4.9, 5-9.9, 10-14.9, and 15-25. Color scale: quantile-normalized Bayes factors, -24 to 32.]
[Figure 3 image: Prominent anti-correlating clusters recapitulate a granular spectrum of “classical”, “quasi-basal”, and “basal-like” cells intertumorally. Panel labels as extracted: A-F. Subtype legend: Classical, Quasi-basal, Basal-like. Heat maps with columns labeled by TCGA sample ID and by PDX model ID (PATX), annotated with CEN_Subtype, Moffitt classification, Cluster 1 and Cluster 23 cluster enrichment, and, for TCGA samples, purity (0 to 0.8). Pie charts: Tumor Differentiation Status (Poor, Moderate, Well) by Moffitt Classification and by Anticorrelating Cluster Signature; Clinical Histology (Moderate, Poor); Recurrence Type (Regional & Locoregional, Distant); p values shown: 0.00217 and 0.05896. PDX Expression Profile: PATX090, PATX124, PATX148; color scale: Normalized Gene Expression, -6.34 to 6.34. Box plot: "Cluster 1 - Module Centroid Comparison, Classical vs. Basal-like PDXs"; x-axis: Moffitt Subtypes (PDX Cohort); y-axis: Module Centroid; p = 3.67e-06.]
[Figure 4 image: Intratumoral characterization of anti-correlating clusters recapitulate a granular and tumor-localized “classical”, “quasi-basal”, and “basal-like” spectrum. Panels A-H. Models: PATC69, PATC124, PATC53. Subtype legend: Classical, Quasi-basal, Basal-like. Pie-chart percentages as extracted (assignment to model and subtype is not recoverable from the text): 42.82 %, 49.39 %, 7.79 %; 9.45 %, 54.61 %, 35.95 %; 10.46 %, 64.89 %, 24.65 %. Axis label: CEN Cluster % Expressed. Biopsy diagram legend: Male, Female.]
[Figure 5 image: CoDEX methodology prioritizes potential biomarkers by integrating first-degree nodes of anchored dependencies and cluster centrality. Panels A-G. Highlighted genes: DNM2 (PATC69), MAPK11 (PATC53). Node-size legend: quantile-normalized Bayes factors binned as 1-4.9, 5-9.9, 10-14.9, and 15-25. Color scale: Pearson correlation, -1 to 1. IHC-IF images: Whole Slide and 20X views of PATX90, PATX124, and PATX148.]
[Figure 6 image: Functional validation of CoDEX-informed intratumoral context dependency for C1vC23 associated genetic targets. Panels A-H, for PATC69 and PATC53. Viability bar charts: x-axis sgRNA (ABCG8, SMAD4, ZEB1, ILK, RPS27A); y-axis Viability (Normalized), 0.0 to 1.6; significance marks as extracted: ns, ns, ns, ** (PATC69) and **, **, *** (PATC53). Node-size legend: quantile-normalized Bayes factors binned as 1-4.9, 5-9.9, 10-14.9, and 15-25. Color scale: Pearson correlation, -1 to 1. Kolmogorov-Smirnov results as extracted (assignment to panel and sgRNA is not recoverable from the text): D = 0.328, p = 2.2 x 10^-16; D = 0.101, p = 6.0 x 10^-4; D = 0.252, p = 2.2 x 10^-16; D = 0.281, p = 2.2 x 10^-16; D = 0.265, p = 2.2 x 10^-16; D = 0.397, p = 2.2 x 10^-16.]