mirna and mrna cancer signatures determined by analysis of ... · toward identifying a...

6
miRNA and mRNA cancer signatures determined by analysis of expression levels in large cohorts of patients Sohila Zadran a , F. Remacle b,c , and R. D. Levine c,d,1 a Institute for Molecular Medicine and d Crump Institute for Molecular Imaging and Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095; b Department of Chemistry, University of Liege, B-4000 Liege, Belgium; and c The Fritz Haber Research Center for Molecular Dynamics, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 91904, Israel Contributed by R. D. Levine, September 12, 2013 (sent for review June 20, 2013) Toward identifying a cancer-specic gene signature we applied surprisal analysis to the RNAs expression behavior for a large cohort of breast, lung, ovarian, and prostate carcinoma patients. We characterize the cancer phenotypic state as a shared response of a set of mRNA or microRNAs (miRNAs) in cancer patients versus noncancer controls. The resulting signature is robust with respect to individual patient variability and distinguishes with high delity between cancer and noncancer patients. The mRNAs and miRNAs that are implicated in the signature are correlated and are known to contribute to the regulation of cancer-signaling pathways. The miRNA and mRNA networks are common to the noncancer and cancer patients, but the disease modulates the strength of the connectivities. Furthermore, we experimentally assessed the can- cer-specic signatures as possible therapeutic targets. Specically we restructured a single dominant connectivity in the cancer- specic gene network in vitro. We nd a deection from the cancer phenotype, signicantly reducing cancer cell proliferation and altering cancer cellular physiology. Our approach is grounded in thermodynamics augmented by information theory. The ther- modynamic reasoning is demonstrated to ensure that the derived signature is bias-free and shows that the most signicant re- distribution of free energy occurs in programming a system be- tween the noncancer and cancer states. This paper introduces a platform that can elucidate miRNA and mRNA behavior on a systems level and provides a comprehensive systematic view of both the energetics of the expression levels of RNAs and of their changes during tumorigenicity. microarray | deep sequencing | network connectivity | biomarker | maximal entropy T ranscriptomic technologies enable effective molecular phe- notyping of cancers, providing novel insights into the disease that are both experimentally and clinically relevant. One key outcome of such studies is biomarker discovery (14). An up to date discussion of the different approaches can be found in ref. 5. Despite the progress in analysis it is still the case that aggressive tumors do not respond well to currently available therapeutics (6), and the need to identify potential cancer targets remains (7). Analytical methods for processing transcriptomic datasets that illuminate novel biology are therefore critically important to the eld and may help develop better therapeutics and disease treatment options. Consequently, numerous tools exist and ad- ditional efforts are underway to develop increasingly informative methods. (Additional discussion of the different approaches can be found in SI Appendix, section I). Analytical methods often seek to learn new biology from transcriptomic data by focusing on differential transcript abun- dance (typically reported as fold change) between experimental (i.e., diseased) and reference (i.e., nondiseased) samples. An emphasis on fold change (usually above a minimal threshold) can identify cellular functions associated with a disease process. However, transcripts showing large fold change carry greater inuence in the analysis at the expense of those abundant species that may have shown signicant change in abundance but not a high degree of fold change. The issue is that a low-abundance transcript showing a large fold change in abundance between experimental and control samples is identied as more impor- tant for discriminating between cellular states than an equal or greater change in abundance of a strongly expressed transcript showing a far smaller fold change. However, absolute expression level and fold change may both contain information about the physiological phenomena, and, it is impossible to argue (without a complete understanding of cellular physiology) which of the two observations is more relevant. Unbiased, principled approaches are therefore desirable. Surprisal analysis is one such unbiased approach. Although the application of surprisal analysis to in vitro transcriptomic data has been previously described in detail (8), for the purpose of this manuscript we highlight the more relevant key elements of this approach as it relates to transcript expression proles of patients. First, it is worth noting that this method has roots in both ther- modynamics and information theory and has been used exten- sively starting with the analysis of specicity and selectivity in elementary chemical reactions (9). Second, as in its application to physical and chemical systems, the application of surprisal analysis to biological systems is built on the premise that these systems will tend toward a state of maximal entropy within the bounds imposed by one or more constraints. In biological systems, the constraints are not currently known but can be thought of as cellular design features, ultimately encoded within the genome that provide bounds on molecular phenotypes. The system is thus in a state of Signicance The study of mRNA and microRNA (miRNA) expression proles of cells and tissue has become a major tool for therapeutic development. The results of such experiments are expected to change the methods used in the diagnosis and prognosis of disease. We introduce surprisal analysis, an information-theo- retic approach grounded in thermodynamics, to compactly transform the information acquired from microarray studies into applicable knowledge about the cancer phenotypic state. The analysis of mRNA and miRNA expression data from ovarian serous carcinoma, prostate adenocarcinoma, breast invasive carcinoma, and lung adenocarcinoma cancer patients and organ- specic control patients identies cancer-specic signatures. We experimentally examine these signatures and their respective networks as possible therapeutic targets for cancer in single- cell experiments. Author contributions: F.R. and R.D.L. designed research; S.Z. designed and performed the experiments; S.Z., F.R., and R.D.L. performed research; S.Z., F.R., and R.D.L. analyzed data; and S.Z., F.R., and R.D.L. wrote the paper. The authors declare no conict of interest. 1 To whom correspondence should be addressed. E-mail: ra@fh.huji.ac.il. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1316991110/-/DCSupplemental. 1916019165 | PNAS | November 19, 2013 | vol. 110 | no. 47 www.pnas.org/cgi/doi/10.1073/pnas.1316991110 Downloaded by guest on December 11, 2020

Upload: others

Post on 24-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: miRNA and mRNA cancer signatures determined by analysis of ... · Toward identifying a cancer-specific gene signature we applied surprisal analysis to the RNAs expression behavior

miRNA and mRNA cancer signatures determined byanalysis of expression levels in large cohortsof patientsSohila Zadrana, F. Remacleb,c, and R. D. Levinec,d,1

aInstitute for Molecular Medicine and dCrump Institute for Molecular Imaging and Department of Molecular and Medical Pharmacology, David Geffen Schoolof Medicine, University of California, Los Angeles, CA 90095; bDepartment of Chemistry, University of Liege, B-4000 Liege, Belgium; and cThe Fritz HaberResearch Center for Molecular Dynamics, Institute of Chemistry, The Hebrew University of Jerusalem, Jerusalem 91904, Israel

Contributed by R. D. Levine, September 12, 2013 (sent for review June 20, 2013)

Toward identifying a cancer-specific gene signature we appliedsurprisal analysis to the RNAs expression behavior for a largecohort of breast, lung, ovarian, and prostate carcinoma patients.We characterize the cancer phenotypic state as a shared responseof a set of mRNA or microRNAs (miRNAs) in cancer patients versusnoncancer controls. The resulting signature is robust with respectto individual patient variability and distinguishes with high fidelitybetween cancer and noncancer patients. The mRNAs and miRNAsthat are implicated in the signature are correlated and are knownto contribute to the regulation of cancer-signaling pathways. ThemiRNA and mRNA networks are common to the noncancer andcancer patients, but the disease modulates the strength of theconnectivities. Furthermore, we experimentally assessed the can-cer-specific signatures as possible therapeutic targets. Specificallywe restructured a single dominant connectivity in the cancer-specific gene network in vitro. We find a deflection from thecancer phenotype, significantly reducing cancer cell proliferationand altering cancer cellular physiology. Our approach is groundedin thermodynamics augmented by information theory. The ther-modynamic reasoning is demonstrated to ensure that the derivedsignature is bias-free and shows that the most significant re-distribution of free energy occurs in programming a system be-tween the noncancer and cancer states. This paper introduces aplatform that can elucidatemiRNA andmRNA behavior on a systemslevel and provides a comprehensive systematic view of both theenergetics of the expression levels of RNAs and of their changesduring tumorigenicity.

microarray | deep sequencing | network connectivity | biomarker |maximal entropy

Transcriptomic technologies enable effective molecular phe-notyping of cancers, providing novel insights into the disease

that are both experimentally and clinically relevant. One keyoutcome of such studies is biomarker discovery (1–4). An up todate discussion of the different approaches can be found in ref. 5.Despite the progress in analysis it is still the case that aggressivetumors do not respond well to currently available therapeutics(6), and the need to identify potential cancer targets remains (7).Analytical methods for processing transcriptomic datasets thatilluminate novel biology are therefore critically important to thefield and may help develop better therapeutics and diseasetreatment options. Consequently, numerous tools exist and ad-ditional efforts are underway to develop increasingly informativemethods. (Additional discussion of the different approaches canbe found in SI Appendix, section I).Analytical methods often seek to learn new biology from

transcriptomic data by focusing on differential transcript abun-dance (typically reported as fold change) between experimental(i.e., diseased) and reference (i.e., nondiseased) samples. Anemphasis on fold change (usually above a minimal threshold) canidentify cellular functions associated with a disease process.However, transcripts showing large fold change carry greater

influence in the analysis at the expense of those abundant speciesthat may have shown significant change in abundance but nota high degree of fold change. The issue is that a low-abundancetranscript showing a large fold change in abundance betweenexperimental and control samples is identified as more impor-tant for discriminating between cellular states than an equal orgreater change in abundance of a strongly expressed transcriptshowing a far smaller fold change. However, absolute expressionlevel and fold change may both contain information about thephysiological phenomena, and, it is impossible to argue (withouta complete understanding of cellular physiology) which of thetwo observations is more relevant. Unbiased, principled approachesare therefore desirable.Surprisal analysis is one such unbiased approach. Although the

application of surprisal analysis to in vitro transcriptomic datahas been previously described in detail (8), for the purpose of thismanuscript we highlight the more relevant key elements of thisapproach as it relates to transcript expression profiles of patients.First, it is worth noting that this method has roots in both ther-modynamics and information theory and has been used exten-sively starting with the analysis of specificity and selectivity inelementary chemical reactions (9). Second, as in its application tophysical and chemical systems, the application of surprisal analysisto biological systems is built on the premise that these systems willtend toward a state of maximal entropy within the bounds imposedby one or more constraints. In biological systems, the constraintsare not currently known but can be thought of as cellular designfeatures, ultimately encoded within the genome that providebounds on molecular phenotypes. The system is thus in a state of

Significance

The study of mRNA and microRNA (miRNA) expression profilesof cells and tissue has become a major tool for therapeuticdevelopment. The results of such experiments are expected tochange the methods used in the diagnosis and prognosis ofdisease. We introduce surprisal analysis, an information-theo-retic approach grounded in thermodynamics, to compactlytransform the information acquired from microarray studiesinto applicable knowledge about the cancer phenotypic state.The analysis of mRNA and miRNA expression data from ovarianserous carcinoma, prostate adenocarcinoma, breast invasivecarcinoma, and lung adenocarcinoma cancer patients and organ-specific control patients identifies cancer-specific signatures. Weexperimentally examine these signatures and their respectivenetworks as possible therapeutic targets for cancer in single-cell experiments.

Author contributions: F.R. and R.D.L. designed research; S.Z. designed and performed theexperiments; S.Z., F.R., and R.D.L. performed research; S.Z., F.R., and R.D.L. analyzed data;and S.Z., F.R., and R.D.L. wrote the paper.

The authors declare no conflict of interest.1To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1316991110/-/DCSupplemental.

19160–19165 | PNAS | November 19, 2013 | vol. 110 | no. 47 www.pnas.org/cgi/doi/10.1073/pnas.1316991110

Dow

nloa

ded

by g

uest

on

Dec

embe

r 11

, 202

0

Page 2: miRNA and mRNA cancer signatures determined by analysis of ... · Toward identifying a cancer-specific gene signature we applied surprisal analysis to the RNAs expression behavior

constrained equilibrium. It is not a true equilibrium because ofthe presence of constraints. Just as in its application to physicaland chemical systems, even one constraint can enforce a state ofconstrained equilibrium that is qualitatively distinct from otherpossible states of the biological system. In the case of tran-scriptomic data, such molecular constraints are in essence phe-notypes that can each be represented by a certain profile oftranscript abundance. Constraints are not static and can changein response to genetic and environmental perturbations. By in-voking the premise of maximal entropy stated above, any changeto the constraints will be followed by a move of the system towarda new state of constrained equilibrium. At the molecular levelthese changes of state will be reflected in differential abundance oftranscripts following systems response to environmental stimuli.Further, surprisal analysis defines a special state termed “the

balance state.” Biologically, this corresponds to a state in whichtranscript levels are not changing, within the bounds of the ex-perimental time course. In surprisal analysis this state is directlyinferred from the experimental data. By analogy to thermody-namic equilibrium states of well-defined physical systems, we canassign a thermodynamic weight for each transcript from itsabundance in the balance state. Abundant transcripts contributemore weight than those expressed at low levels. So defined, thebalance state serves as an internal thermodynamic referencepoint against which measured transcript abundance can be com-pared. Differences between the measured transcript abundanceand the inferred abundance in the balance state reflect the in-fluence of one or more constraints. In addition to inferring

a balance state, surprisal analysis simultaneously determines aparsimonious set of transcriptional phenotypes or patterns andassociated weights that can be used to mathematically relatemeasured transcript abundance to balance state values to withinthe accuracy allowed by experimental noise (10). The result is amathematical representation and compaction of the experimentaldata that can be used, not only in the context of data analysis butalso to make predictions regarding how targeted perturbationsto the system might influence the system. Examples of previoussuch applications include the prediction of the effect of a limitedperturbation such as the addition of a drug (11) or of the workrequired to drive the cell to a given phenotype (12).In summary, a key component of surprisal analysis is that it

enables the ranking of the transcripts according to their abun-dance and also a characterization of their fold change. Thesetwo contributions are the balance state and the deviations fromit, respectively.In this paper, we apply surprisal analysis to the coordinated

analysis of messenger RNA (mRNA) and microRNA (miRNA)expression data of cancer patients from ovarian serous carci-noma, prostate adenocarcinoma, breast invasive carcinoma, andlung adenocarcinoma patients and to the expression profiles oftissue-specific, noncancer controls. We identified a cancer-specifictranscriptional pattern phenotype that we call a disease signature.The cancer-specific signature consists of about a hundred genesthat have a significant contribution to the cancer state and thatreliably distinguishes between noncancer and cancer patients. Thetop most deviations from the balance state are found to be from

Fig. 1. The cancer-specific mRNA signature distinguishes diseased and healthy patients. (A) The heat map shows the twenty most up-regulated and twentymost down-regulated mRNAs in four different carcinomas. Each row in the heat map is a different gene. Each vertical column in the heat map shows thedisease signature for a particular patient. There are 8 healthy and 13 diseased patients shown for ovarian cancer, 37 healthy and 140 diseased patients forprostate cancer, 20 healthy and 20 diseased patients for breast cancer, and 15 healthy and 15 diseased patients for lung cancer. For all cancers and fora sample of 268 patients, an mRNA that is up-regulated in diseased patients is found to be down-regulated in healthy patients and vice versa. Similarthermodynamic behavior is observed across patients; however, the greatest patient-specific variability in the cancer state came from prostate patients. (B) Theplots show the first and dominant Lagrange multiplier, λ1ðnÞ, as described in Eq. 1, plotted versus the same scale of patient index, n, as the heat map in A. Thesign of λ1ðnÞ is opposite for healthy and diseased patients, thereby providing a disease signature.

Zadran et al. PNAS | November 19, 2013 | vol. 110 | no. 47 | 19161

SYST

EMSBIOLO

GY

CHEM

ISTR

Y

Dow

nloa

ded

by g

uest

on

Dec

embe

r 11

, 202

0

Page 3: miRNA and mRNA cancer signatures determined by analysis of ... · Toward identifying a cancer-specific gene signature we applied surprisal analysis to the RNAs expression behavior

20 genes affiliated with the biological hallmarks of the cancermachinery. Using the Gene Ontology tool (13) we determinethat these RNAs contribute to the regulating of the cellularcancer-signaling pathways.In addition to uncovering previously undescribed mRNA and

miRNA therapeutic targets we suggest that the disease signaturealso reveals modulations of the connectivity in miRNA and mRNAnetworks in human cancers. Furthermore, we uncover correlativebehavior between the miRNAs and the cancer-specific mRNAsignatures; mRNAs present in a cancer-specific mRNA signaturedisplay complimentary conserved binding sites specific to miRNAspresent in the cancer-specific miRNA signatures. To validate theselast findings we designed gene-silencing experiments where wedemonstrate experimentally that modifying even a single biomol-ecule connection in the cancer-specific gene network reverses thecancer state phenotype, reducing cancer cell proliferation andaltering cancer cell physiology in vitro and on a single-cell level.

ResultsSurprisal Analysis. The results are based on surprisal analysis (8,14) as given in Eq. 1. The essence is that we identify a balancedstate level, and the measured expression level of mRNA (ormiRNA) i for patient n, XiðnÞ, is expressed with respect to thebalanced state as

surprisaliðnÞ= − ln�XiðnÞ=Xo

i ðnÞ�= λ1ðnÞGi1 + residual: [1]

Here Xoi ðnÞ is the inherent reference value, the level of RNA i in

the balance state. Thermodynamically, the balanced state is a ref-erence common to all phenotypes and all patients. As furtherdiscussed in the next section and in SI Appendix, section II,ln Xo

i ðnÞ is the thermodynamic weight of RNA i in patient n.In the next paragraph we devote considerable attention to vali-dating this thermodynamic concept in cells of human patientsand specifically showing that the reference value as inferred frommeasured levels does not significantly vary with the patient indexn. It is convenient to write the expression level in the balancestate as ln Xo

i ðnÞ = − λ0ðnÞGi0 .In Eq. 1, λ1ðnÞ is the potential of patient n in the dominant

phenotype. Patient variability for a given phenotype is expressedby the dependence of λ1ðnÞ on the patient index n. The limitedpatient variability is shown by the variation between differentcolumns in the heat maps shown in Figs. 1 and 2. Gi1 is thecontribution of mRNA (or miRNA) i to the dominant pheno-type. The dependence on the index i is shown by the variationbetween different rows in the heat maps shown in Figs. 1 and 2.The residual in Eq. 1 can be the contribution of ongoing sec-ondary processes and may also be corrupted by experimental

Fig. 2. The cancer-specific miRNA signature distinguishes diseased and healthy patients. (A) Heat maps showing in the vertical direction the wide gapbetween healthy and diseased patients and, in the horizontal direction, the patient variability. Shown for the twenty most up-regulated and the twenty mostdown-regulated miRNAs in four carcinomas. The labeling of miRNA has been shortened to the number following hsa-mir. In the case of the ovarian data,UL70-3p should be preceded by hcmv-miR, 19-3p should be preceded by ebv-mir-BART, and K12-3 should be preceded by kshv-mir . Each column representsa healthy or a diseased patient. There are 8 healthy and 8 diseased patients for the ovarian data, 10 healthy and 10 diseased patients for the prostate data (SIAppendix, group 2 of Table S12), 17 healthy and 17 diseased patients for the breast data, and 35 healthy and 58 diseased patients for the lung data shown. (B)Quantitative representation of the distinction between healthy and diseased patients. Shown is the multiplier λ1ðnÞ (Eq. 1) for the cancer signature versus thepatient index n. Also in this way of representing the data it is clear that patient variability in the values of λ1ðnÞ is moderate compared with the change in signbetween the healthy and diseased patients. Both healthy and disease signatures are deviations in opposite direction from the balance state.

19162 | www.pnas.org/cgi/doi/10.1073/pnas.1316991110 Zadran et al.

Dow

nloa

ded

by g

uest

on

Dec

embe

r 11

, 202

0

Page 4: miRNA and mRNA cancer signatures determined by analysis of ... · Toward identifying a cancer-specific gene signature we applied surprisal analysis to the RNAs expression behavior

noise (10) in which case just the dominant phenotype suffices toaccount for the data to within its known accuracy.

Balance State in Human Cancers. Surprisal analysis, Eq. 1, examinesthe observed expression level of an RNA in comparison witha reference value, the level in the balance state, Xo

i ðnÞ. Thereference value is a characteristic of the RNA and the networksit is involved in. In the balance state the cellular processes arebalanced, and therefore there is no net change, as previouslydescribed in refs. 8, 14, and 15. As shown in the SI Appendix,section II, the theory requires that the balance state is commonto all of the patients. As discussed above, this requirement isintentionally not imposed a priori in the surprisal analysis of thedata, because in the balance state the logarithm of the expressionlevels of the RNA i carry a patient index n, ln Xo

i ðnÞ. We showthat despite patient variability the balance state is common toboth healthy and diseased patients within a dataset for a partic-ular carcinoma (SI Appendix, Fig. S1). This behavior was alsosimilar across different datasets (Materials and Methods). Next,subject to tissue-specific expression of certain RNAs, the balancestate is shown to be common across the different carcinomas (SIAppendix, Fig. S1 and Tables S1–S10). We anticipate that slightvariability (≤10%) between diseased patients can also arise fromvarying amounts of necrotic or dead cells in the heterogeneoustumor samples before microarray processing (SI Appendix, TableS10). We have previously investigated the role of experimentalnoise by examining the balance state at different points in timeduring carcinogenesis of a homogenous cell population in vitro(8, 10).To compare the balance states of different cancers we rank the

RNAs in each according to their thermodynamic stability wherethe more stable are the lower in free energy. As in cell lines (8,14), we find that in humans too the most stable RNAs are themost highly expressed in the balanced state. These are mRNAsand miRNAs involved in networks maintaining cellular homeo-stasis (SI Appendix, Tables S1–S4). We further confirm thefunction of biomolecules that dominate the balance state to beheavily involved in the regulation of cellular homeostasis viaconducting gene ontology enrichment analysis (SI Appendix,Figs. S4–S7), using the scheme of ref. 13.There is an almost complete overlap in the top 10 miRNAs

that contribute the greatest free energy to the balance stateacross cancers (SI Appendix, Table S5). There is also significantoverlap of mRNAs (SI Appendix, Table S5). Both lung patientsand prostate patients have TPT1 and ACTG1, genes involved incell motility and cell structure, common to their balance state,whereas both breast and ovarian patients share RGS1, a geneinvolved in GTPase activity and signaling, in their balance state(16). We also identify the top 100 mRNAs and miRNAs with thegreatest contribution of free energy to the balance state and findgreater overlap of mRNAs and miRNAs across patients (SIAppendix, Tables S6–S9).

Cancer-Specific Signature. Surprisal analysis resolves the deviationof the (logarithm of the) measured expression level from thevalue of the balanced state. In this paper we highlight thedominant contribution to this deviation as shown in Eq. 1 and SIAppendix, section III. For all four cancers and for both mRNAand miRNA expression-level data, we find that surprisal analysisidentifies the largest deviation from the balance state as specificto the cancer phenotypic state (SI Appendix, Fig. S8). This sig-nature robustly distinguishes healthy patients from cancerpatients (Figs. 1A and 2A and SI Appendix, Fig. S9). The analysisalso identifies the thermodynamic potential of each patient,denoted as λ1ðnÞ for patient number n (Eq. 1 and SI Appendix,sections II–V). The subscript 1 is the ranking of the deviationand refers to the cancer signature. The potential λ1ðnÞ does varyslightly between different patients in the same disease (Figs. 1Band 2B). As seen, the range of this variation is typically signifi-cantly smaller than the difference in the potential between dis-eased and healthy patients. Surprisal analysis also ranks thebiomolecules that contribute to programming a system awayfrom cellular homeostasis as defined by the balance state towardthe cancer phenotypic state. In Eq. 1, Gi1 is the weight of RNA iin the cancer phenotype. Here too, 1 signifies the leading phe-notype. It is a result of the thermodynamic analysis that theproduct λ1ðnÞGi1 of the potential λ1ðnÞ by the weight of the RNAi is the work needed to reprogram to the cancer phenotype inpatient index n. Surprisal analysis identifies those miRNAs ormRNA that are the most up-regulated and those that are mostdown-regulated in the disease signature. The disease pattern isdefined relative to the thermodynamic balance state. The healthypatients are, on the average, as deviant from the balance state asthe diseased patients but in an opposite direction (see Fig. 4).We observe miRNAs and mRNA that are most up-regulated inthe healthy to be significantly down-regulated in the diseased andvice versa (SI Appendix, Tables S13, S15, S17, and S19).

Cancer-Specific mRNA Signature. We identified a mRNA signaturefor prostate, ovarian, lung, and breast carcinomas (SI Appendix,section IV and Tables S12, S14, S16, and S18). Here we discussthe major biological implications. Cancer is characterized byuncontrolled cell growth and proliferation (17). As expected,biomolecules involved in regulating cell growth and differentiationappear to contribute the greatest free energy to reprogramminga system from the balance state and are most present in thecancer-specific gene signature. Genes commonly associated withthe six hallmarks of cancer, including resisting cell death, angio-genesis, and metastasis, are also present in the cancer-specificsignature (18). It is well known that the underlying mechanism inmaintaining these cancer hallmarks is genome instability (19). Wepreviously demonstrated alterations in gene expression in vitro aremaintained through subsequent cell divisions (8, 14, 15). Addi-tionally, in the thermodynamic gene signature of lung adenocar-cinoma patients, not only are genes involved in cancer oncogenesis

Fig. 3. The disease signature accounts for the ex-pression level of both miRNAs and mRNAs. Unlikethe balance state alone, including the disease sig-nature as in Eq. 1 provides an accurate descriptionof the expression level of both miRNAs and mRNAsof any particular patient over several orders ofmagnitude of the data. Shown is a scatter plot ofRNAseq-measured miRNA data represented interms of surprisal analysis graphed versus the actualraw data for a prostate diseased patient and fora lung-healthy patient.

Zadran et al. PNAS | November 19, 2013 | vol. 110 | no. 47 | 19163

SYST

EMSBIOLO

GY

CHEM

ISTR

Y

Dow

nloa

ded

by g

uest

on

Dec

embe

r 11

, 202

0

Page 5: miRNA and mRNA cancer signatures determined by analysis of ... · Toward identifying a cancer-specific gene signature we applied surprisal analysis to the RNAs expression behavior

and cancer cell metabolism present in the diseased signature, suchas FTH1 and FTl (20), respectively, but also genes specific to lungcancer response to inflammation; genes critical for the presenta-tion of foreign antigens to the immune system were also detected(Fig. 1A and SI Appendix, Table S12). We identify a mRNA dis-ease signature that distinguishes lung adenocarcinoma, breast in-vasive carcinoma, ovarian serous cytadenocarcinoma, and prostateadendocarcimona from normal tissue. However, slight variabilityin the thermodynamic mRNA signatures for these cancers acrosspatients was detected. In prostate adenocarcinomas patients, weanalyzed different groups of patients chosen randomly (Materialsand Methods) and identified a consistent set of genes in the cancerthermodynamic signature of each group (SI Appendix, Table S10).However, genes present in the signature of some patient groupsare not observed in others, and this may arise since only a subsetof an organism’s genomic DNA is transcribed into mRNAs at anygiven cell cycle (21).

Cancer-Specific miRNA Signature. Global microRNAome profilingdemonstrated drastic changes in the expression of multiplemiRNAs in many common human cancers (22). Several studieshave suggested that miRNA expression behavior can providea more accurate method of classifying cancer subtypes thantranscriptome profiling of an entire set of known protein-codinggenes (23). Differential miRNA expression behavior has been ableto successfully classify poorly differentiated cancers (24), whereasmRNA gene expression behavior failed to classify them (25).

Surprisal analysis of breast cancer patients implicates miR-141in the thermodynamic miRNA signature (SI Appendix, TableS15). MiR-141 was found to be significantly hormone regulatedin breast cancer cells and is also reported to be down-regulatedin breast cancer stem cells. Loss of miR-141 plays a role intriggering expansion of the stem cell population in breast tumors(26). Mir-206 was implicated in the thermodynamic miRNAsignature of lung cancer patients (SI Appendix, Table S13), andstudies have only recently indicated miR-206 in both invasionand metastasis of lung cancer (27).

Quantitative Signature of Cancer RNA Expression Levels. The diseasesignature, as characterized by surprisal analysis, quantifies themajor fold change in the expression level of both miRNAs andmRNAs (Fig. 3). At the same time, Fig. 3 shows rather clearlythat the disease is only a perturbation of the balance state. Theimplications for network connectivity are discussed below. An-other quantitative aspect, shown graphically in Figs. 1 and 2, isthat those RNAs most up-regulated in diseased patients arethose most down-regulated in healthy patients and vice versa.

Cancer mRNA and miRNA Networks. As for mRNAs, the effect ofmiRNAs on cell pathology and physiology is likely regulated notonly by individual miRNAs but also by networks of miRNAs.Through the determination of the balance state we can constructbalance state networks, as previously described (14). Here surprisalanalysis provides both the connectivity itself and the strength of theconnectivity for the mRNA and miRNA networks. We identifiedthe miRNA and mRNA contributing in the cancer state (SI Ap-pendix, Tables S13, S15, S17, and S19) and ask how the diseasealters the networks identified in the balanced state. As discussed inref. 14, the strength of a connectivity is determined by the productof the free energy of the two RNAs. Because the disease signatureis only a lesser change in the free energy, the disease modifies thestrength of the network connectivities but not the network frame-work. Additional discussion is found in SI Appendix, section V.

Modulation of a Single mRNA and miRNA Connectivity in Vitro Resultsin Deviation of Cancer Cells from the Cancer Phenotypic State. Wereport an experimental demonstration of biological constraintsgoverning the deviation of a physiological system toward either acancer diseased or healthy phenotypic state. Breast, ovarian, lung,and prostate cancer cell lines were cultured in vitro as discussedin Materials and Methods. Cells were transfected with scramble orsiRNA against the gene with the greatest thermodynamic influencein the cancer disease signature (Fig. 4). Cell proliferation rates wereobserved using the cell proliferation assay described in Materialsand Methods. Cancer cell lines treated with the siRNA showedsignificant decrease compared with scramble controls (Fig. 4A).In breast cancer cells, COL10A1 was found to contribute the

greatest thermodynamically to the cancer state. This gene isoverexpressed in many tumors, including breast and ovariantumors (28). Inhibition in vitro of COL11A1 in OVCAR ovariancancer cell lines reduced cell proliferation and altered cellstructure, with fewer cell extensions observed (Fig. 4C). In lungcancer cell lines, PAEP was targeted. PAEP is a tumor suppressoralso implicated in both breast and ovarian cancer (29). PAEP in-hibition in the adenocarcinomic human alveolar basal epithelial celllines (A549) resulted in significant reduction in cell proliferation asearly as 48 h subsequent siRNA gene knockdown (Fig. 4B).Furthermore, we targeted the gene with greatest thermodynamic

influence (DLX1) in the cancer state in human prostate carci-noma epithelial cell line (22RV1) and also inhibited a singlemiRNA in the DLX1 mRNA and miRNA connectivity, inhibit-ing hsa-miR-615. Inhibition of DLX1 reduced prostate cancercell proliferation (22RV1) in vitro (Fig. 4A). Inhibition of bothDLX1 and mir-615 not only reduced cell proliferation, but ona single-cell level this modulation resulted in the presence ofa unique cellular physiology, with an increase in cell volume andpresence of necrotic-like vacuoles, indicating the onset of cell death(Fig. 4C). A reduced presence of lamellipodia and filopodia was

Fig. 4. Modulation of the cancer state. (A) Cell proliferation assay in thepresence of scramble or siRNA. Cell counts were conducted at 1, 2, 3, and 4 d.(B) Real-time bright-field images of lung A549 and ovarian OVCAR cancercell line in the presence or absence of scramble or target siRNA 40 μM. (C)Single-cell microscopy of 22RV1 prostate cancer cells treated with or withoutscramble or target siRNA. (Scale bar, 20 μm.)

19164 | www.pnas.org/cgi/doi/10.1073/pnas.1316991110 Zadran et al.

Dow

nloa

ded

by g

uest

on

Dec

embe

r 11

, 202

0

Page 6: miRNA and mRNA cancer signatures determined by analysis of ... · Toward identifying a cancer-specific gene signature we applied surprisal analysis to the RNAs expression behavior

also observed (Fig. 4C). Cellular lamellipodia and filopodia as-sembly is a key mechanism enabling cancer cell metastasis.Our cell culture experiments reveal that an inhibition of a

miRNA or an mRNA or both that are dominant in the diseasesignature deviates the cell away from the cancer phenotype.

Concluding RemarksSurprisal analysis integrates information theory and thermody-namics toward an effective information quantification and com-paction of data and provides an unbiased characterization ofsystems not in equilibrium. We applied surprisal analysis to alarge number of patient expression profiles of both miRNA andmRNA to characterize and elucidate cancer-specific signaturesand previously undescribed cancer targets and reveal connec-tivity in miRNA and mRNA networks.We identify a balance state, a robust thermodynamic reference

common to human carcinomas. Diverse patient populationsexhibit this characteristic distribution of RNAs stability. Thedominant deviation from the balanced state was identified as thecancer-specific disease pattern, a signature comprised of uniquemRNAs and miRNAs capable of distinguishing diseased patientsamples from normal controls. The balance state and the maindeviation provide a description of energetics and change over awide dynamic range.

Materials and MethodsTCGA Gene and miRNA Dataset and Patient Information. Level 3 miRNA andgene expression data were obtained from The Cancer Genome Atlas (TCGA)data portal. Both microarray and deep-sequencing datasets from ovarianserious cystadenocarcinoma, lung adenocarcinoma, prostate adenocarci-noma, and breast invasive carcinoma were acquired in August 2012.

The numbers of patients in each mRNA data file are as follows: lung,15 healthy and 15 diseased; prostate, 37 healthy and 140 diseased; andovarian, 8 healthy and 13 diseased; and breast, 10 healthy and 10 diseased. Inthe case of the miRNA data, the lung data comprise 35 healthy and 58 dis-eased and were analyzed as a single set. For the three other cancers, thelimiting number of patients is the healthy group. For the purpose of analysiswe formed sets of data. Each set consisted of healthy patients and an equalnumber of diseased patients with different diseased patients in different sets.

Signature and Network Validation. Cancer-specific gene andmiRNA signaturescapable of distinguishing diseased and healthy patients were subjected toliterary review and validation in the miRBase database and the National Centerfor Biotechnology Information database TargetScan (31) and Phenomir (32).mRNA-miRNA networks were validated by MicroCosm database. Gene On-tology Enrichment Analysis was conducted as previously described (29). Briefly,single-rank analysis was conducted, and threshold was set a P value of 10−11.

Cell Lines. Mammalian cell lines described in this study were purchased fromAmerican Type Culture Collection. Cell lines were maintained in standardconditions (33) The following cancer cell lines were used: human prostatecarcinoma epithelial cell line (22RV1), human ovarian carcinoma cell line(OVCAR), adenocarcinomic human alveolar basal epithelial cells (A549), andhuman mammary carcinosarcoma cells lines (HS578T).

Cell Proliferation Assay. Cell proliferation was monitored at 1, 2, 3, and 4 dusing a protocol described in ref. 33. At the end of each time period, the cellswere trypsinized to produce a single-cell suspension, and the viable cellnumber in each well was counted using trypan blue (Gibco) to detect viablecells and a hemocytometer (LW Scientific). All experiments in this study wereconducted under a protocol approved by the Western Institutional ReviewBoard (protocol 12-001039).

Single-Cell Microscopy. This protocol was followed as previously described indetail (33). Image analysis was performed using ImageJ.

Nanoparticle Delivery of siRNA. The protocol was followed as previously de-scribed in detail (33).

Reagents and Supplies. siRNA (PAEP, catalog no. sc-43807; COL10A1, catalogno. sc-95312; COL11A1, catalog no. sc-72956;DLX1, catalog no. sc-105301) werefrom Santa Cruz Biotechnologies. hsa-miR-615 miScript inhibitor (GGGGGUC-CCCGGUGCUCGGAUC) was from Qiagen.

Statistical Analysis of Cell Proliferation Assay. Results are shown as mean ±SEM. Student t test was used for in vitro experimental correlations. *P <0.05, **P < 0.01 was considered statistically significant.

ACKNOWLEDGMENTS. This work was supported by a Prostate Cancer Foun-dation creativity grant. F.R. is Director of Research, Fonds National de laRecherche Scientifique.

1. Rhodes DR, Chinnaiyan AM (2005) Integrative analysis of the cancer transcriptome.Nat Genet 37(Suppl):S31–S37.

2. Segal E, Friedman N, Kaminski N, Regev A, Koller D (2005) From signatures to models:Understanding cancer using microarrays. Nat Genet 37(Suppl):S38–S45.

3. Bhattacharjee A, et al. (2001) Classification of human lung carcinomas by mRNA ex-pression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA98(24):13790–13795.

4. Golub TR, et al. (1999) Molecular classification of cancer: Class discovery and classprediction by gene expression monitoring. Science 286(5439):531–537.

5. Schmid PR, Palmer NP, Kohane IS, Berger B (2012) Making sense out of massive databy going beyond differential expression. Proc Natl Acad Sci USA 109(15):5594–5599.

6. NelsonWG (2011) Profiling prostate cancer. Proc Natl Acad Sci USA 108(52):20861–20862.7. Markert EK, Mizuno H, Vazquez A, Levine AJ (2011) Molecular classification of prostate

cancer using curated expression signatures. Proc Natl Acad Sci USA 108(52):21276–21281.8. Remacle F, Kravchenko-Balasha N, Levitzki A, Levine RD (2010) Information-theoretic

analysis of phenotype changes in early stages of carcinogenesis. Proc Natl Acad SciUSA 107(22):10324–10329.

9. Levine RD, Bernstein RB (1974) Energy disposal and energy consumption in elemen-tary chemical reactions. Information theoretic approach. Acc Chem Res 7(12):393–400.

10. Gross A, Levine RD (2013) Surprisal analysis of transcripts expression levels in the presence ofnoise: A reliable determination of the onset of a tumor phenotype. PLoS ONE 8(4):e61554.

11. Shin YS, et al. (2011) Protein signaling networks from single cell fluctuations andinformation theory profiling. Biophys J 100(10):2378–2386.

12. Gross A, Li CM, Remacle F, Levine RD (2013) Free energy rhythms in Saccharomycescerevisiae: A dynamic perspective with implications for ribosomal biogenesis. Bio-chemistry 52(9):1641–1648.

13. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009) GOrilla: A tool for discovery andvisualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10:48.

14. Kravchenko-Balasha N, et al. (2012) On a fundamental structure of gene networks inliving cells. Proc Natl Acad Sci USA 109(12):4702–4707.

15. Kravchenko-Balasha N, et al. (2011) Convergence of logic of cellular regulation in dif-ferent premalignant cells by an information theoretic approach. BMC Syst Biol 5:42.

16. Amson R, Pece S, Marine J-C, Di Fiore PP, Telerman A (2013) TPT1/ TCTP-regulatedpathways in phenotypic reprogramming. Trends Cell Biol 23(1):37–46.

17. Cornu M, Albert V, Hall MN (2013) mTOR in aging, metabolism, and cancer. Curr OpinGenet Dev 23(1):53–62.

18. Hainaut P, Plymoth A (2013) Targeting the hallmarks of cancer: Towards a rationalapproach to next-generation cancer therapy. Curr Opin Oncol 25(1):50–51.

19. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: The next generation. Cell144(5):646–674.

20. Fisher J, et al. (2007) Ferritin: A novel mechanism for delivery of iron to the brain andother organs. Am J Physiol Cell Physiol 293(2):C641–C649.

21. Gautier EL, et al.; Immunological Genome Consortium (2012) Gene-expression pro-files and transcriptional regulatory pathways that underlie the identity and diversityof mouse tissue macrophages. Nat Immunol 13(11):1118–1128.

22. Ng EKO, et al. (2013) Circulating microRNAs as specific biomarkers for breast cancerdetection. PLoS ONE 8(1):e53141.

23. Yang Z, et al. (2012) Preferential regulation of stably expressed genes in the humangenome suggests a widespread expression buffering role of microRNAs. BMC Ge-nomics 13(Suppl 7):S14.

24. Gommans WM, Berezikov E (2012) Controlling miRNA regulation in disease. MethodsMol Biol 822:1–18.

25. Krell J, Frampton AE, Stebbing J (2013) MicroRNAs in the cancer clinic. Front Biosci(Elite Ed) 5:204–213.

26. Zhao L, et al. (2012) MiRNA expression analysis of cancer-associated fibroblasts andnormal fibroblasts in breast cancer. Int J Biochem Cell Biol 44(11):2051–2059.

27. Wang X, Ling C, Bai Y, Zhao J (2011) MicroRNA-206 is associated with invasion andmetastasis of lung cancer. Anatom Record 294(1):88–92.

28. Chang HJ, et al. (2009) MMP13 is potentially a new tumor marker for breast cancerdiagnosis. Oncol Rep 22(5):1119–1127.

29. Kunert-Keil C, Steinmüller F, Jeschke U, Gredes T, Gedrange T (2011) Immunolocalizationof glycodelin in human adenocarcinoma of the lung, squamous cell carcinoma of thelung and lung metastases of colonic adenocarcinoma. Acta Histochem 113(8):798–802.

30. Leber MF, Efferth T (2009) Molecular principles of cancer invasion and metastasis(review). Int J Oncol 34(4):881–895.

31. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction ofmammalian microRNA targets. Cell 115(7):787–798.

32. Ruepp A, Kowarsch A, Theis F (2012) PhenomiR: microRNAs in human diseases andbiological processes. Methods Mol Biol 822:249–260.

33. Zadran S, Amighi A, Otiniano E, Wong K, Zadran H (2012) ENTPD5-mediated modu-lation of ATP results in altered metabolism and decreased survival in gliomablastomamultiforme. Tumour Biol 33(6):2411–2421.

Zadran et al. PNAS | November 19, 2013 | vol. 110 | no. 47 | 19165

SYST

EMSBIOLO

GY

CHEM

ISTR

Y

Dow

nloa

ded

by g

uest

on

Dec

embe

r 11

, 202

0