mRNAs, proteins and the emerging principles of gene expression control

Christopher Buccitelli 1 and Matthias Selbach 1,2

Abstract | Gene expression involves transcription, translation and the turnover of mRNAs and proteins. The degree to which protein abundances scale with mRNA levels and the implications in cases where this dependency breaks down remain an intensely debated topic. Here we review recent mRNA–protein correlation studies in the light of the quantitative parameters of the gene expression pathway, contextual confounders and buffering mechanisms. Although protein and mRNA levels typically show reasonable correlation, we describe how transcriptomics and proteomics provide useful non-redundant readouts. Integrating both types of data can reveal exciting biology and is an essential step in refining our understanding of the principles of gene expression control.

1 Proteome Dynamics, Max Delbrück Center for Molecular Medicine, Berlin, Germany. 2 Charité – Universitätsmedizin Berlin, Berlin, Germany.
e-mail: matthias.selbach@mdc-berlin.de
https://doi.org/10.1038/s41576-020-0258-4
Nature Reviews Genetics (www.nature.com/nrg) | Reviews | Volume 21 | October 2020 | 630


Ever since the concept of genes as discrete heritable units emerged from the experiments of Gregor Mendel, the question of how an organism’s genotype leads to its phenotype has fascinated biologists [1]. Early observations noted that some genes affect an organism’s phenotype only under specific environmental conditions [2,3]. In other words, genes have to be ‘expressed’ before their phenotypes become apparent [4]. Gene expression is a complex process that is regulated at many levels (Fig. 1). Transcription is controlled by transcription factors, epigenetic marks and chromatin topology [5,6]. mRNA processing (for example, splicing, polyadenylation and modifications), transport and degradation are regulated by RNA-binding proteins and non-coding RNAs [7,8]. Protein translation is itself a complex multistep process that is subject to extensive regulation at the levels of initiation, elongation, localization and ribosome composition [9–11]. Protein degradation is also highly regulated, with the ubiquitin–proteasome system and autophagy playing critical roles in health and disease [12,13]. Finally, post-translational modifications of proteins, their interaction with other proteins (and other biomolecules) and their catalytic activity give rise to phenotypes [14,15].
Gene expression has been defined as the “production of an observable phenotype by a gene – usually by directing the synthesis of a protein” [16]. By contrast, most gene expression studies exclusively report mRNA data. The term ‘gene expression’ is often used synonymously with mRNA measurements. Furthermore, often for simplicity, most studies presume that single genes associate with individual gene products, whereas in reality, individual genes can ultimately give rise to a range of transcripts and proteoforms [17,18]. However, neither mRNAs nor proteins are ‘expressed’. Rather, both mRNAs and proteins are intermediates in gene expression that can provide useful readouts connecting genes and phenotypes.

The advent of omics technologies now allows researchers to study gene expression at various levels. At the genomic level are chromatin immunoprecipitation followed by sequencing (ChIP–seq), assay for transposase-accessible chromatin sequencing (ATAC-seq) and chromosome conformation capture [19,20]. The transcriptome can be profiled by RNA sequencing (RNA-seq) [21], and the proteome can be profiled by mass spectrometry-based proteomics [22]. Furthermore, the dynamic turnover of mRNAs and proteins can be studied with a variety of methods (Box 1).
Yet, how useful are mRNA levels for predicting protein levels? Are there principles of gene expression control that can be derived from their comparison? And how valuable are mRNA-level and/or protein-level measurements for understanding and predicting phenotypes? These questions are the subject of an intense ongoing discussion that is also biased by preconceptions: proteomics researchers argue that proteins are closer to phenotypes and thus more relevant functionally, while genomics researchers point out that obtaining (and analysing) mRNA-level data is easier and has proven to yield meaningful results.
The primary literature on the relationship between mRNAs and proteins is extensive and sometimes conflicting. Rather than reviewing this in detail, we concentrate on the underlying principles with examples from the recent literature. We primarily focus on studies using RNA-seq and liquid chromatography coupled with mass spectrometry unless otherwise stated. For earlier studies we refer readers to excellent reviews on this topic [23,24]. We begin with an overview of experimentally observed mRNA–protein correlations across genes and within genes and highlight technical challenges involved in quantifying them. Next, we highlight key features of the gene expression pathway, contextual confounders and buffering mechanisms and how they affect the relationship between mRNAs and proteins. We conclude with a pragmatic perspective on the value of mRNA-level and protein-level measurements.

Genotype: The complement of DNA possessed by an organism.
Phenotype: The observable characteristics of an organism, which result from the genotype and its interaction with the environment.
Non-coding RNAs: RNAs transcribed from the genome that do not serve as templates for proteins (for example, microRNAs).
Ubiquitin–proteasome system: An ancient system conserved across species that is responsible for the regulated catabolism of individual proteins. Classes of enzymes (E1, E2 and E3) function in specifically ubiquitylating proteins within the cell, thus targeting these clients for destruction by the proteasome.
Correlation between mRNA and protein levels

The relationship between mRNA-level and protein-level data is typically investigated by studying mRNA–protein correlation (Box 2), with imperfect correlations being thought to arise due to technical (for example, measurement error) or biological (for example, post-transcriptional regulation) reasons. As has been pointed out before [24–26], it is crucial to distinguish between two different types of mRNA–protein correlations. Analyses of across-gene correlations investigate how much the absolute abundances of mRNAs correlate with the absolute abundances of corresponding proteins across the different genes under a given condition (Fig. 2a,b). Analyses of within-gene correlations investigate how much the changes in the mRNA levels of one gene across conditions can explain the changes in corresponding protein levels (Fig. 2c,d).
Across-gene correlations. Across-gene correlation analyses have demonstrated significant correlations between mRNA and protein abundances in all kingdoms of life (Fig. 2b). Typical Pearson correlation coefficients (r) for mammalian tissues are about 0.6, which means that roughly 40% (r² ≈ 0.36) of the variability in protein levels can be explained by the variability in mRNA levels (Box 2). As both mRNA and protein measurements are affected by measurement noise and biases (see later), the true correlations are expected to be higher. That being said, early studies noted that low or biased proteome coverage could also lead to overestimation of the true across-gene correlation (Fig. 2b, column i) [27]. Moreover, studies across tissues and cell lines demonstrated that across-gene mRNA–protein correlations differ between samples. For example, targeted proteomics on a subset of proteins across cell lines and tissues yielded r values ranging from 0.39 to 0.79 (Fig. 2b, column iii) [28]. This implies that although technical limitations account for some downward deviation from the true across-gene mRNA–protein correlation, biological factors are also important. In summary, across-gene analyses typically reveal a substantial correlation between mRNA and protein abundances, with the degree of correlation differing across studies.
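The across-gene comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in any of the cited studies: the abundance values are invented, and the helper names (`pearson`, `ranks`, `spearman`) are hypothetical. It log-transforms paired mRNA and protein abundances from one sample and reports the Pearson r, the Spearman ρ and r².

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented abundances for six genes in one sample (illustration only).
mrna = [2.0, 15.0, 40.0, 120.0, 300.0, 900.0]   # for example, FPKM
protein = [5e3, 2e5, 8e4, 3e6, 1e6, 5e7]        # for example, iBAQ intensity

log_mrna = [math.log10(v) for v in mrna]
log_protein = [math.log10(v) for v in protein]

r = pearson(log_mrna, log_protein)
rho = spearman(log_mrna, log_protein)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, r^2 = {r * r:.2f}")
```

For orientation, an r of 0.6 corresponds to r² = 0.36, which is the basis of the "roughly 40%" figure quoted for mammalian tissues.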
Within-gene correlations. Within-gene correlation studies look at how a single gene’s protein levels track with its mRNA levels across multiple samples (for example, tissues or cell lines). For example, proteogenomic profiling of cancer samples yields mRNA and protein correlation profiles for thousands of genes across dozens of tumours [29–35]. Similar analyses can also be performed across tissue-matched samples from many healthy individuals where the primary source of mRNA variation (and thus protein variation) is germline, rather than somatic, genetic variation [36–38]. Furthermore, within-gene correlations can be used to investigate mRNA–protein covariation across different tissues [39]. In all of these studies, the correlation coefficients differ widely between genes, with most proteins showing positive correlation with their mRNAs (Fig. 2d). Specific functional groups such as ribosomal and spliceosome proteins often (although not always) show particularly low correlation with their mRNAs. Conversely, genes involved in core metabolic pathways tend to display the highest within-gene correlations. Note that proteins or mRNAs with similar abundances across conditions (that is, small fold changes) will naturally show low within-gene mRNA–protein correlations. Hence, poor within-gene correlations do not necessarily arise from interesting biological functions of the genes themselves. In summary, within-gene correlation studies typically reveal significant but modest correlations between changes in mRNA and protein levels and identify functional sets of genes that often tend to show low or high correlation.
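A within-gene analysis computes one correlation per gene across samples. The sketch below uses invented log2 abundances for two hypothetical genes to illustrate the caveat noted above: a gene whose levels barely change across samples shows a poor correlation simply because there is little variation to explain. The gene names and values are assumptions for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Invented log2 abundances across five samples (hypothetical genes).
data = {
    "variable_gene": {"mrna": [1.0, 2.0, 3.0, 4.0, 5.0],
                      "protein": [1.1, 2.0, 3.2, 3.9, 5.1]},
    "flat_gene": {"mrna": [5.0, 5.1, 4.9, 5.0, 5.05],
                  "protein": [8.0, 7.9, 8.1, 8.05, 7.95]},
}

within_gene_r = {}
mrna_range = {}
for gene, values in data.items():
    # One correlation per gene, computed across samples.
    within_gene_r[gene] = pearson(values["mrna"], values["protein"])
    # The spread of mRNA levels indicates how much variation there is to explain.
    mrna_range[gene] = max(values["mrna"]) - min(values["mrna"])

for gene in data:
    print(f"{gene}: r = {within_gene_r[gene]:.2f}, "
          f"mRNA range = {mrna_range[gene]:.2f} log2 units")
```

Reporting the abundance range next to each coefficient is one simple way to flag genes whose low correlation may reflect small fold changes rather than regulation.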
Fig. 1 | Overview of the gene expression pathway. Overview of the initial steps in gene expression connecting phenotype to genotype. The processes at different steps in the gene expression pathway that confer regulatory control are indicated at the bottom. miRNA, microRNA.

Autophagy: A form of cellular catabolism that is responsible for the removal of large cellular components (for instance, protein aggregates or damaged organelles). Autophagy involves the inclusion of these components within double-membraned vesicles (termed ‘autophagosomes’), which then undergo fusion with lysosomes.
Omics technologies: A generic term referring to the multitude of technologies to systematically measure multiple biological molecules simultaneously.
Precision: The closeness of repeated measurements to each other. High precision means that measurements are highly reproducible.
Accuracy: The closeness of the average of repeated measurements to the true value. High accuracy means that the measurements are on average in good agreement with the true quantity.

Technical limitations. Genome-wide mRNA and protein measurements depend on RNA-seq and mass spectrometry-based proteomics, two technologies that have recently been reviewed in detail [21,22]. These measurements suffer from noise and biases, which affect precision and accuracy, respectively.

Noise causes replicate mRNA or protein measurements (either technical or biological) to yield non-identical values and affects both across-gene and within-gene correlations. Noise is particularly relevant for low-abundance gene products, whose signal-to-noise ratio is especially sensitive to it. Another factor is that quantification methods markedly differ in their precision. Discussion of the strengths and weaknesses of different proteomic quantification methods is beyond the scope of this Review, and we instead refer the interested reader to an excellent recent review [40]. An important general point is that stable isotope-based protein quantification methods are typically more precise than label-free quantification methods, in part because they allow multiple samples to be multiplexed into a single mass spectrometry run, thereby reducing technical variability. Irrespective of the specific details of the methods used, technical and biological noise and the extent to which it affects the data can be estimated effectively by analysing multiple technical and biological replicates.
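One common way to use replicates for this purpose, which is not described in the text above and is offered here only as a hedged sketch, is Spearman's correction for attenuation. The correlation between replicate measurements serves as an estimate of each measurement's reliability, and the observed mRNA–protein correlation is then divided by the square root of the product of the two reliabilities. The correction assumes independent, random measurement noise; the function names and the reliability values below are invented for illustration.

```python
import math

def attenuate(r_true, rel_mrna, rel_protein):
    """Observed correlation expected when both measurements are noisy."""
    return r_true * math.sqrt(rel_mrna * rel_protein)

def disattenuate(r_obs, rel_mrna, rel_protein):
    """Spearman's correction: estimate of the noise-free correlation."""
    return r_obs / math.sqrt(rel_mrna * rel_protein)

# Hypothetical inputs: an observed across-gene r and replicate-to-replicate
# correlations (reliabilities) for the mRNA and protein measurements.
r_obs = 0.60
rel_mrna = 0.95
rel_protein = 0.80

r_corrected = disattenuate(r_obs, rel_mrna, rel_protein)
print(f"observed r = {r_obs:.2f}, noise-corrected estimate = {r_corrected:.2f}")
```

The corrected value is always at least as large as the observed one, in line with the expectation that noise pushes measured correlations below the true correlation. It does not account for systematic bias.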
Box 1 | Clocking mRNA and protein turnover

In addition to measuring steady-state abundances, proteomic and transcriptomic methods can also quantify the turnover of mRNAs and proteins. Metabolic pulse labelling with modified nucleosides and with amino acids has emerged as a powerful technique to study mRNA and protein turnover. For mRNAs, thiolated nucleosides such as 4-thiouridine (4sU) can be metabolically incorporated into newly synthesized mRNA molecules (see the figure). mRNAs containing the 4sU label can be isolated biochemically, which enables separate analysis of newly synthesized and pre-existing mRNAs by RNA sequencing (RNA-seq) to compute mRNA half-lives [47,158]. Alternatively, the incorporation of thiolated nucleosides can be analysed via ‘SH-linked alkylation for the metabolic sequencing of RNA’ (SLAM-seq) to quantify transcriptional output or RNA stability [159].

Protein turnover can be quantified via stable isotope labelling by amino acids in cell culture (SILAC) [160] or chemical labelling and affinity purification using azidohomoalanine (see the figure). Whereas pulsed SILAC (pSILAC) quantifies protein synthesis [161,162], dynamic SILAC [47,163] measures protein degradation. Specific SILAC-based labelling schemes measure synthesis and degradation in parallel [149,164], and these methods were recently combined with chemical heavy stable isotope labelling for multiplexing [64,165]. pSILAC can also be combined with bio-orthogonal amino acid tagging [166] to study protein synthesis and degradation kinetics with increased temporal resolution [117,167]. The most typical amino acids to be labelled, due to trypsin often being the protease of choice for generating peptide digests, are arginine and lysine (see the figure; asterisks represent positions of isotopically divergent atoms).

The metabolic labelling methods are mostly limited to tissue culture experiments. An alternative readout for translation is ribosome profiling (Ribo-seq), which involves the collection of mRNA fragments bound by ribosomes [168]. Assuming that the average translation elongation rate is similar for different genes, Ribo-seq provides an indirect readout for protein synthesis. Dividing Ribo-seq reads by transcript amounts yields so-called translational efficiencies of different transcripts; that is, a measure of how efficiently some RNA species are translated compared with others. Ribo-seq is advantageous because it is applicable to in vivo samples, such as human biopsy samples, and provides a nucleotide-level readout for translation, which can define previously undetected protein variants [169,170]. However, due to biological factors such as different translation elongation speed, the amino acid composition of the nascent peptide chain and ribosome queuing [171,172], as well as technical issues, such as the use of cycloheximide and data normalization issues [173,174], ribosome densities as measured by Ribo-seq do not necessarily reflect the protein output of the ribosome [172,175].

[Box 1 figure: endogenous structures and their metabolic labels, for example methionine and azidohomoalanine.]

GC correction: A computational approach used to account for sequencing depth biases due to the guanine/cytosine composition of a particular region of the genome.
Shotgun proteomics: In ‘shotgun’ or ‘bottom-up’ proteomics, the proteins in a sample are cleaved into peptides before being analysed by mass spectrometry. Peptides are simpler than proteins, which facilitates their analysis by mass spectrometry and makes the shotgun approach particularly popular.

Bias leads to measured values of specific mRNAs or proteins being systematically overestimated or underestimated. Biases are often gene-product specific and thus affect across-gene correlations (comparing quantifications with different biases) more than within-gene correlations (comparing quantifications with similar biases). For example, biases in RNA data can arise from any of the steps involved in preparing the sequencing library [21,41]. GC and amplification biases have been extensively addressed either computationally (for example, GC correction) or technically (for example, unique molecular identifiers in low-complexity or single-cell samples) [42,43]. For proteins, the most widely used shotgun proteomics approach involves digesting the proteins into peptides before analysis by mass spectrometry [22]. A protein’s abundance measurement is therefore biased by the observability of its respective peptides [40], which is influenced by a multitude of complex, somewhat interdependent factors arising from their physicochemical properties. These factors include their behaviour across sample preparation and chromatographic fractionation steps as well as their ionization efficiency. Consequently, while measured peptide ion intensities, and any source protein estimates derived from them, are expected to be proportional to absolute abundances, no trivial or direct transformation exists to convert mass spectrometry data into absolute protein abundances [44,45]. This detail in particular makes across-gene comparisons difficult, given that each gene’s protein product is burdened by the individual set of biases associated with its peptides. Indeed, one study showed that absolute protein copy number estimates arising from the use of different proteases on the same sample can be quite discordant (R² of 0.45 between chymotrypsin and trypsin), presumably due to the differences in the nature of the peptides each protease produces from the same source proteins [46]. Nonetheless, protein standard mixes and simple or more complex linear transformations (that is, using a subset of proteins of known abundance to model the intensity–abundance relationship and extrapolating it across the whole proteome) have been used to obtain reasonable intensity-based absolute copy number estimates [47–49]. Strategies controlling for the theoretical maximum number of peptides that can arise from a protein, such as intensity-based absolute quantification (iBAQ), which is analogous to normalizing read counts by gene length in RNA-seq, demonstrably improve estimates [50].
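The idea behind iBAQ-style normalization can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of any particular software: the digestion rule (cleavage after lysine or arginine unless followed by proline, no missed cleavages), the 6 to 30 residue length window, the example sequence and the intensities are all assumptions chosen for the example, and the function names are hypothetical.

```python
def tryptic_peptides(sequence, min_len=6, max_len=30):
    """In silico digest: cleave after K or R unless the next residue is P.

    No missed cleavages are considered. Only peptides within the length
    window are counted as theoretically observable (assumed window).
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if min_len <= len(p) <= max_len]

def ibaq(peptide_intensities, sequence):
    """Summed peptide intensity divided by the number of observable peptides."""
    n_theoretical = len(tryptic_peptides(sequence))
    if n_theoretical == 0:
        raise ValueError("no theoretically observable peptides")
    return sum(peptide_intensities) / n_theoretical

# Toy protein assembled from four segments so the digest is easy to follow.
toy_sequence = "AAAAAAK" + "GGGGGRPLLLK" + "TTK" + "SSSSSSR"
observed_intensities = [3e6, 2e6, 1e6]  # invented peptide ion intensities

print(tryptic_peptides(toy_sequence))
print(ibaq(observed_intensities, toy_sequence))
```

Dividing by the number of theoretically observable peptides plays the same role as dividing read counts by gene length: a long protein that yields many peptides is not scored as more abundant merely because it offers more chances to be detected.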
Precisely quantifying the impact of bias is difficult as this requires ground-truth data, which are typically lacking. One possibility is to add known amounts of reference RNAs or proteins [41,47]. Alternatively, orthogonal quantification methods (such as NanoString quantification for mRNAs and isotopically labelled spike-ins for proteins) can be used for validation [28,39,47,51,52]. Protein digestion biases can also be investigated by comparing digestions with different proteases [46,49]. For example, Jovanovic et al. performed separate digestions with different proteases so as to account for systematic errors arising from the use of any one individual enzyme. However, as such alternative validation methods suffer from their own biases, they do not provide ground-truth data. A pragmatic solution is to report observed RNA–protein correlations as they are, while keeping in mind that they are conservative estimates of the true correlation.
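The earlier point that gene-product-specific biases distort across-gene correlations more than within-gene correlations can be illustrated with a toy simulation. The sketch below assumes, purely for illustration, that each gene's protein measurement carries a constant multiplicative bias (an additive offset in log space) and that true protein levels are perfectly proportional to mRNA levels; all numbers and names are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# log10 mRNA levels: one row per hypothetical gene, one column per sample.
log_mrna = {
    "geneA": [1.0, 1.5, 2.0],
    "geneB": [2.0, 2.2, 2.9],
    "geneC": [3.0, 3.4, 3.1],
    "geneD": [4.0, 4.1, 4.8],
}
genes = list(log_mrna)

# Assumed ground truth: protein is exactly 1,000-fold the mRNA level.
true_log_protein = {g: [v + 3.0 for v in log_mrna[g]] for g in genes}

# Assumed gene-specific measurement bias, constant across samples.
bias = {"geneA": 1.2, "geneB": -0.8, "geneC": 0.9, "geneD": -1.1}
measured = {g: [v + bias[g] for v in true_log_protein[g]] for g in genes}

# Across-gene correlation in the first sample, before and after bias.
across_true = pearson([log_mrna[g][0] for g in genes],
                      [true_log_protein[g][0] for g in genes])
across_measured = pearson([log_mrna[g][0] for g in genes],
                          [measured[g][0] for g in genes])

# Within-gene correlations across samples are unaffected by a constant offset.
within_measured = {g: pearson(log_mrna[g], measured[g]) for g in genes}

print(f"across-gene r: true {across_true:.2f}, measured {across_measured:.2f}")
print({g: round(r, 2) for g, r in within_measured.items()})
```

In this idealized setting the within-gene correlations remain perfect while the across-gene correlation drops, which is why relative comparisons of the same gene product across samples are generally more robust to this kind of bias.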
Quantitative parameters of gene expression

Before we discuss the biological reasons for high (or low) mRNA–protein correlations, it is instructive to consider the general features of the gene expression pathway so as to underline its impact on these relationships. Specifically, the aforementioned across-gene and within-gene correlations reported to date are presumed to be imperfect in part due to the molecular mechanisms and regulatory steps associated with gene expression. We use the term ‘gene expression pathway’ rather than ‘central dogma of molecular biology’, which originally referred to the concept that genetic information cannot be recast into nucleic acids once encoded into proteins [53,54].
Hierarchy. The gene expression pathway is hierarchical: protein synthesis requires the presence of an mRNA molecule, which in turn requires a DNA template (Fig. 3a). In other words, the degree to which one level of the pathway can be regulated depends on the activity of the previous levels. Hence, a transcriptionally silent gene (no mRNAs present) cannot be upregulated by making its mRNA or protein more stable or by increasing its translation. This hierarchy explains the importance of transcriptional control during metazoan development: cellular differentiation involves the production of tissue-specific proteins, which are often encoded by genes that need to be activated. Transcription factors binding to specific cis-regulatory elements play a key role in the gene regulatory networks controlling animal development [55]. With the exception of early development (see later), this helps to explain why differentiation is often characterized by coordinated mRNA-level and protein-level changes [56,57]. Similarly, perturbations that activate transcription of silent genes that are required in the new conditions largely induce changes of the corresponding protein levels via transcriptional mechanisms. For example, a strong proinflammatory stimulus, such as lipopolysaccharide (LPS) treatment, induces upregulation of immune-response proteins in dendritic cells predominantly via transcription [49].
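The hierarchical argument can be made concrete with a deliberately simple steady-state model, which is an assumption introduced here for illustration and not a model taken from the text: protein is produced at a rate proportional to the number of mRNA copies and degraded at a rate proportional to its own level, so that the steady-state protein level equals mRNA copies times the translation rate divided by the degradation rate. The function name and parameter values are hypothetical.

```python
def steady_state_protein(mrna_copies, k_translation, k_degradation):
    """Steady state of dP/dt = k_translation * mRNA - k_degradation * P."""
    if k_degradation <= 0:
        raise ValueError("degradation rate must be positive")
    return mrna_copies * k_translation / k_degradation

# An expressed gene: translation and protein stability both tune the output.
print(steady_state_protein(mrna_copies=10, k_translation=100, k_degradation=0.5))

# A transcriptionally silent gene: no setting of the downstream rates helps.
for k_tl, k_deg in [(100, 0.5), (10_000, 0.5), (10_000, 0.001)]:
    print(steady_state_protein(0, k_tl, k_deg))
```

Because the mRNA copy number enters as a multiplicative factor, faster translation or slower degradation can only scale an output that transcription has already made non-zero.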
Box 2 | Quantifying mRNA–protein correlation

The relationship between mRNA and protein levels is commonly assessed using linear correlations of log-transformed mRNA and protein measurements. Log transformation is typically used to approximate a normal distribution in the correlation analyses and to stabilize variance. As proteins are synthesized from mRNA templates, mRNA data are used as the independent variable (x axis), while protein data are used as the dependent variable. Such correlation analyses yield the Pearson correlation coefficient (r), the Spearman rank correlation coefficient (ρ; which is resistant to outlying data points) or the coefficient of determination (R²; which equals the squared value of r for a simple linear regression in which the intercept is fitted rather than fixed). R² is a particularly informative measure as it indicates how much of the observed variability in protein levels can be explained by the variability in mRNA levels. Although correlation coefficients quantify the effect size of a relationship, its significance (often reported via P values) is also important. For a typical across-gene mRNA–protein correlation analysis with thousands of data points, even correlations with low R² values can be highly significant. Importantly, neither correlation coefficients nor P values fully capture all relevant features of a dataset, which is why plotting the data remains an important part of any correlation analysis [176].

Proteases: Enzymes that digest proteins into smaller fragments. In shotgun proteomics, proteins are digested into peptides by sequence-specific proteases (such as trypsin, which cleaves proteins at the carboxy-terminal side of lysine and arginine residues).
Lipopolysaccharide (LPS): A structural component of the outer membrane of Gram-negative bacteria. LPS may be sensed by specialized cells of the mammalian immune system (for example, dendritic cells), triggering both transcriptional and post-transcriptional responses so as to combat an imminent infection.

Dynamic range. Gene expression involves a remarkable amplification of signal (Fig. 3b). Beginning with two alleles (two to four gene copies, depending on the cell cycle stage), a typical gene will populate a mammalian cell with one to a few thousand transcripts. Several expressed genes may be present at less than one mRNA copy per cell (on average). By contrast, individual cells contain between one and 10⁸ protein molecules per gene. Thus, the dynamic range of cellular protein abundances exceeds that of mRNAs by several orders of magnitude. This is mainly due to the (on average) higher translation efficiencies of mRNAs encoding more abundant proteins [47,58].
The higher dynamic range of proteins leaves a lot of room for translational and post­translational regulation. It appears that biology takes advantage of this and adjusts translation and protein stability in a context­dependent manner. For example, the aforementioned LPS stimula­ tion study49 estimated both absolute mRNA and protein copy numbers, their changes over time and translation and protein degradation rates using dynamic stable iso­ tope labelling by amino acids in cell culture (SILAC) and
ribosome profiling (see Box 1) and derived quantitative models in an attempt to attribute the contribution of different processes to establishing protein abundances. Importantly, it was found that the level of ‘housekeeping proteins’ significantly changed after LPS stimulation and that these changes are mainly due to altered translation and degradation49. As housekeeping proteins are typically abundant, changes in their amounts should account for the largest shifts in the absolute make­up of a proteome. Thus, the study authors surmise that post­transcriptional regulation contributes substantially more to abso­ lute protein­level changes than mRNA­level changes.
Across-gene correlation
i
ii
iii
iv
a b
(FPKM)
Fig. 2 | Across-gene and within-gene correlations between mRNA and protein levels. a | Example of an across-gene correlation analysis comparing estimates of absolute mRNA abundance (expressed in fragments per kilobase of transcript per million mapped reads (FPKM)) to protein abundance (expressed as intensity-based absolute quantification (iBAQ)) in brain tissue from REF.39. b | Sample of across-gene correlations across time. Each column of points is taken from a single study (see Supplementary Table 1). Column i demonstrates Pearson correlation coefficients taken from early mRNA–protein correlation studies limited to a subset of the yeast proteome27. Initial attempts in the late 1990s to correlate mRNA and protein abundances were limited in throughput and noted distinct biases based on proteome coverage. Column ii shows that dividing (upper point) yeast cells display higher mRNA–protein correlation than quiescent cells91. Column iii displays correlation coefficients for selected proteins quantified via targeted proteomics across a panel of tissues28. Column iv displays mRNA–protein correlations during Drosophila melanogaster development, with a higher Spearman correlation coefficient existing between the mRNA levels at 12 hours and the protein levels at 16 hours compared with the matched 14-hour time point (upper and lower triangles, respectively)57. c | Example of a within-gene correlation analysis comparing absolute estimates of mRNA and protein abundance across 29 tissues from REF.39 for two genes: the genes encoding 60S ribosomal protein L12 (RPL12) and sorbitol dehydrogenase (SORD). d | Overview of the analysis in part c across all genes. Genes are ranked by increasing Pearson correlation coefficient in the upper panel. Two example functional gene sets are shown relating to the ribosome (red) or fructose metabolism (green) in the lower panel, with coloured vertical lines corresponding to member gene ranks in the within-gene correlation analysis. Part a is adapted from REF.39, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).
Another example comes from Cheng et al.59, who measured the impact of protein folding stress on mRNAs and proteins over time. They treated cells with the reducing agent dithiothreitol to trigger protein folding stress. This engendered transient, dissipating mRNA pulses. These pulses of transcription led to much longer lasting and comparatively larger changes at the protein level. The emerging picture is that inducing novel cellular functions often requires the activation of previously silent genes and is driven through transcriptional changes. By contrast, post-transcriptional regulation adjusts the levels of pre-existing proteins to new cellular states23.
Speed. In mammals, the speed at which RNA polymerase II can transcribe primary mRNAs from the DNA template is limited to about 100 mRNAs per hour, and ribosomes produce fewer than about 10,000 protein molecules per mRNA per hour60,61. Hence, it takes more than an hour to generate 10^6 protein molecules after initiation of transcription from a single locus (even without accounting for delays due to mRNA processing, transport or other steps). A faster means to upregulate proteins is to increase translation of existing mRNAs62. Indeed, several transcripts are translationally repressed to allow their fast translation on demand47,63, a mechanism that is particularly relevant during developmental transitions10. This could explain why about 1% of genes show high transcription rates and low translation rates, a combination that is energetically unfavourable61. Another possibility for quickly increasing protein levels is to stabilize a constitutively produced protein that is unstable under ‘baseline’ conditions, such as p53 stabilization following DNA damage or hypoxia-inducible factor 1α (HIF1α) stabilization during hypoxia12. A common mechanism of stabilization is by inactivating or downregulating the protein’s E3 ligase. More generally, genes whose mRNAs are translationally repressed or whose proteins are constantly made and rapidly degraded can be considered to be in a poised state that enables rapid protein upregulation. In this case, the benefits of being able to respond quickly appear to outweigh the more energetically frugal but slower activation of genes starting from transcription following a stimulus.
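The ‘more than an hour’ estimate can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes constant maximal rates (100 mRNAs per hour, 10,000 proteins per mRNA per hour), no degradation and no processing delay, and the function name and closed-form solution are ours rather than taken from the cited studies.

```python
import math

def hours_to_reach(target_proteins, mrna_per_hour=100.0,
                   proteins_per_mrna_per_hour=10_000.0):
    """Time (in hours) to accumulate target_proteins from one locus, ignoring decay.

    With constant transcription, mRNA(t) = v * t. Protein is the integral of
    k * mRNA(t), so protein(t) = k * v * t**2 / 2. Solving for the target gives
    t = sqrt(2 * target / (k * v)).
    """
    v = mrna_per_hour
    k = proteins_per_mrna_per_hour
    return math.sqrt(2.0 * target_proteins / (k * v))

t = hours_to_reach(1e6)
print(f"{t:.2f} h to reach 10^6 protein molecules")
```

Because protein output grows with the square of time in this idealized setting, even maximal rates cannot deliver a million copies within the first hour, which is consistent with the argument that translating pre-existing mRNAs is the faster route.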
Speed can be equally important for protein downregulation. As most proteins have half-lives of several hours, protein levels typically decrease rather slowly after translation or transcription has been switched off. It therefore makes sense that proteins encoded by dynamically regulated genes, such as transcription factors (on which cellular states are highly dependent), tend to have short half-lives, expediting their clearance in the event of a state transition47,64. Particularly rapid state transitions may additionally involve the selective destabilization of key regulatory proteins12. It is important to keep in mind that a gene’s protein and mRNA turnover parameters (and by extension, their abundances) have supposedly been selected so as to optimize reaction times to stimuli while avoiding energetic wastefulness, and that this trend is likely to affect their correlation.
[Figure 3 image not reproduced. Recoverable labels: panel a, pathway parameters v_sr, t_r1/2, k_sp and t_p1/2; panel b, a logarithmic axis from 10^−2 to 10^8 with the examples FOXP1, RPS29, DMD, pCMV-GFP, KLF1, FOXP4, p53 and EPHA7 and the annotation ‘2 alleles’; panel c, expressions for [mRNA] and [protein] in terms of v_sr, t_r1/2, k_sp, t_p1/2, T_CC and ln 2.]
Fig. 3 | Quantitative parameters of the gene expression pathway. a | Central processes involved in gene expression from mRNA transcription to protein production and degradation. Key parameters are detailed in the legend to the right. b | Overview of the dynamic ranges of the core parameters of the gene expression pathways in mammals. Examples are taken from a sample of studies referenced in this Review detailed in Supplementary Table 2. Gene names in upper case letters indicate measurements are derived from human samples, whereas gene names in sentence-case letters indicate murine samples. mRNA and protein copy numbers per cell are expressed as averages, implying that some protein species may occur, on average, at less than one copy per cell. ‘pCMV’ denotes the transcriptional rate of a GFP gene flanked by a cytomegalovirus promoter (see REF.60). c | Mathematical expression of mRNA and protein abundances as a function of key gene expression parameters, including cell doubling time, as detailed in REF.66.
Transcription rates The rates at which mRNA transcripts are generated for given genes.
Can we predict protein levels from mRNA levels?
Protein abundance depends on four factors: transcription rates, mRNA half-lives, translation rate constants and protein half-lives47,49,61,65,66 (Fig. 3c). Transcription rates (v_sr) and mRNA half-lives (t_r1/2) jointly define mRNA levels. Hence, protein levels depend only on mRNA levels, translation rate constants (k_sp) and protein half-lives (t_p1/2). In dividing cells the cell cycle time (T_cc) is also important, especially for long-lived proteins66. In this model, imperfect mRNA–protein correlations arise from translation rate constants and protein half-lives differing between genes (affecting across-gene correlations) or between conditions (affecting within-gene correlations). If translation efficiencies and protein half-lives were ‘hardwired’ gene-specific parameters (that is, if they did not vary across conditions), protein levels could be predicted from mRNA levels via constant gene-specific protein-to-mRNA ratios.
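The model described above can be written down in a few lines. The sketch below assumes first-order degradation plus dilution by cell division, so that each species is lost at a rate of ln 2 divided by its half-life plus ln 2 divided by the cell cycle time. This steady-state form is our reading of the parameters named in the text, not a transcription of Fig. 3c, and all numerical values are hypothetical.

```python
import math

LN2 = math.log(2.0)

def steady_state_mrna(v_sr, t_r_half, t_cc=math.inf):
    """mRNA copies per cell: synthesis rate divided by (degradation + dilution) rate.

    v_sr is in molecules per hour; t_r_half and t_cc are in hours.
    t_cc = inf models a non-dividing cell (no dilution term).
    """
    return v_sr / (LN2 / t_r_half + LN2 / t_cc)

def steady_state_protein(mrna, k_sp, t_p_half, t_cc=math.inf):
    """Protein copies per cell given the mRNA level.

    k_sp is in proteins per mRNA per hour; t_p_half and t_cc are in hours.
    """
    return mrna * k_sp / (LN2 / t_p_half + LN2 / t_cc)

# Hypothetical parameter values, chosen only for illustration.
mrna = steady_state_mrna(v_sr=2.0, t_r_half=9.0, t_cc=24.0)
short_lived = steady_state_protein(mrna, k_sp=100.0, t_p_half=2.0, t_cc=24.0)
long_lived = steady_state_protein(mrna, k_sp=100.0, t_p_half=200.0, t_cc=24.0)
long_lived_quiescent = steady_state_protein(mrna, k_sp=100.0, t_p_half=200.0)

print(f"mRNA copies per cell: {mrna:.1f}")
print(f"short-lived protein:  {short_lived:.0f}")
print(f"long-lived protein:   {long_lived:.0f}")
print(f"long-lived, no division (same mRNA level): {long_lived_quiescent:.0f}")
```

In this toy setting the same mRNA level yields very different protein levels depending on the protein half-life, and for the long-lived protein the loss term is dominated by dilution through division rather than by degradation.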
To test this idea, two studies computed median per-gene protein-to-mRNA ratios across human cells and tissues and used them to convert mRNA measurements from a different tissue or cell line into estimates of absolute protein abundance28,67. This approach yielded reasonably good estimates, suggesting that protein levels can indeed be predicted from mRNA levels. However, Fortelny et al.25 found that this prediction also works when protein-to-mRNA ratios are calculated from randomly shuffled or identical mRNA levels. The apparent discrepancy between these observations can be resolved when we consider the huge dynamic range of protein copy numbers across genes (Fig. 3b), which is larger than the dynamic range of both the absolute levels of corresponding transcripts and the within-gene mRNA and protein changes across tissues. For example, the ribosomal protein RPS14 is on average about 400 times more abundant than the transcription factor AHCTF1, whereas the abundance of both proteins and the abundance of their mRNAs rarely vary by more than a factor of 10 across tissues (Fig. 4). Consequently, RPS14 has a higher median protein-to-mRNA ratio than AHCTF1, which correctly predicts that ribosomal proteins are more abundant than transcription factors in virtually any tissue. Therefore, median per-gene protein-to-mRNA ratios can be used to predict across-gene protein abundance estimates from mRNA data fairly well (Fig. 4b,c). However, this does not imply that translation efficiencies and protein half-lives are ‘hardwired’ gene-specific constants. Instead, per-gene protein-to-mRNA ratios vary substantially across tissues, which indicates tissue-specific post-transcriptional regulation26,68. This is also highlighted by the aforementioned observation that within-gene studies typically reveal modest correlations between changes in mRNA and protein levels for most genes (Fig. 2d).
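The reason that shuffled or identical mRNA levels still give good predictions can be illustrated with a small simulation. The sketch below uses synthetic data in which, by construction, mRNA carries no information about protein: per-gene protein abundance spans five orders of magnitude, while tissue-to-tissue variation is small and independent for mRNA and protein. All distributions, sizes and function names are assumptions made for illustration and are not taken from the cited studies.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(n_genes=500, n_tissues=10, seed=0):
    """Return log10 mRNA and protein matrices (genes x tissues)."""
    rng = random.Random(seed)
    mrna, prot = [], []
    for _ in range(n_genes):
        base_m = rng.uniform(0.0, 3.0)   # gene-specific mean mRNA level
        base_p = rng.uniform(3.0, 8.0)   # gene-specific mean protein level
        mrna.append([base_m + rng.gauss(0.0, 0.3) for _ in range(n_tissues)])
        prot.append([base_p + rng.gauss(0.0, 0.3) for _ in range(n_tissues)])
    return mrna, prot

def heldout_correlation(mrna, prot):
    """Learn per-gene median log ratios on all but the last tissue, then
    predict protein in the last tissue and correlate across genes."""
    predicted, observed = [], []
    for m, p in zip(mrna, prot):
        log_ratio = statistics.median([pi - mi for mi, pi in zip(m[:-1], p[:-1])])
        predicted.append(m[-1] + log_ratio)
        observed.append(p[-1])
    return pearson(predicted, observed)

mrna, prot = simulate()
shuffled = mrna[:]                    # mRNA profiles assigned to the wrong genes
random.Random(1).shuffle(shuffled)
constant = [[1.0] * len(row) for row in mrna]   # identical mRNA level everywhere

r_real = heldout_correlation(mrna, prot)
r_shuffled = heldout_correlation(shuffled, prot)
r_constant = heldout_correlation(constant, prot)
print(f"real: {r_real:.2f}  shuffled: {r_shuffled:.2f}  constant: {r_constant:.2f}")
```

In all three cases the across-gene correlation between predicted and observed protein levels is high, because the per-gene ratio itself encodes the typical protein abundance of each gene. The mRNA term contributes little whenever the across-gene range of protein abundance dwarfs the within-gene variation.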
In summary, although per-gene protein-to-mRNA ratios provide approximate ‘order-of-magnitude’ estimates for absolute protein abundances across genes, their ability to estimate protein abundance changes for the same gene across samples is limited. We are just beginning to appreciate the role of post-transcriptional, translational and post-translational regulatory events for tissue development and homeostasis69.
Confounders of mRNA–protein dependencies
In addition to the condition-specific differences in translation efficiency and protein degradation outlined so far, the relationship between mRNAs and proteins can also be affected by contextual confounders. Many studies deal with a relatively homogeneous population of asynchronously dividing cells, where both proteins and mRNAs find themselves in the same microscopic milieu. Deviations from this situation lead to breakdowns in the dependency of proteins on mRNAs, both within and across genes. Here we expand on a few examples and the mechanisms by which they impact mRNA–protein correlations.
Temporal contexts. Any change in the transcriptional state of a cell will lead to a delay in the response at the protein level simply due to the time it takes to reach a new steady state. Correlations at specific time points during a transition may be uninformative, as changes in mRNA levels in reality correspond to latent changes in protein levels that have yet to occur (Fig. 5a). Given that proteins are on average more stable than mRNAs, proteins can still be present when the mRNA that encoded them is long gone. Accounting for this offset or waiting for the system to reach steady state therefore improves correlations between mRNAs and proteins (both within-gene and across-gene correlations). Relating to the previous section (see Speed), within-gene correlations are likely to be highest among genes whose protein products respond most quickly to changes in their mRNAs. Indeed, examples of this are seen in Drosophila melanogaster and Xenopus laevis development, where earlier mRNA time points correlate with later protein time points (see Fig. 2b, column iv)57,70. When temporal disconnects occur, more complex models involving ordinary differential equations must be used to predict protein levels as a function of mRNA levels and time.
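A minimal example of such an ordinary differential equation model is sketched below. It assumes first-order kinetics, dR/dt = v(t) − d_r·R for mRNA and dP/dt = k_sp·R − d_p·P for protein, and integrates them with a simple Euler scheme during a transient transcriptional pulse. The parameter values, the pulse shape and the function name are hypothetical and serve only to show the delay between the mRNA and protein peaks.

```python
import math

def simulate_pulse(t_end=48.0, dt=0.01, pulse=(2.0, 6.0),
                   v_basal=1.0, v_induced=10.0,
                   t_r_half=1.0, t_p_half=10.0, k_sp=50.0):
    """Euler integration of dR/dt = v(t) - d_r*R and dP/dt = k_sp*R - d_p*P.

    Times are in hours. Transcription runs at v_induced inside the pulse
    window and at v_basal otherwise.
    """
    d_r = math.log(2.0) / t_r_half
    d_p = math.log(2.0) / t_p_half
    r = v_basal / d_r          # start at the basal mRNA steady state
    p = k_sp * r / d_p         # and the matching protein steady state
    times, mrna, protein = [], [], []
    n_steps = int(round(t_end / dt))
    for i in range(n_steps + 1):
        t = i * dt
        times.append(t)
        mrna.append(r)
        protein.append(p)
        v = v_induced if pulse[0] <= t < pulse[1] else v_basal
        # Both updates use the values from the start of the step.
        r, p = r + dt * (v - d_r * r), p + dt * (k_sp * r - d_p * p)
    return times, mrna, protein

times, mrna, protein = simulate_pulse()
t_mrna_peak = times[mrna.index(max(mrna))]
t_protein_peak = times[protein.index(max(protein))]
print(f"mRNA peaks at {t_mrna_peak:.1f} h, protein peaks at {t_protein_peak:.1f} h")
```

In this sketch the protein level keeps rising after the mRNA has begun to fall, so a snapshot taken at a single time point would pair a declining mRNA with a still-increasing protein. Comparing mRNA with a later protein time point removes much of this apparent disagreement.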
Spatial contexts. mRNA–protein correlations can be low if proteins are produced at one specific location and subsequently transported to another one (Fig. 5b). Extracellular compartments (especially in samples involving complex tissues) are one example. For instance, bodily fluids such as blood and urine provide valuable material that can be easily probed for protein biomarkers in the clinic71–75. Here, samples contain little mRNA and have proteomes secreted by a variety of different cell types (for example, the secretion of albumin from the liver and insulin from the pancreas into blood)74. Even when the identity of the secreting cells is known, the secretomes of cells cannot always faithfully be predicted from their transcriptomes76. Sometimes, however, informative comparisons of an extracellular proteome and a distant transcriptome of the source cells can be made. Suhre et al.77 described a correlation between high circulating plasma levels of the antigen-processing peptidase ERAP1 and genetic variants associated with the autoimmune disease ankylosing spondylitis. They further demonstrated higher levels of the source transcript in lymphoblastoid cells containing these variants, establishing a link between genotype, mechanism and the presence of the biomarker in patients. Importantly, the spatial structures of even solid tissues, the distribution of cells within them and the various contributions of cell types to the tissue proteome are all expected to confound protein–mRNA relationships.
Half-lives The times it takes for a set of molecules (mRNAs and proteins in the context of this Review) to reduce in number to half of their original quantity via degradation. The term implies that degradation follows first-order kinetics, which is not always true.
Translation rate constants The rates at which proteins are synthesized, as a function of transcript number (expressed as protein copies per mRNA per hour).
Secretomes The complement of the proteome produced and secreted by cells.
Proteins are also transported intracellularly, which is why subcellular proteomes (such as the mitochondrial proteome) are usually not reflected by corresponding subcellular transcriptomes. In highly polarized cells, such as neurons, some proteins are first translated and then transported to their subcellular destination. This can give rise to poor mRNA–protein correlation in tissues such as the mammalian brain, where the distances between cell bodies and axon terminals can be huge78. However, mRNA–protein correlation is not always poor in polarized cells, as protein localization can also be achieved by first transporting source mRNAs to specific subcellular locations, followed by localized translation79. Imaging-based, sequencing-based and mass spectrometry-based methods are just beginning to shed light on the dynamics of mRNA localization and translation in cells80,81. In summary, it is important to keep in mind that proteins and their mRNAs correlate under the assumption that they co-occur in the same sample, which may not be the case in highly polarized cells and complex tissues that are rich in extracellular material.
[Figure 4 image not reproduced. Recoverable labels: the genes AHCTF1 and NFKB1; axis ticks 7–10 and 0.5–3.0; the group labels ‘Within gene’ and ‘Across’, and ‘(mRNA)’.]
Fig. 4 | Predicting protein levels from mRNA levels. a | Example of protein abundances as a function of mRNA abundance estimates for four genes across 29 tissues taken from REF.39. Central dots display mean abundance estimates, with whiskers denoting a single standard deviation from the mean. b | Example, using median protein-to-mRNA ratios, of predicted protein abundances as a function of mRNA levels. c | Example of observed protein abundances versus predicted protein abundances. d | Overview of within-gene and across-gene coefficients of variation in absolute abundance estimates from REF.39. Within-gene coefficients of variation were calculated on a per-gene basis across 29 tissues (n = 9,869), whereas across-gene coefficients of variation were calculated on a per-tissue basis across 9,869 genes (n = 29). FPKM, fragments per kilobase of transcript per million mapped reads; iBAQ, intensity-based absolute quantification.
Polarized cells A difference in the distribution of cellular materials across a cell (for example, of organelles or proteins).
Maternal to zygotic transition The point at which a developing zygote transitions from relying on maternally imparted proteins and mRNAs to gene products encoded by and transcribed from its own genome.
Silent nuclei. Several key biological processes occur without any potential regulation at the level of transcription (Fig. 5c). For instance, the nuclei of oocytes and early embryos remain silent before the maternal to zygotic transition. At these time points, a large degree of the regulation is at the level of mRNA degradation, protein degradation and, importantly, translational activation and repression of maternally imparted mRNAs10,82,83. Indeed, many fundamental principles regarding translational regulation (for example, promoting circularization and polyadenylation or preventing 5′ cap complex formation) come into play at these early developmental stages. Sperm maturation also involves the silencing of the nuclear genome via compaction. Here, one study recently found that WNT signalling, which traditionally is thought to act mostly through transcriptional regulation, acts through post-transcriptional mechanisms to help execute spermatogenesis84. As a further example, erythropoiesis involves the extrusion of the entire genome at the reticulocyte stage. Reticulocytes thereafter must still regulate metabolism and degrade their mitochondria and ribosomes in a timed and coordinated fashion as well as simplify their proteome down to nearly only haemoglobin without any immediate transcriptional oversight85–88. Altogether, these instances demonstrate to what extent the proteome may be dramatically altered in situations where transcriptional regulation is absent or curtailed.
Cell proliferation. Each cell division involves the loss of half of a mother cell’s biomass. Exponentially dividing cells thus have to produce all proteins in proportion to their cellular abundance just to keep their cellular protein levels constant, which requires the presence of corresponding mRNA templates (Figs 3c,5d). By contrast, non-dividing cells need to compensate for protein loss only due to degradation, and several proteins in quiescent cells are stable with half-lives longer than 20 days89,90. With this in mind, the across-gene correlation of mRNAs and proteins is expected to be higher in proliferating cells than in quiescent cells. This has indeed been observed in yeast, but more experiments are needed to address this point in mammalian systems91.
Erythropoiesis The process involving the development and differentiation of red blood cells (erythrocytes).
[Figure 5 image not reproduced. Recoverable labels: ‘Silent nucleus’ (panel c) and ‘Quiescent cell’, with mRNA and protein tracks in each panel.]
Fig. 5 | Contextual confounders of mRNA–protein correlations. a | Example of a state transition occurring due to the induction of a novel transcriptional programme. mRNA levels increase transiently at the second time point, followed by delayed increases in protein levels. b | Example whereby proteins are produced at one position (for example, soma of a cell) but accumulate at a different location (for example, an extension of the cell or in the extracellular space). c | Example of a cell state transition involving the remodelling of the proteome/transcriptome without the induction of novel transcriptional programmes. d | Example of a proliferating cell versus a quiescent cell as well as the corresponding mRNA levels required to maintain the same protein levels in both of them.
Buffering gene expression between levels
In contrast to the contextual confounders outlined already, protein-level buffering is emerging as a general principle governing the relationship between mRNAs and proteins. Observations across the biological sciences have highlighted the capacity of life to buffer noise and variation so as to execute biological functions robustly and consistently92. However, steady-state mRNA levels differ substantially between different species and between different individuals of the same species, an observation that can be attributed to environmental factors and germline DNA sequence variation93,94. In addition, transcription occurs in bursts, which results in cell-to-cell variability (that is, noise) of mRNA levels in isogenic cell populations95,96. Here we outline the emerging principles in gene expression control that limit the impact of noise and variation.
Evidence for protein-level buffering. How can organisms maintain stable phenotypes despite this variability? In the light of the signal amplification involved in gene expression, it might be expected that noise at the mRNA level would be exacerbated at the protein level. However, although information encoded in mRNA undergoes an amplification in scale as it passes to the protein level (in terms of the sheer number of molecules), it does not always undergo an amplification in variation. Instead, within-gene correlation studies (such as those mentioned previously) as well as studies specifically interfacing mRNA with protein-level variation have provided evidence for widespread buffering at the protein level (Fig. 6a). For example, only 33% of genes differentially expressed between primate species display corresponding changes at the protein level36. Similarly, only 35% of mRNA quantitative trait loci (QTLs) in human lymphoblastoid cell lines are associated with protein-level changes37. Additionally, only 37% of QTL-associated mRNA changes in outbred mice are reflected at the protein level38. mRNAs encoded by neighbouring genes often covary, presumably due to their similar chromatin context. However, this covariation appears to be non-functional and is lost at the protein level97. DNA copy number alterations provide an even more extreme example: initial observations in yeast strains and mammalian cell lines with segmental or whole-chromosome aneuploidy demonstrated that despite mRNA abundances nearly always scaling with DNA gene copy number, the expected corresponding protein-level changes did not always occur98–100, and this finding was recently extended to clinical tumours101 as well as fibroblasts of patients with Down syndrome102. In general, only about 20–30% of mRNA changes engendered by somatic copy number alterations are reflected at the protein level. This is in contrast to mRNA levels, which typically follow DNA copy number changes103,104. In summary, it appears that buffering mechanisms have evolved that make protein levels somewhat robust against variability in mRNA levels. This makes evolutionary sense given that protein levels are overall more relevant for phenotypes (see later).
[Figure 6 image not reproduced. Recoverable labels: ‘Autoregulation’ (panel b), ‘RNA-binding protein’, ‘Ubiquitin ligase’, ‘mRNA A’, ‘Protein A’, ‘Protein B’ and ‘Ub’.]
Fig. 6 | Mechanisms of buffering between mRNA and protein levels. a | Noise/variability at the level of mRNA is sometimes seen to be reduced at the protein level. b | Common autoregulatory mechanisms in gene expression: RNA-binding proteins destabilizing their mRNAs or downregulating their own translation (top), and ubiquitin ligases inducing their own ubiquitylation and degradation (bottom). c | Protein-level buffering via degradation of orphan subunits of multiprotein complexes. Protein A is produced in excess relative to its binding partner protein B. Degradation of the orphan subunit of protein A establishes the correct stoichiometry. Ub, ubiquitin.
Quantitative trait loci (QTLs). Loci in the genome for which the genotype correlates with variation in a quantifiable trait of an organism. Expression QTLs (eQTLs) and protein QTLs (pQTLs) are loci that correlate with variation in mRNA and protein levels, respectively.
Aneuploidy The abnormal copy number of either a segment of or the entire chromosome in the genome.
Buffering mechanisms. What are the mechanisms behind buffering against fluctuations in gene expression? In general, buffering at any level requires some type of feedback mechanism that enables protein production or degradation to be upregulated or downregulated in response to steady-state protein levels. The simplest way to achieve this is via autoregulation (Fig. 6b). Although it is not expected to lead to protein-level buffering, many transcription factors bind and inhibit their own promoters and thus limit the synthesis of their own mRNAs105. More relevant to the dependency of protein abundances on mRNA abundances, many RNA-binding proteins bind to their own mRNA to destabilize it and/or to inhibit translation106. For example, surplus production of splicing factors favours non-productive autoinhibitory splicing of the source transcripts, which introduces premature stop codons and leads to their destabilization and destruction107,108. Finally, many E3 ubiquitin ligases target their own surplus protein molecules for proteasomal degradation109. As autoregulation depends on the ability of proteins to regulate their own abundance, it is limited to special classes of proteins such as DNA-binding and RNA-binding proteins and ubiquitin ligases. More general buffering mechanisms therefore typically require several proteins, with at least one of them being capable of feeding back information to the gene expression pathway110.
In principle, general buffering mechanisms can operate at any regulatory step of the gene expression pathway (transcription, mRNA stability, translation and protein stability). Although buffering at the mRNA level is commonly observed111, it is not expected to be the primary contributor to low covariation between mRNA and protein levels, as depicted in the aforementioned studies. One relevant phenomenon in eukaryotes is the nuclear retention of transcripts: as nuclear pores have limited capacity for mRNA export, the cytosolic concentrations of mRNAs are to some extent buffered against variability in the nucleus112. Some studies also provided evidence for translational buffering in yeast113,114, but more recent reports have questioned these findings111,115. A general mechanism that could sense protein levels and regulate translation accordingly is hard to imagine. Consistently, ribosome profiling of aneuploid yeast strains recently showed that protein synthesis rates are not generally subject to feedback regulation in cases of chromosome gain115.
So how is protein-level buffering achieved? Intriguing insights come again from analyses of aneuploid systems: the proteins that are most resistant to an increase in gene copy numbers are typically members of multiprotein complexes99–102. It is also known that unassembled subunits of multiprotein complexes (so-called orphans) are often targeted for degradation116. A study from our laboratory found that more than 10% of proteins in diploid human cells are synthesized in excess, only to be rapidly cleared in the first few hours of their life117. This phenomenon, termed ‘non-exponential degradation’, is prevalent among protein complex subunits, whereby such proteins are unstable directly after synthesis but then acquire higher stability resembling that of their fellow complex members, presumably after complex integration. Indeed, specific E2/E3 ligase proteins, such as UBE2O, have recently been identified that specifically target surplus protein complex member proteins for degradation87,88. Hence, protein complex subunit imbalances can be buffered by degrading overproduced orphan subunits down to the level of the stoichiometrically limiting subunit of a complex (Fig. 6c).
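The logic of buffering by orphan degradation can be captured in a deliberately idealized sketch. It assumes a complex with 1:1 stoichiometry and complete degradation of every unassembled subunit, neither of which is claimed by the studies cited above, and the subunit names and copy numbers are hypothetical.

```python
def stable_subunit_levels(synthesized):
    """Return subunit levels after orphan degradation for a 1:1 complex.

    synthesized maps subunit name -> copies produced. Every unassembled
    (orphan) copy is assumed to be degraded, so each subunit ends up at the
    level of the least-produced (limiting) subunit.
    """
    limiting = min(synthesized.values())
    return {name: limiting for name in synthesized}

def degraded_fraction(synthesized):
    """Fraction of each subunit's production that is cleared as orphans."""
    stable = stable_subunit_levels(synthesized)
    return {name: 1.0 - stable[name] / made for name, made in synthesized.items()}

# Hypothetical copy numbers: a gain of one gene copy raises synthesis of A 1.5-fold.
diploid = {"A": 1000.0, "B": 1000.0}
gain_of_A = {"A": 1500.0, "B": 1000.0}

print(stable_subunit_levels(diploid))
print(stable_subunit_levels(gain_of_A))
print(degraded_fraction(gain_of_A))
```

Under these assumptions the extra production of subunit A leaves its stable level unchanged, and the surplus shows up only as a larger degraded fraction. This mirrors the observation that complex members are the proteins most resistant to gene copy number gains.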
In summary, protein-level buffering is a widespread phenomenon. The emerging picture is that transcription, mRNA stability and translation roughly define the amount of cellular protein production. As these processes are subject to noise/variation (across species, across different individuals of one species and across different cells/tissues within one individual), they inevitably give rise to noise at the level of protein synthesis. This noise is then mainly (but not exclusively) buffered by the selective removal of the protein molecules made in excess. An instructive example of this phenomenon is the stoichiometry of subunits of protein complexes, which involves coordinated synthesis (to ensure the production of interacting subunits is within a reasonable range of one another) followed by post-translational fine-tuning via protein degradation. Thus, on the one hand, signal amplification in the gene expression pathway (from transcription to translation) causes, for example, ribosomes to be much more abundant than proteasomes, while, on the other hand, the noise (that is, deviations from the precise stoichiometry) in the abundance of individual subunits of these complexes is buffered and adjusted by protein degradation. This is how gene expression achieves an amplification in signal (across genes) while preserving the possibility to fine-tune and buffer variability. A more detailed analysis of how the correct stoichiometry of protein complexes is achieved can be found in a recent article118.
Implications for single-cell analysis
It is important to keep in mind that the consequences of unbuffered protein-level variation mentioned in the previous section (for instance, protein folding stress) occur at the scale of single cells119,120. Single-cell mRNA sequencing (scRNA-seq) has revolutionized our understanding of intercellular mRNA-level variation, although it remains to be seen to what extent this translates into protein-level variability. The upcoming technologies involved in single-cell proteomics are beyond the scope of this Review and have been extensively covered elsewhere121,122, but briefly, they typically involve one of two main strategies. First is antibody-based protein detection, which is sometimes coupled with use of readable nucleic acid barcodes as in cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq)123 or is sometimes coupled with use of heavy-metal isotopes and inductively coupled plasma mass spectrometry124. Second is shotgun proteomic analysis, which is often combined with a ‘carrier channel’ (that is, an abundant internal isotopically labelled standard that helps to increase signal from the low-abundance samples, as used in single-cell proteomics by mass spectrometry (SCoPE-MS))125. In both cases, coverage is limited in comparison with bulk measurements: antibody-based approaches are limited by the diversity and selection of available antibodies, and shotgun methods are restricted to relatively highly abundant proteins.
As discussed earlier, low or biased coverage in proteomics was an original technical factor that had an impact on mRNA–protein correlations in early bulk analyses27 and is expected to play a role in across-gene correlation analyses at the single-cell level. Single-cell analyses provide an attractive opportunity to look at within-gene correlations across populations of cells. Currently available data are inconsistent, with studies reporting low, intermediate or high within-gene mRNA–protein correlations126–129. It is important to keep in mind that, as with bulk measurements, within-gene mRNA–protein correlations are expected to be different for specific functional groups of genes (Fig. 2d) and that low within-gene single-cell mRNA–protein correlations could also simply be due to a lack of variability in mRNA expression overall. In the future, it will be interesting to see to what extent single-cell measurements follow the same trend as bulk measurements interrogating interspecies36 and interindividual37 protein-level buffering of mRNA variation. Finally, it should be mentioned that not all remaining protein-level noise is necessarily deleterious: overproduction of many proteins does not appear to be toxic130, and excess proteins can be deposited in aggregates or liquid droplets to effectively reduce noise in free protein concentrations131,132.
Stoichiometry With reference to protein complexes, the proportion of the individual subunits that make up a protein complex.
The relative value of mRNA and protein levels
In the light of all the aforementioned considerations, are mRNA or protein-level measurements preferable? From the technical side, transcriptomics and proteomics are both mature technologies that can provide comprehensive and reliable quantitative data from small sample amounts. Individually, the utility of transcriptomic and proteomic data largely depends on the research context: in the steps of gene expression, mRNAs are closer to the genome and thus more directly reflect upstream processes such as transcription factor activity, epigenetic regulation and RNA processing events. As cell differentiation involves transcriptional control, methods such as scRNA-seq are powerful tools to map developmental trajectories. Along these lines, a recent proteomic and transcriptomic survey of 375 cell lines from the Cancer Cell Line Encyclopedia (CCLE) underscored that mRNA outperformed proteomics when it came to determining a cell line’s tissue of origin133. Given that mRNA levels scale with DNA copy number, RNA-seq measurements have even been used retrospectively to map DNA copy number alterations in tumours104,134,135 or chromosomal rearrangements triggered by CRISPR editing in cell lines136 to megabase resolution.
In contrast to transcriptomics, proteomics probes a stage of gene expression that is closer to what most people consider ‘gene function’ and is thus more directly related to phenotype. First, protein levels are more robust against functionally irrelevant mRNA-level variability. Second, post-transcriptional and post-translational regulation induces functionally important changes in protein abundances, which cannot be seen at the mRNA level. For instance, the cancer cell line study mentioned above outlined how protein complex membership (as discussed in the previous section) is one of the primary features determining protein-level changes133. Indeed, given how integral complex membership is to determining protein abundances, it would be interesting to see if future attempts at predicting protein abundances from mRNA levels can leverage this ontological information to improve predictions. The higher ‘downstream’ functional relevance of proteomic data is also highlighted by the observation that it outperforms transcriptomic data for gene function prediction137–142 and by the fact that most clinical tests are based on proteins74. Additionally, in a more practical context, proteins are also more stable
than mRNA ex vivo, which is why proteomic analyses of clinical and archaeological samples are often possible when DNA and RNA are degraded143,144. That said, protein levels are in many cases not the ideal readout for phenotypes: cell signalling is better reflected by phosphoproteomics30,145, protein function is more directly related to protein–protein interactions146,147 and metabolic perturbations are more directly queried with metabolomics148,149.
Conclusions and future perspectives

Measuring mRNA and protein levels is an integral part of our effort to understand how the genome impacts phenotype. The hierarchy of the gene expression pathway forms the basis of the largely correct assumption that protein abundance should scale with mRNA abundance, both across and within genes. That said, the dynamic ranges of its key parameters, contextual confounders and protein-level buffering limit our ability to extrapolate conclusions about one level of gene expression from the other. Therefore, mRNA levels should not be interpreted as the final output of gene expression. Instead, it is more instructive to think of mRNAs as what they mechanistically are: the templates for protein synthesis. The presence of mRNAs is required for protein synthesis. However, due to differential translation, protein degradation, contextual confounders and pervasive protein-level buffering, this does not imply that proteins are actually being made or are present in proportional quantities. Conversely, the presence of a protein merely implies that a corresponding mRNA was present at the time and place the protein was made. Hence, mRNA levels evolved not as an end in themselves but rather to fulfil the potential protein synthesis needs of a biological system in its near future.
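The distinction between across-gene and within-gene scaling can be made concrete with a small sketch. The gene names and log-scale abundance values below are entirely hypothetical and chosen only for illustration; the helper functions are not taken from any published pipeline. The across-gene correlation compares different genes within one sample, whereas the within-gene correlation follows one gene across samples.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)

# Hypothetical log10 abundances (assumed values): one list entry per sample.
mrna = {
    "geneA": [1.0, 1.2, 0.8, 1.1],
    "geneB": [2.0, 2.2, 1.8, 2.0],
    "geneC": [3.0, 2.8, 3.2, 3.1],
}
protein = {
    "geneA": [3.0, 3.2, 2.8, 3.1],  # follows its own mRNA across samples
    "geneB": [4.6, 4.5, 4.5, 4.4],  # varies independently of its mRNA
    "geneC": [6.0, 6.0, 6.1, 5.9],
}

def across_gene(sample):
    """Correlate mRNA and protein over all genes within a single sample."""
    genes = sorted(mrna)
    return pearson([mrna[g][sample] for g in genes],
                   [protein[g][sample] for g in genes])

def within_gene(gene):
    """Correlate mRNA and protein of a single gene over all samples."""
    return pearson(mrna[gene], protein[gene])

print("across genes, sample 0:", round(across_gene(0), 3))
print("within geneA:", round(within_gene("geneA"), 3))
print("within geneB:", round(within_gene("geneB"), 3))
```

In this toy dataset the across-gene correlation is high because the genes span a wide abundance range, while the within-gene correlation for the hypothetical geneB is close to zero, even though both are computed from the same measurements.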
The principles of gene expression control are just beginning to emerge. As we have outlined here, both mRNA-level and protein-level measurements provide unique insights into biological systems. Importantly, however, integrating transcriptomic and proteomic data provides additional information about the principles of gene expression control that cannot be obtained from either type of data alone. We therefore argue that integrated transcriptomic and proteomic analyses should become more routine through either correlative analyses or more complex mathematical modelling approaches. In addition, directly quantifying mRNA and protein synthesis and degradation provides a more detailed picture of gene expression dynamics (Box 1).
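As a minimal sketch of what such a modelling approach can look like, the snippet below uses a commonly used two-stage kinetic model in which mRNA is synthesized at a constant rate and degraded with first-order kinetics, and protein is synthesized in proportion to mRNA and likewise degraded with first-order kinetics. The parameter names (v_sr, k_dr, k_sp, k_dp) and all numerical values are assumptions made for illustration and are not taken from this article.

```python
def steady_state(v_sr, k_dr, k_sp, k_dp):
    """Steady-state levels of the assumed two-stage model.

    dR/dt = v_sr - k_dr * R        (mRNA synthesis and degradation)
    dP/dt = k_sp * R - k_dp * P    (protein synthesis and degradation)
    """
    mrna = v_sr / k_dr
    protein = k_sp * mrna / k_dp
    return mrna, protein

def simulate(v_sr, k_dr, k_sp, k_dp, hours=500.0, dt=0.01):
    """Forward-Euler integration starting from zero mRNA and zero protein."""
    mrna = protein = 0.0
    for _ in range(round(hours / dt)):
        d_mrna = v_sr - k_dr * mrna
        d_protein = k_sp * mrna - k_dp * protein
        mrna += d_mrna * dt
        protein += d_protein * dt
    return mrna, protein

# Two hypothetical genes with identical mRNA kinetics but different
# translation and protein degradation rates (illustrative values, per hour).
gene_x = dict(v_sr=2.0, k_dr=0.1, k_sp=100.0, k_dp=0.02)
gene_y = dict(v_sr=2.0, k_dr=0.1, k_sp=10.0, k_dp=0.2)

for name, params in (("gene_x", gene_x), ("gene_y", gene_y)):
    mrna, protein = steady_state(**params)
    print(name, "mRNA:", mrna, "protein:", protein)
```

Under these assumed parameters both genes reach the same steady-state mRNA level, yet their protein levels differ by two orders of magnitude, which illustrates why measured synthesis and degradation rates add information that abundance data alone cannot provide.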
The role of post-transcriptional deregulation in disease is an emerging field of study that we are only beginning to comprehensively explore9,150. Targeting protein degradation, for instance, has arisen as a promising strategy in drug development151–153. Additionally, protein buffering due to aneuploidy is being explored as a potential therapeutic opportunity in cancer154,155. Acquiring and integrating multi-omics datasets holds the promise of understanding the flow of information from genomes to phenotypes, how this may change in disease and whether it can be leveraged for therapeutic treatment156,157.
Published online 24 July 2020
Nature Reviews | Genetics
Volume 21 | October 2020 | 641
1. Abbott, S. & Fairbanks, D. J. Experiments on plant hybrids by Gregor Mendel. Genetics 204, 407–422 (2016).
2. Lester, G. & Bonner, D. M. The occurrence of beta-galactosidase in Escherichia coli. J. Bacteriol. 63, 759–769 (1952).
3. Jacob, F. & Monod, J. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356 (1961).
4. Gann, A. Jacob and Monod: from operons to EvoDevo. Curr. Biol. 20, R718–R723 (2010).
5. Montgomery, S. B. & Dermitzakis, E. T. From expression QTLs to personalized transcriptomics. Nat. Rev. Genet. 12, 277–282 (2011).
6. Koch, L. Genomics: adding another dimension to gene regulation. Nat. Rev. Genet. 16, 563 (2015).
7. Bentley, D. L. Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet. 15, 163–175 (2014).
8. Hentze, M. W., Castello, A., Schwarzl, T. & Preiss, T. A brave new world of RNA-binding proteins. Nat. Rev. Mol. Cell Biol. 19, 327–341 (2018).
9. Tahmasebi, S., Khoutorsky, A., Mathews, M. B. & Sonenberg, N. Translation deregulation in human disease. Nat. Rev. Mol. Cell Biol. 19, 791–807 (2018).
10. Teixeira, F. K. & Lehmann, R. Translational control during developmental transitions. Cold Spring Harb. Perspect. Biol. 11, a032987 (2019).
11. Emmott, E., Jovanovic, M. & Slavov, N. Ribosome stoichiometry: from form to function. Trends Biochem. Sci. 44, 95–109 (2019).
12. Schwartz, A. L. & Ciechanover, A. Targeting proteins for destruction by the ubiquitin system: implications for human pathobiology. Annu. Rev. Pharmacol. Toxicol. 49, 73–96 (2009).
13. Pohl, C. & Dikic, I. Cellular quality control by the ubiquitin-proteasome system and autophagy. Science 366, 818–822 (2019).
14. Ryan, C. J. et al. High-resolution network biology: connecting sequence with function. Nat. Rev. Genet. 14, 865–879 (2013).
15. Mann, M. & Jensen, O. N. Proteomic analysis of post-translational modifications. Nat. Biotechnol. 21, 255–261 (2003).
16. Alberts, B. et al. Molecular Biology of the Cell (Garland Press, 2002).
17. Aebersold, R. et al. How many human proteoforms are there? Nat. Chem. Biol. 14, 206–214 (2018).
18. Salovska, B. et al. Isoform-resolved correlation analysis between mRNA abundance regulation and protein level degradation. Mol. Syst. Biol. 16, e9170 (2020).
19. Kempfer, R. & Pombo, A. Methods for mapping 3D chromosome architecture. Nat. Rev. Genet. 21, 207–226 (2019).
20. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
21. Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631–656 (2019).
22. Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016). This review provides an overview of mass spectrometry-based proteomic technologies and their biomedical applications.
23. Vogel, C. & Marcotte, E. M. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 13, 227–232 (2012).
24. Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
25. Fortelny, N., Overall, C. M., Pavlidis, P. & Freue, G. V. C. Can we predict protein from mRNA levels? Nature 547, E19–E20 (2017). This study questions the utility of protein-to-mRNA ratios and argues that these are likely to be of little use when one is attempting to make within-gene estimates of protein levels from mRNA.
26. Franks, A., Airoldi, E. & Slavov, N. Post-transcriptional regulation across human tissues. PLoS Comput. Biol. 13, e1005535 (2017).
27. Gygi, S. P., Rochon, Y., Franza, B. R. & Aebersold, R. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720–1730 (1999). One of the original studies attempting to correlate proteins with mRNA abundance. Gygi and colleagues note that coverage bias may greatly affect across-gene correlations.
28. Edfors, F. et al. Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol. Syst. Biol. 12, 883 (2016).
29. Zhang, B. et al. Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387 (2014).
30. Mertins, P. et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534, 55–62 (2016).
31. Zhang, H. et al. Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell 166, 755–765 (2016).
32. Huang, K.-L. et al. Proteogenomic integration reveals therapeutic targets in breast cancer xenografts. Nat. Commun. 8, 14864 (2017).
33. Archer, T. C. et al. Proteomics, post-translational modifications, and integrative analyses reveal molecular heterogeneity within medulloblastoma subgroups. Cancer Cell 34, 396–410.e8 (2018).
34. Mun, D.-G. et al. Proteogenomic characterization of human early-onset gastric cancer. Cancer Cell 35, 111–124.e10 (2019).
35. Vasaikar, S. et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 177, 1035–1049.e19 (2019).
36. Khan, Z. et al. Primate transcript and protein expression levels evolve under compensatory selection pressures. Science 342, 1100–1104 (2013).
37. Battle, A. et al. Genomic variation. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015). This study looks at how mRNA variation, arising from germline DNA variation in a population of humans, is buffered at the translational and protein levels.
38. Chick, J. M. et al. Defining the consequences of genetic variation on a proteome-wide scale. Nature 534, 500–505 (2016).
39. Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol. Syst. Biol. 15, e8503 (2019).
40. Ankney, J. A., Astor Ankney, J., Muneer, A. & Chen, X. Relative and absolute quantitation in mass spectrometry–based proteomics. Annu. Rev. Anal. Chem. 11, 49–77 (2018). This review addresses the pros and cons of different absolute and relative proteomic quantification methods.
41. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
42. Fu, Y., Wu, P.-H., Beane, T., Zamore, P. D. & Weng, Z. Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics 19, 531 (2018).
43. Benjamini, Y. & Speed, T. P. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40, e72 (2012).
44. Tang, H. et al. A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 22, e481–e488 (2006).
45. Zimmer, D., Schneider, K., Sommer, F., Schroda, M. & Mühlhaus, T. Artificial intelligence understands peptide observability and assists with absolute protein quantification. Front. Plant. Sci. 9, 1559 (2018).
46. Peng, M. et al. Protease bias in absolute protein quantitation. Nat. Methods 9, 524–525 (2012). This study specifically interrogates intensity-based quantification methods in mass-spectrometry-based proteomics and how these may differ widely depending on the mode of enzyme digestion.
47. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011). This study combines metabolic pulse labelling and absolute quantification of both mRNAs and proteins with mathematical modelling to quantify the major stages of mammalian gene expression control.
48. Li, J. J., Bickel, P. J. & Biggin, M. D. System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ 2, e270 (2014).
49. Jovanovic, M. et al. Dynamic profiling of the protein life cycle in response to pathogens. Science 347, 1259038 (2015). This study integrates absolute quantification of mRNAs and proteins along with protein turnover information in the context of LPS stimulation using ordinary differential equations to comprehensively assess gene expression regulation.
50. Ahrné, E., Molzahn, L., Glatter, T. & Schmidt, A. Critical assessment of proteome-wide label-free absolute abundance estimation strategies. Proteomics 13, 2567–2578 (2013).
51. Zeiler, M., Straube, W. L., Lundberg, E., Uhlen, M. & Mann, M. A protein epitope signature tag (PrEST) library allows SILAC-based absolute quantification and multiplexed determination of protein copy numbers in cell lines. Mol. Cell. Proteomics 11, O111.009613 (2012).
52. Lin, C. Y. et al. Transcriptional amplification in tumor cells with elevated c-Myc. Cell 151, 56–67 (2012).
53. Crick, F. Central dogma of molecular biology. Nature 227, 561–563 (1970). This study provides the original postulate of the central dogma of molecular biology, not to be confused with gene expression in general.
54. Cobb, M. 60 years ago, Francis Crick changed the logic of biology. PLoS Biol. 15, e2003243 (2017).
55. Davidson, E. H. Emerging properties of animal gene regulatory networks. Nature 468, 911–920 (2010).
56. Lindeboom, R. G. H. et al. Integrative multi-omics analysis of intestinal organoid differentiation. Mol. Syst. Biol. 14, e8227 (2018).
57. Becker, K. et al. Quantifying post-transcriptional regulation in the development of Drosophila melanogaster. Nat. Commun. 9, 4970 (2018).
58. Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011).
59. Cheng, Z. et al. Differential dynamics of the mammalian mRNA and protein expression response to misfolding stress. Mol. Syst. Biol. 12, 855 (2016).
60. Darzacq, X. et al. In vivo dynamics of RNA polymerase II transcription. Nat. Struct. Mol. Biol. 14, 796–806 (2007).
61. Hausser, J., Mayo, A., Keren, L. & Alon, U. Central dogma rates and the trade-off between precision and economy in gene expression. Nat. Commun. 10, 68 (2019). This study analyses the parametric landscape of gene expression across genes, investigating the overall strategy evolution has selected, for example, to regulate highly expressed genes.
62. Schwanhäusser, B., Wolf, J., Selbach, M. & Busse, D. Synthesis and degradation jointly determine the responsiveness of the cellular proteome. Bioessays 35, 597–601 (2013).
63. Beyer, A., Hollunder, J., Nasheuer, H.-P. & Wilhelm, T. Post-transcriptional expression regulation in the yeast Saccharomyces cerevisiae on a genomic scale. Mol. Cell. Proteomics 3, 1083–1092 (2004).
64. Zecha, J. et al. Peptide level turnover measurements enable the study of proteoform dynamics. Mol. Cell. Proteomics 17, 974–992 (2018).
65. Kristensen, A. R., Gsponer, J. & Foster, L. J. Protein synthesis rate is the predominant regulator of protein expression during differentiation. Mol. Syst. Biol. 9, 689 (2013).
66. Baum, K., Schuchhardt, J., Wolf, J. & Busse, D. Of gene expression and cell division time: a mathematical framework for advanced differential gene expression and data analysis. Cell Syst. 9, 569–579 (2019). This study formalizes the role of cell cycle time in the context of gene expression.
67. Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014). This study presents an in-depth proteomics and transcriptomics dataset of 12 tissues and is one of the first to posit that protein-to-mRNA ratios can be used to estimate absolute protein abundances from mRNA.
68. Eraslan, B. et al. Quantification and discovery of sequence determinants of protein-per-mRNA amount in 29 human tissues. Mol. Syst. Biol. 15, e8513 (2019).
69. Buszczak, M., Signer, R. A. J. & Morrison, S. J. Cellular differences in protein synthesis regulate tissue homeostasis. Cell 159, 242–251 (2014).
70. Peshkin, L. et al. On the relationship of protein and mRNA dynamics in vertebrate embryonic development. Dev. Cell 35, 383–394 (2015).
71. Xiao, H. et al. Differential proteomic analysis of human saliva using tandem mass tags quantification for gastric cancer detection. Sci. Rep. 6, 22165 (2016).
72. Zhao, M. et al. A comprehensive analysis and annotation of human normal urinary proteome. Sci. Rep. 7, 3024 (2017).
73. Csosz, É. et al. Quantitative body fluid proteomics in medicine - a focus on minimal invasiveness. J. Proteomics 153, 30–43 (2017).
74. Geyer, P. E., Holdt, L. M., Teupser, D. & Mann, M. Revisiting biomarker discovery by plasma proteomics. Mol. Syst. Biol. 13, 942 (2017).
75. Yao, C. et al. Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nat. Commun. 9, 3268 (2018).
76. Meissner, F., Scheltema, R. A., Mollenkopf, H.-J. & Mann, M. Direct proteomic quantification of the secretome of activated immune cells. Science 340, 475–478 (2013).
77. Suhre, K. et al. Connecting genetic risk to disease end points through the human blood plasma proteome. Nat. Commun. 8, 14357 (2017).
78. Moritz, C. P., Mühlhaus, T., Tenzer, S., Schulenborg, T. & Friauf, E. Poor transcript-protein correlation in the brain: negatively correlating gene products reveal neuronal polarity as a potential cause. J. Neurochem. 149, 582–604 (2019).
79. Zappulo, A. et al. RNA localization is a key determinant of neurite-enriched proteome. Nat. Commun. 8, 583 (2017).
80. Holt, C. E., Martin, K. C. & Schuman, E. M. Local translation in neurons: visualization and function. Nat. Struct. Mol. Biol. 26, 557–566 (2019).
81. Chekulaeva, M. & Landthaler, M. Eyes on translation. Mol. Cell 63, 918–925 (2016).
82. Sysoev, V. O. et al. Global changes of the RNA-bound proteome during the maternal-to-zygotic transition in Drosophila. Nat. Commun. 7, 12128 (2016).
83. Stoeckius, M. et al. Global characterization of the oocyte-to-embryo transition in Caenorhabditis elegans uncovers a novel mRNA clearance mechanism. EMBO J. 33, 1751–1766 (2014).
84. Koch, S., Acebron, S. P., Herbst, J., Hatiboglu, G. & Niehrs, C. Post-transcriptional Wnt signaling governs epididymal sperm maturation. Cell 163, 1225–1236 (2015).
85. Liu, X. et al. Regulation of mitochondrial biogenesis in erythropoiesis by mTORC1-mediated protein translation. Nat. Cell Biol. 19, 626–638 (2017).
86. Gautier, E.-F. et al. Comprehensive proteomic analysis of human erythropoiesis. Cell Rep. 16, 1470–1484 (2016).
87. Nguyen, A. T. et al. UBE2O remodels the proteome during terminal erythroid differentiation. Science 357, eaan0218 (2017).
88. Yanagitani, K., Juszkiewicz, S. & Hegde, R. S. UBE2O is a quality control factor for orphans of multiprotein complexes. Science 357, 472–475 (2017). Nguyen et al. (2017) and Yanagitani et al. (2017) characterize the mechanism of action and role of UBE2O as a ubiquitin ligase responsible for clearing surplus protein complex subunits.
89. Mathieson, T. et al. Systematic analysis of protein turnover in primary cells. Nat. Commun. 9, 689 (2018).
90. Dörrbaum, A. R., Kochen, L., Langer, J. D. & Schuman, E. M. Local and global influences on protein turnover in neurons and glia. eLife 7, e34202 (2018).
91. Marguerat, S. et al. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 151, 671–683 (2012).
92. Félix, M.-A. & Barkoulas, M. Pervasive robustness in biological systems. Nat. Rev. Genet. 16, 483–496 (2015).
93. Rogers, J. & Gibbs, R. A. Comparative primate genomics: emerging patterns of genome content and dynamics. Nat. Rev. Genet. 15, 347–359 (2014).
94. Waszak, S. M. et al. Population variation and genetic control of modular chromatin architecture in humans. Cell 162, 1039–1050 (2015).
95. Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309 (2006).
96. Raser, J. M. & O’Shea, E. K. Control of stochasticity in eukaryotic gene expression. Science 304, 1811–1814 (2004).
97. Kustatscher, G., Grabowski, P. & Rappsilber, J. Pervasive coexpression of spatially proximal genes is buffered at the protein level. Mol. Syst. Biol. 13, 937 (2017). This study outlines how covariation in mRNA owing to chromosomal location of source genes is lost at the protein level.
98. Geiger, T., Cox, J. & Mann, M. Proteomic changes resulting from gene copy number variations in cancer cells. PLoS Genet. 6, e1001090 (2010). This is the first systematic analysis showing that the levels of some proteins are resistant to DNA copy number changes in mammalian cell lines.
99. Stingele, S. et al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol. Syst. Biol. 8, 608 (2012).
100. Dephoure, N. et al. Quantitative proteomic analysis reveals posttranslational responses to aneuploidy in yeast. eLife 3, e03023 (2014).
101. Gonçalves, E. et al. Widespread post-transcriptional attenuation of genomic copy-number variation in cancer. Cell Syst. 5, 386–398.e4 (2017). This study presents a reanalysis of Clinical Proteomic Tumor Analysis Consortium (CPTAC) breast, ovarian and colorectal cancer studies, finds that ~20–30% of mRNA changes caused by aneuploidy are buffered at the protein level and further leverages this information to predict protein–protein interactions.
102. Liu, Y. et al. Systematic proteome and proteostasis profiling in human Trisomy 21 fibroblast cells. Nat. Commun. 8, 1212 (2017).
103. Schlattl, A., Anders, S., Waszak, S. M., Huber, W. & Korbel, J. O. Relating CNVs to transcriptome data at fine resolution: assessment of the effect of variant size, type, and overlap with functional regions. Genome Res. 21, 2004–2013 (2011).
104. Fehrmann, R. S. N. et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat. Genet. 47, 115–125 (2015). This study demonstrates, across tens of thousands of microarray and RNA-seq samples, the direct and dosage-sensitive effects of somatic copy number alterations (aneuploidy) on mRNA and underlines almost no buffering between the DNA and mRNA levels.
105. Grönlund, A., Lötstedt, P. & Elf, J. Transcription factor binding kinetics constrain noise suppression via negative feedback. Nat. Commun. 4, 1864 (2013).
106. Müller-McNicoll, M., Rossbach, O., Hui, J. & Medenbach, J. Auto-regulatory feedback by RNA-binding proteins. J. Mol. Cell Biol. 11, 930–939 (2019).
107. Jumaa, H. & Nielsen, P. J. The splicing factor SRp20 modifies splicing of its own mRNA and ASF/SF2 antagonizes this regulation. EMBO J. 16, 5077–5085 (1997). This is one of the early studies analysing the autoregulatory capability of many splicing factors.
108. Lareau, L. F., Inada, M., Green, R. E., Wengrod, J. C. & Brenner, S. E. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446, 926–929 (2007).
109. de Bie, P. & Ciechanover, A. Ubiquitination of E3 ligases: self-regulation of the ubiquitin system via proteolytic and non-proteolytic mechanisms. Cell Death Differ. 18, 1393–1402 (2011).
110. Signor, S. A. & Nuzhdin, S. V. The evolution of gene expression in cis and trans. Trends Genet. 34, 532–544 (2018).
111. Bader, D. M. et al. Negative feedback buffers effects of regulatory variants. Mol. Syst. Biol. 11, 785 (2015).
112. Battich, N., Stoeger, T. & Pelkmans, L. Control of transcript variability in single mammalian cells. Cell 163, 1596–1610 (2015).
113. Artieri, C. G. & Fraser, H. B. Evolution at two levels of gene expression in yeast. Genome Res. 24, 411–421 (2014).
114. McManus, C. J., May, G. E., Spealman, P. & Shteyman, A. Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast. Genome Res. 24, 422–430 (2014).
115. Taggart, J. C. & Li, G.-W. Production of protein-complex components is stoichiometric and lacks general feedback regulation in eukaryotes. Cell Syst. 7, 580–589.e4 (2018).
116. Juszkiewicz, S. & Hegde, R. S. Quality control of orphaned proteins. Mol. Cell 71, 443–457 (2018).
117. McShane, E. et al. Kinetic analysis of protein stability reveals age-dependent degradation. Cell 167, 803–815.e21 (2016).
118. Taggart, J. C., Zauber, H., Selbach, M., Li, G.-W. & McShane, E. Keeping the proportions of protein complex components in check. Cell Syst. 10, 125–132 (2020).
119. Santaguida, S., Vasile, E., White, E. & Amon, A. Aneuploidy-induced cellular stresses limit autophagic degradation. Genes Dev. 29, 2010–2021 (2015).
120. Santaguida, S. et al. Chromosome mis-segregation generates cell-cycle-arrested cells with complex karyotypes that are eliminated by the immune system. Dev. Cell 41, 638–651.e5 (2017).
121. Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272 (2019).
122. Marx, V. A dream of single-cell proteomics. Nat. Methods 16, 809–812 (2019).
123. Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
124. Spitzer, M. H. & Nolan, G. P. Mass cytometry: single cells, many features. Cell 165, 780–791 (2016).
125. Budnik, B., Levy, E., Harmange, G. & Slavov, N. SCoPE-MS: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. Genome Biol. 19, 161 (2018).
126. Popovic, D., Koch, B., Kueblbeck, M., E