Translation: what is it good for?

Featuring a review of “Reuveni, S. et al. (2011). Genome-Scale Analysis of Translation Elongation with a Ribosome Flow Model. PLoS Computational Biology, 7(9), e1002127”.

Bradly Alicea

http://www.msu.edu/~aliceabr/

cellular dynamics

Introduction

Understanding translation as a complex process is an important aspect of health and evolution:

Molecular Systems Biology:

* viral adaptation to host -- analyzed proteome to look at codon usage and amino acid preferences (Vol. 5, 311).

* dependence on GroEL – affects codon usage, supports protein misfolding (Vol. 6, 340).

Cell:

* mistranslation leads to protein misfolding -- constraint on coding sequence evolution (Vol. 134, 341-352).

* efficiency of protein translation is evolutionarily conserved (Vol. 141, 344-354).

INSETS: IEEE Spectrum, March 2011, 38-43

Introduction

Translation measurement techniques:

tRNA distribution (inferential)

Proteomic (inferential)

RNA separation (empirical)

Ribosome profiling (empirical)

Direct Measures of Translation (other than sequence):

1) tRNA adaptation index (tAI): mean adaptation of codons to tRNA pool.

2) codon adaptation index (CAI): like tAI, but with each codon weighted by its frequency in a set of highly expressed genes.

3) tRNA pool: total collection of tRNAs that contribute to the construction of peptide chains.
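The "mean adaptation of codons" in measure 1 can be sketched as a geometric mean of per-codon weights, which is how tAI is usually defined. This is a minimal illustration, not the paper's code: the function names `split_codons` and `tai` are hypothetical, and the values in `CODON_WEIGHTS` are invented for the example rather than derived from any real tRNA pool.

```python
import math

# Hypothetical relative-adaptiveness weights (0 < w <= 1) for a few codons.
# Real weights are derived from tRNA gene copy numbers; these are invented.
CODON_WEIGHTS = {"UUC": 1.0, "GCU": 0.5, "AAU": 0.25, "AUC": 0.5, "CGC": 1.0}


def split_codons(mrna):
    """Split an mRNA string into complete, non-overlapping codons."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def tai(mrna, weights):
    """Geometric mean of the weights of all codons that have a weight."""
    logs = [math.log(weights[c]) for c in split_codons(mrna) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# The 16-nt example sequence yields five full codons; the trailing G is dropped.
score = tai("UUCGCUAAUAUCCGCG", CODON_WEIGHTS)
print(round(score, 4))
```

Working in log space avoids underflow when the product runs over the hundreds of codons in a real coding sequence.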

[Figure: example mRNA sequence UUCGCUAAUAUCCGCG, tRNA pool (UUC, UAA), and peptide sequences (S L L T I S S A; A E D I V S R E). Labels: selection, rRNA/mRNA affinity.]

Goal of Reuveni et al. Paper

Focus on elongation: an iterative process in which each codon is recognized by a specific tRNA, which adds an amino acid to the growing peptide chain.

Contributions:

1) physically-plausible computational model solely based on coding sequence.

2) new way to study translational elongation (e.g. capture effect of codon order on translation rates). Translation is rate-limiting (first-come, first-served).

3) uncover several central and uncharacterized processes (e.g. stochastic nature of

translation, interaction between ribosomes).

Protein and ribosome footprinting datasets:

* E. coli (bacteria), S. pombe and S. cerevisiae (yeast).

* compare what is flowing through ribosome at any given time with mass-spectrometry derived protein abundance (produces a correlative relationship that can be compared with gene expression).

Introduction to Model Components

Dynamical (TASEP) model:

* exponentially-distributed translation time (vi is nonlinear).

* ribosomes have volume, can interfere with each other.

Dynamical (RFM) model:

Two free parameters:

* initiation rate (λ), transition rate (λn) – related to co-adaptation between codons and tRNA pool.

* number of codons (C) at a site (ribosome) – related to ribosome density (pn).

mRNAs compete for spots on the ribosome. Faster translation = greater protein abundance.

Subset of all “docked” mRNA.

More abstracted from biology than TASEP (fewer parameters).

Agreement between RFM and TASEP

Comparison of RFM and TASEP models for a range of transition rates.

* tight correlation between the two, with less tight correlation at higher values.

* transition rate similar to a mutation rate (μ - stochastic parameter).

Differential equations that describe RFM dynamics: a stack with one case each for sites 1, 1 < i < n, and n.
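The three-case structure of the RFM equations (first site, interior sites, last site) can be sketched numerically. The block below assumes the usual RFM form, in which flow from site i to site i+1 is λi · pi · (1 − pi+1), initiation enters as λ(1 − p1), and the last site empties at λn · pn. The slide's equations are not reproduced in this text, so treat that form as an assumption. The function names, the forward-Euler scheme, and the rate values are all illustrative.

```python
def rfm_step(p, lam, rates, dt):
    """One forward-Euler step of the (assumed) RFM occupancy equations."""
    n = len(p)
    # flow[0] is initiation into site 1; flow[i + 1] is the flow out of site i.
    flow = [lam * (1.0 - p[0])]
    for i in range(n):
        # The last site has nothing ahead of it, so it is never blocked.
        free_next = 1.0 - p[i + 1] if i + 1 < n else 1.0
        flow.append(rates[i] * p[i] * free_next)
    # Each site changes by (flow in) minus (flow out).
    return [p[i] + dt * (flow[i] - flow[i + 1]) for i in range(n)]


def rfm_steady_state(lam, rates, dt=0.01, steps=20000):
    """Integrate from an empty mRNA until occupancies stop changing."""
    p = [0.0] * len(rates)
    for _ in range(steps):
        p = rfm_step(p, lam, rates, dt)
    return p


# Made-up transition rates for a five-site toy gene; site 4 is a bottleneck.
rates = [1.0, 0.8, 1.2, 0.5, 1.0]
p = rfm_steady_state(lam=0.3, rates=rates)
R = rates[-1] * p[-1]  # translation rate = flow out of the last site
print([round(x, 3) for x in p], round(R, 4))
```

At steady state the flow through every site is the same, so the initiation flow λ(1 − p1) should match R; that equality is a convenient convergence check.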

Elongation rate capacity, comparisons with protein abundance

Elongation rate capacity:

* each gene = different translation elongation capacity.

* capacity = maximal translation rate of gene.

Protein abundance (PA) predictor:

* tAI central feature (does incorporate mRNA levels and evolutionary rates – but

not codon order or ribosome jamming).

PA vs. RFM/tAI for bacteria and yeast

             PA vs. RFM              PA vs. tAI
E. coli      R = 0.54, p < 10^-16    R = 0.43, p < 10^-16
S. pombe     R = 0.63, p < 10^-16    R = 0.56, p < 10^-16

For S. cerevisiae, tAI performs better than RFM only for the most highly expressed genes (more robust to permutations in codon order).

RNA molecules: sites C codons in length (e.g. size of footprint).

C = {3, 4,…, 25}

* where Cmin = 3, and C = 25 is the best predictor of protein abundance.

RNA molecules arrive at ribosomes with initiation rate λ, and can only bind if

site is not occupied.

* initiation rate (λ) is a function of physical constraints (number of free

ribosomes, folding energies, base pairing between RNA and rRNA).

Probability that ith site occupied at time t, or pi(t), represents ribosome flow (e.g.

translation rate, or R).

* each gene has a different elongation capacity. Maximal translation rate (Rmax)

of a gene occurs for infinitely large initiation rate (λ).

* RNA degradation was taken into account in previous work, but could not

improve the performance of this model.
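The saturation of translation rate with initiation rate can be illustrated by scanning λ in a toy RFM. This sketch again assumes the usual RFM flow form (λi · pi · (1 − pi+1) between neighbouring sites). The name `steady_rate`, the rate values, and the use of λ = 50 as a stand-in for the infinite-λ limit are illustrative choices. The 90% threshold echoes the "working point" mentioned with Figure 6.

```python
def steady_rate(lam, rates, dt=0.01, steps=20000):
    """Integrate the assumed RFM equations; return the output rate R."""
    n = len(rates)
    p = [0.0] * n
    for _ in range(steps):
        flow = [lam * (1.0 - p[0])]  # initiation into the first site
        for i in range(n):
            free_next = 1.0 - p[i + 1] if i + 1 < n else 1.0
            flow.append(rates[i] * p[i] * free_next)
        p = [p[i] + dt * (flow[i] - flow[i + 1]) for i in range(n)]
    return rates[-1] * p[-1]


# Made-up transition rates for a five-site toy gene.
rates = [1.0, 0.8, 1.2, 0.5, 1.0]
lams = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 50.0]
curve = [steady_rate(lam, rates) for lam in lams]

r_max = curve[-1]  # large lam used as a stand-in for the infinite limit
# Smallest scanned initiation rate that reaches 90% of the capacity.
working = next(lam for lam, r in zip(lams, curve) if r >= 0.9 * r_max)

for lam, r in zip(lams, curve):
    print(f"lambda={lam:<5} R={r:.4f}")
print("approximate working point:", working)
```

Because the slowest site caps the flow, R cannot exceed the smallest transition rate however large λ becomes. This is why each gene has its own capacity.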

Steady State Model Function

Translation Rate

Left: genomic profiling of R vs. λ for 10 typical genes (gene expression), red curve

is the mean.

* very small values of λ show little to no activity (artifact).

Right: predicted profile for top 25% (blue) bottom 25% (red) of genes (w.r.t.

expression).

* characteristic level of gene expression is asymptotic to characteristic value for R.

Experiment Prediction

Comparisons to Protein Abundance

Figure 4, A) coarse-graining parameter (C, x-axis) vs. correlation between protein abundance and translation rate (y-axis).

Coarse-graining parameter = number of codons considered at one time (representative of ribosomal footprint, or maximal ribosomal RNA fragment size using this approach).

Heterologous gene expression (produce mRNA in one species using gene libraries from another species):

1) Welch et al. (PLoS One, 4, e7002 -- 2009): genes for Bacillus phage proteins. All genes encode the same amino acid sequence; each has a different codon composition.

* correlation between RFM predictions and protein abundance (p = .0004), only for tiny initiation rates (rate-limiting in this context).

2) Burgess-Brown et al. (Protein Expression and Purification, 59, 94-102 -- 2008): optimized codons from 31 human genes, expressed in E. coli.

* in 18/31 cases (58%), protein abundance improved post-optimization. Correlation with fold-change upregulation higher (0.45) for RFM model than tAI model (0.34).

Speed, Optimality and Variation

Figure 6, A. Mean genomic translation rate against initiation rate.

* dotted lines = saturation points on x- and y-axes (~90% of maximal rate, or "working point" that varies by species) for 0 hours (dark blue function), 9 hours (light blue function).

Figure 6, B. Variety of human tissues, cell types (left-hand side = brain regions, right-hand side = tissues such as kidney, skin, lung, liver, and heart).

* correlations between known mRNA expression levels and model predictions. Blue bars = tAI. Reddish brown = RFM. Inset is the improvement in correlation with mRNA for RFM model > tAI model.

Effects of tRNA pool

Specialization versus adaptation: two strategies employed by cyanophages to enhance

their translation efficiencies. Nucleic Acids Research, 2011, 1-13:

* specialization vs. adaptation in viruses w.r.t. translation efficiencies.

Modify translation efficiency during infection cycle.

Enhancement of fitness for virus, commensalism.

Bias tRNA pool towards GC content of host, negligible effect on host.

In both cases, virus must use host’s replication machinery:

Specialization: virus sequence evolves to match tRNA pool management, GC content of a specific taxon.

Adaptation: virus sequence highly evolvable, allows virus to adapt to a number of taxonomic targets.

Higgs, P.G. and Ran, W. (2008). Molecular Biology and Evolution, 25(11), 2279-2291:

* tRNA genes provide a new force of evolution (bias protein production).

* coevolutionary dynamics: tRNA gene content biases codons used in translation towards those most rapidly translated. Copy number can evolve to optimize codon usage.

Ribosome footprinting

Ingolia et al. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science, 324, 218-223.

Ribosomal profiling: use deep sequencing (sequence data) to uncover protected RNA

fragments. Quantify RNA abundance.

Ribosomal occupancy of RNA: short sequences that allow us to

detect different phases of translation.

* short sequences (footprint) = number of codons on ribosome.

GCN4: highly upregulated in translatome (starvation response),

less so using standard polysome harvest techniques.

Relation between CRL technique and ribosome footprinting

In both cases, sequencing is possible:

* requires library construction, which is the main distinction between the two approaches.

* matter of subsampling (RF is a subsample of polysome method, not necessarily more precise or with better resolution).

CRL METHOD (e.g. TRAP, buffer-based extraction): RNA that has a looser association (e.g. moieties) with ribosome (larger fragments, represent effects of post-transcriptional modifications, transcriptome-like quantification).

RIBOSOME FOOTPRINTING (RF): RNA that is feeding through ribosome at time t, explicit association with ribosome (smaller fragments, directly correlate to protein abundance).

Sequences derived from fragments between subunits.

Effects of translation and phenotype

Wilson, M.A., Meaux, S., Parker, R., and van Hoof, A. (2005). Genetic interactions between [PSI+] and nonstop mRNA decay affect phenotypic variation. PNAS, 102(29), 10244-10249.

Yeast strains can reversibly interconvert between [PSI+] and [psi-] states:

* [PSI+] = prion form of the translation termination factor eRF3.

* causes read-through at stop codons, can lead to phenotypic variation.

Nonstop mRNA decay triggered when ribosome reaches 3' end of transcript:

* interaction between [PSI+]-induced phenotypic variation, defects in nonstop mRNA decay.

* some phenotypic effects of [PSI+] may be due to read-through of normal stop codons (produces extended proteins, modulates phenotype).

* periodic sampling of 3' UTR = rapid divergence (for novel and beneficial protein extensions).

The “big picture”

Foss, E.J. et al. (2011). Genetic variation shapes protein networks mainly through non-transcriptional mechanisms. PLoS Biology, 9(9), e1001144.

Beyond the transcriptome: what controls protein variation? PLoS Biology, 9(9),

e1001146:

* previous studies in yeast -- demonstrated correlation between protein abundance

and transcript abundance.

* does not imply that this correlation will exist for the same gene across different

individuals.

* only 27% of genes exhibit correlation. Vast majority of highly expressed genes

determine either transcript or protein levels (but not both).

* what are post-transcriptional mechanisms? Gray area/lots of nuance between

transcriptional RNA and peptide sequence/protein structure.

RNA decay and Regulatory Control

What’s going on here: physiological “control”. Based on RNA kinetics (decay,

transcription/translation rate, ½ life). Initial model:

[Diagram: Feedforward scenario and Feedback with saturation scenario. Nodes in each: Stimulus, Presence of mRNA, Production at ribosome, Decay rate (1/d), joined by (+) links. Threshold rule shown: if above threshold, (-); if below threshold, (+).]

Mechanism for differences observed between TLT, TST within passage, condition.

Mechanism for differences observed across TLT, TST or between passage, condition.

Rein control:

* two sources (TLT, TST) that are independently regulating (controlling) a common process (cellular state).
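The RNA kinetics named on this slide (decay rate, half-life) are linked by a simple relation if decay is assumed to be first-order, which the slides do not state explicitly. The sketch below shows only that textbook relation; the function names and the rate value are hypothetical, and it is not the control model itself.

```python
import math


def half_life(decay_rate):
    """Half-life of a species decaying exponentially at `decay_rate`."""
    return math.log(2) / decay_rate


def remaining_fraction(decay_rate, t):
    """Fraction of the initial mRNA left after time t (first-order decay)."""
    return math.exp(-decay_rate * t)


k = 0.5  # illustrative decay rate, in units of 1/time
t_half = half_life(k)
frac = remaining_fraction(k, t_half)
print(round(t_half, 3), round(frac, 3))
```

Under this assumption, blocking new transcription and sampling the transcript at several time points would let the decay rate be read from the slope of log abundance against time.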

INSETS: IEEE Spectrum, March 2011, 38-43

Control Model Based on Decay

[Diagram labels: TST, TLT, FF, FB, D.]

Feedback with saturation model using drug treatments (stripped-down version of RNA regulation and decay in cell).

Theoretically:

* Actinomycin D disallows A.

* Mitomycin C disallows A, allows C and D.

* Saporin disallows B, C, and D.

[Diagram: pathways labeled A1, A2, B, C, and D between TST and TLT, with a "Decay off" example and an "FB, FF off" example.]

Control strategy: rein control (FB, FF drive state of TST over time) with brake (saturation, characterized by decay).

Control Model Based on Decay

[Figure: example model run showing values on the TST, TLT, FF, FB, and D nodes at 0d, 1d, 2d, and 3d.]

Example model run using data from COL qPCR in L10A fibroblasts under Actinomycin D treatment, 3d.

Transition rules:

1) for t1, difference between input and TST.

2) for tn > 1, difference between TST at tn and TST at tn+1, or TLT at tn and TLT at tn+1.

3) if TST > TLT, then x > 0.

4) if B at t-1 > B at t, then FB is x > 0.

Conclusions

Translation is a relatively unexplored area:

[Diagram labels: Translation (mRNA), Transcription (mRNA), Protein (peptide), Post-transcriptional modifications, Translation (tRNA conversion).]

Sequence compression: DNA-RNA, RNA-RNA’, RNA-PROTEIN

Translation-associated RNA provides several new pieces

of information about cellular biocomplexity:

1) what goes into protein production?

2) what is the speed of translation? Aggregation rates of

RNA?

3) new computational models for cellular function?
