UNIVERSITY OF CALIFORNIA
RIVERSIDE
Rewiring Translation for Photocontrol and Haptens, and Computational Analysis
A Dissertation submitted in partial satisfaction
of the requirements for the degree of
Doctor of Philosophy
in
Chemistry
by
Wei Ren
June 2016
Dissertation Committee:
Dr. Huiwang Ai, Chairperson
Dr. Ashok Mulchandani
Dr. Wenwan Zhong
Copyright by
Wei Ren
2016
The Dissertation of Wei Ren is approved:
Committee Chairperson
University of California, Riverside
ACKNOWLEDGEMENT
This dissertation has used paragraphs, sentences, figures and tables from four published
articles by Wei Ren and Dr. Huiwang Ai. Published articles are listed below:
1. Wei Ren, Huiwang Ai. 2012. Ribosomal incorporation of unnatural amino acids:
learning from mother nature. Nova Publishers.
2. Wei Ren, Ao Ji, Huiwang Ai. 2015. Light activation of protein splicing with a
photocaged fast intein. Journal of the American Chemical Society. 137(6), 2155-2158.
3. Wei Ren, Ao Ji, Michael X. Wang, Huiwang Ai. 2015. Expanding the genetic code
for a dinitrophenyl hapten. ChemBioChem. 16(14), 2007-2010.
4. Wei Ren, Tan Truong, Huiwang Ai. 2015. Study of the binding energies between
unnatural amino acids and engineered orthogonal tyrosyl-tRNA synthetases.
Scientific Reports. 5, 12632.
To my father, Mr. Qilin Ren;
To my advisor, Dr. Huiwang Ai;
To the scientists who influenced me (Dr. Alan Turing and Dr. Nicholas Metropolis).
ABSTRACT OF THE DISSERTATION
Rewiring Translation for Photocontrol and Haptens, and Computational Analysis
by
Wei Ren
Doctor of Philosophy, Chemistry
University of California, Riverside, June 2016
Dr. Huiwang Ai, Chairperson
The objective of my Ph.D. study is to expand the unnatural amino acid (unAA) toolbox to
genetically encode additional photocaging functional groups to achieve a precise control
of proteins with light, to site-specifically label proteins with hapten moieties, and to further
explore computational methods with an ultimate goal of using computers to design specific
orthogonal aminoacyl-tRNA synthetases (aaRSes) for given unAAs.
In this thesis, we show that cellular biochemical processes can be spatiotemporally
manipulated by light-activatable protein-splicing inteins. We genetically encoded a
photocaged cysteine and introduced the photocaged cysteine into a highly efficient Nostoc
punctiforme (Npu) DnaE intein, which is capable of excising itself and subsequently
splicing adjacent N- and C-terminal extein flanks to form a new truncated peptide. The
resulting photocaged intein was inserted into a red fluorescent protein (RFP) mCherry and
a human Src tyrosine kinase, and a light-induced photochemical reaction was able to
reactivate the intein and trigger protein splicing. The genetically encoded photocaged intein
is a general optogenetic tool, allowing effective photocontrol of primary structures and
functions of proteins.
Haptens, such as dinitrophenyl (DNP), are small molecules that induce strong immune
responses when attached to proteins or peptides and, as such, have been exploited for
diverse applications. In this thesis, we engineered a Methanosarcina barkeri pyrrolysyl-
tRNA synthetase (mbPylRS) to genetically encode a DNP-containing unAA, N6-(2-(2,4-
dinitrophenyl)acetyl)lysine (DnpK). This technique is a promising strategy for biological
preparation of proteins containing site-specific DNP. This new capability is expected to
find broad applications in biosensing, immunology, and therapeutics.
The experimental procedure to derive orthogonal aaRSes/aminoacyl tRNAs, which
typically involves several rounds of positive and negative selection, is laborious and time-
consuming, and requires considerable expertise. It is often not trivial to derive orthogonal
aaRSes for unAA substrates that are very different from the enzymes’ native substrates. In
this thesis, we compared several computational algorithms to evaluate the binding energies
between unAAs and previously developed orthogonal aaRSes. We hope to use these results to
guide the future design and development of new aaRSes, and to extend the capability of the
genetic code expansion technology to many new unAAs.
TABLE OF CONTENTS
SIGNATURE PAGE ......................................................................................................... iii
ACKNOWLEDGEMENT ................................................................................................. iv
DEDICATIONS ...................................................................................................................v
ABSTRACT ....................................................................................................................... vi
TABLE OF CONTENTS ................................................................................................. viii
LIST OF FIGURES .............................................................................................................x
LIST OF SCHEMES ......................................................................................................... xii
LIST OF TABLES ........................................................................................................... xiii
Chapter 1: Introduction ........................................................................................................1
1.1 Genetic Encoding of Unnatural Amino Acids ............................................................1
1.1.1 Ribosomal Protein Synthesis ...............................................................................1
1.1.2 Incorporation of Unnatural Amino Acids............................................................7
1.1.3 Engineering of Ribosome and Other Related Components ...............................11
1.1.4 Further Directions ..............................................................................................17
References ......................................................................................................................18
Chapter 2: Light Activation of Protein Splicing with a Photocaged Intein .......................23
2.1 Introduction ..............................................................................................................23
2.2 Materials and Methods .............................................................................................25
2.2.1 Materials ............................................................................................................25
2.2.2 Chemical Preparation of Photocaged Cysteines ................................................26
2.2.3 Plasmid Constructions .......................................................................................28
2.2.4 Mammalian Cell Culture and Transfection .......................................................33
2.2.5 Analysis of Intein-Mediated Splicing of mCherry ............................................34
2.2.6 Analysis of Intein-Mediated Splicing of Src .....................................................35
2.2.7 Photoactivation of Src and Fluorescence Microscopic Imaging .......................36
2.2.8 Mass Spectrometry Analysis of Proteins ...........................................................36
2.3 Results ......................................................................................................................36
2.4 Conclusions ..............................................................................................................48
References ......................................................................................................................49
Chapter 3: Expanding the Genetic Code for a Dinitrophenyl Hapten ...............................54
3.1 Introduction ..............................................................................................................54
3.2 Materials and Methods .............................................................................................56
3.2.1 Chemical Synthesis of N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK, 3) .......56
3.2.2 Chemical Synthesis of N6-(2-(2-nitrophenyl)acetyl)lysine (2-NPK) and N6-(2-
(4-nitrophenyl)acetyl)lysine (4-NPK) ........................................................................58
3.2.3 Evolution of a Mutant Aminoacyl-tRNA Synthetase ........................................59
3.2.4 Computational Modeling of the DnpK/DnpKRS Complex Structure ...............60
3.2.5 Protein Expression and Purification from E. coli ..............................................60
3.2.6 Protein Expression and Purification from HEK293T Cells ..............................61
3.2.7 Protein Electrospray Mass Spectrometry ..........................................................62
3.2.8 Western Blotting ...............................................................................................62
3.3 Results ......................................................................................................................62
3.4 Conclusions ..............................................................................................................73
References ......................................................................................................................74
Chapter 4: Study of the Binding Energies between Unnatural Amino Acids and
Engineered Orthogonal Tyrosyl-tRNA Synthetases .........................................................79
4.1 Introduction ..............................................................................................................79
4.2 Methods ....................................................................................................................83
4.2.1 Preparation of aaRS-Amino Acid Complexes ...................................................83
4.2.2 Binding Energy Scoring with AutoDock Vina and ROSETTA .........................85
4.2.3 Molecular Dynamics Simulations .....................................................................88
4.2.4 MM/PBSA Binding Energy Calculation ...........................................................88
4.3 Results and Discussion .............................................................................................89
4.3.1 Selection and Preparation of aaRS-Amino Acid Complexes ............................89
4.3.2 Binding Energy Scoring with AutoDock Vina and ROSETTA ........................90
4.3.3 Binding Energy Estimation by MD-MM/PBSA or Direct MM/PBSA ............92
4.3.4 Binding Modes of aaRS-unAA Complexes .......................................................97
4.4 Conclusions ............................................................................................................100
References ........................................................................................................................101
Chapter 5: Summary ........................................................................................................109
LIST OF FIGURES
Figure 1.1 Chemical structures of pyrrolysine and selenocysteine. .....................................4
Figure 1.2 Biological pathways to synthesize selenocysteyl-tRNASec (Sec-tRNASec);
Schematic representation of the mechanism of encoding selenocysteine in mammalian
cells. .....................................................................................................................................6
Figure 1.3 Schematic diagram of genetic encoding of unnatural amino acids in living
cells. .....................................................................................................................................8
Figure 1.4 The competition between amber (TAG) codon suppression and RF-1 induced
translation termination. ......................................................................................................12
Figure 1.5 Protein synthesis in E. coli using a wild-type ribosome and an engineered
orthogonal ribosome. .........................................................................................................15
Figure 2.1 Plasmid map of pMAH2-CageCys. .................................................................29
Figure 2.2 X-ray crystal structure of mCherry (redrawn from PDB 2H5Q). ...................30
Figure 2.3 X-ray crystal structure of the human Src kinase catalytic domain (redrawn
from PDB 1FMK). .............................................................................................................31
Figure 2.4 Genetic encoding of photocaged cysteines in HEK 293T cells. ......................38
Figure 2.5 Photolysis of photocaged cysteines. .................................................................39
Figure 2.6 ESI mass spectrometry analysis of intact proteins. ..........................................40
Figure 2.7 Photoactivation of mCherry. ............................................................................42
Figure 2.8 Photoactivation of Src kinase. ..........................................................................46
Figure 2.9 Pseudocolored ratio FRET images of representative UVA-treated HEK 293T
cells harboring the F1 construct. ........................................................................................47
Figure 3.1 Applications of DNP-labeled proteins..............................................................55
Figure 3.2 Chemical Structure of N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK). .........64
Figure 3.3 Mass spectrometry analysis of the indicated proteins purified from DH10B or
the nfsA/nfsB double deletion strain, suggesting a reduced DNP group in these proteins.
............................................................................................................................................68
Figure 3.4 Mass spectrometry analysis of the indicated proteins purified from DH10B in
the presence of 2-NPK or 4-NPK. .....................................................................................69
Figure 3.5 Direct ESI-MS analysis (positive mode) of the lysate of DH10B cells
incubated with 1 mM DnpK. .............................................................................................70
Figure 3.6 SDS-PAGE and Western blot of DnpK-containing EGFP and the wild-type
EGFP, purified from HEK 293T cells. ..............................................................................70
Figure 3.7 Fluorescence imaging of HEK 293T cells containing genes for pEGFP-
Tyr39TAG, DnpKRS, and the corresponding suppressor tRNA, in the presence or
absence of DnpK (1 mM). .................................................................................................71
Figure 4.1 Chemical structures of natural and unnatural amino acids used in this study (1:
p-acetyl-L-phenylalanine, AcF; 2: 3-iodo-L-tyrosine, IoY; 3: p-iodo-L-phenylalanine,
IoF; and 4: L-tyrosine, Tyr). ..............................................................................................84
Figure 4.2 The RMSD values in the MD trajectories of the seven studied aaRS-amino
acid complexes. ..................................................................................................................87
Figure 4.3 The contributions of individual amino acid residues of aaRSes to the total
binding energies. ................................................................................................................96
Figure 4.4 MD-averaged structures showing the active sites of the studied aaRSes and
unAA complexes. ...............................................................................................................99
LIST OF SCHEMES
Scheme 2.1. Synthetic route to prepare photocaged cysteine. ...........................................26
Scheme 3.1. Synthetic route to prepare DnpK. ..................................................................56
Scheme 3.2. Synthetic route to prepare 2-NPK and 4-NPK. .............................................58
LIST OF TABLES
Table 4.1 Estimated binding free energies using AutoDock Vina and ROSETTA for the
seven tested aaRS-amino acid complexes. .........................................................................86
Table 4.2 Calculated binding energies using MD-MM/PBSA or direct MM/PBSA for the
seven aaRS-amino acid complexes. ...................................................................................95
Chapter 1: Introduction
1.1 Genetic Encoding of Unnatural Amino Acids
1.1.1 Ribosomal Protein Synthesis
Genetic information is mainly stored in cells as sequences of nucleotides.[1] Each nucleotide
is composed of a pentose (5-carbon carbohydrate), a phosphate group extending from the 5’
(or 3’) position of the pentose, and one of four types of nucleobases. In
deoxyribonucleic acid (DNA), 2-deoxyribose is the pentose, and adenine (A), guanine (G),
thymine (T) and cytosine (C) are the four types of bases. Prokaryotic and eukaryotic cells
use regions of DNA sequences as the templates to synthesize strands of ribonucleic acid
(RNA). Sequences of RNA strands are copied from DNA strands, except that ribose
replaces 2-deoxyribose and uracil (U) replaces thymine as one of the four RNA bases. This
process is termed “transcription”.[1] Next, proteins are synthesized from transcribed
messenger RNAs (mRNAs): every three bases in an mRNA open reading frame are
“translated” into a single amino acid residue. An important group of enzymes, aminoacyl
transfer RNA (tRNA) synthetases, catalyze the linkage between amino acids and tRNAs.
Every tRNA has a 3-base anticodon in its anticodon loop to pair with mRNA during
ribosomal protein synthesis.
Ribosomes are large RNA- and protein-containing machines (up to several million Da)
that catalyze the formation of peptides from individual amino acids.[2] Ribosomes exist in all
archaeal, eubacterial and eukaryotic cells. Although differing in size and in detailed
composition, each ribosome has two subunits, a large subunit catalyzing the peptidyl transfer
reaction and a small subunit critical for translation initiation.[2] Translation initiation factors
assemble the small subunit and the mRNA to start the formation of a translation complex.
The Shine-Dalgarno (SD) sequence of prokaryotic mRNA and the 5’ cap of eukaryotic mRNA
are very important for initiation.[3, 4] The nearby AUG codon is then identified by the
ribosome and decoded as an N-terminal N-formylmethionine (fMet) in prokaryotes or
methionine (Met) in eukaryotes. After the ribosome is fully assembled at the initiation
AUG site, it contains three tRNA-binding sites, designated the A, P and E sites. Elongation
starts when the fMet-tRNA (or Met-tRNA in eukaryotes) enters the P site, resulting in a
conformational change which opens the A site for another aminoacyl-tRNA to enter.
Peptide bond formation is catalyzed by the ribosomal RNA in the large subunit. After the bond
is formed, the A site contains the newly formed peptidyl-tRNA, while the P site contains an
uncharged tRNA. The ribosome moves along the mRNA, so the uncharged tRNA enters
the E site and then exits from the ribosome. The peptidyl-tRNA enters the P site and opens
the A site for the next round of coupling. Elongation factors are needed in this process, for
example, to facilitate the entry of aminoacyl-tRNA into the A site. When the ribosome
reaches one of the three termination codons (UAA, UAG and UGA), release factors
(proteins) enter the A site and trigger the hydrolysis of the ester bond in the peptidyl-
tRNA at the P site.[5] After the peptide is released, the whole complex is disassembled with
the aid of several protein factors to recycle translation components. More detailed descriptions
of ribosomal protein synthesis can be found in recently published review articles and in
other chapters of this book.[6-8]
Under most circumstances, every three consecutive bases following the starting AUG codon
are translated into one amino acid. The four types of nucleobases can form 64 (4 × 4 × 4)
triplet codons. With three exceptions (UAA, UAG and UGA as stop codons), each codon encodes
one of the 20 common natural amino acids. The genetic code is therefore degenerate: most of
the 20 amino acids are encoded by more than one codon. The correspondence between codons
and amino acids or translational termination signals is nearly universal among all domains
of life.[9]
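
The codon arithmetic above can be checked with a short Python sketch. It only enumerates triplets and removes the three stop codons named in the text; the variable names are illustrative and not part of the original work.

```python
from itertools import product

BASES = "UCAG"                        # the four RNA nucleobases
STOP_CODONS = {"UAA", "UAG", "UGA"}   # termination codons listed in the text

# Every ordered triplet of bases is a codon: 4 * 4 * 4 in total.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons left over to encode the 20 common amino acids.
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons), "codons in total")
print(len(sense_codons), "sense codons for 20 amino acids")
```

Because the number of sense codons exceeds the number of common amino acids, at least some amino acids must be encoded by more than one codon, which is the degeneracy described above.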
We here discuss a few exceptions. Mitochondrial ribosomes synthesize mitochondrial
proteins based on different codon tables.[10] Mitochondria carry their own genome, which
includes mitochondrial tRNAs. The mitochondrial genetic code has drifted from the
universal code. Furthermore, organisms including bacteria, yeast and other eukaryotes can
harbor suppressor tRNAs that can recognize and decode nonsense codons (UAA, UAG or
UGA).[11] These tRNAs were most likely derived from normal tRNAs through anticodon
mutations. New codon-anticodon interactions are established to read through stop codons.
In most natural cases, one of the 20 common natural amino acids is inserted in response to
stop codons.
A rare exception is the insertion of the unusual amino acid pyrrolysine in response to UAG
codons.[12, 13] Certain methanogenic archaea including Methanosarcina barkeri and M.
mazei, and the Gram-positive bacterium Desulfitobacterium hafniense, express amber
suppressor tRNAs (tRNAPyl) and synthetases that catalyze the charging of these tRNAs with
pyrrolysine (Figure 1.1A). They also harbor gene clusters to biochemically synthesize the
amino acid pyrrolysine.[14, 15] The process to insert pyrrolysine is similar to the process for
ribosomal insertion of other amino acids: pyrrolysine-charged tRNAs are brought into
ribosomes by typical elongation factors to extend the nascent peptides.
Figure 1.1 Chemical structures of (A) pyrrolysine and (B) selenocysteine.
Another unusual amino acid, selenocysteine (Figure 1.1B), is also genetically encoded in
many natural organisms.[16, 17] Compared to cysteine, selenocysteine has a lower pKa and
a lower reduction potential, so diselenide bonds are more easily formed.[18]
Selenocysteine has been found to play a critical role in the function of a few antioxidant
proteins. Unlike the 20 common natural amino acids and pyrrolysine, selenocysteine is not directly
charged to its tRNA (Figure 1.2A), because there is no free selenocysteine in cells.[19]
Instead, seryl-tRNA synthetase first links serine to a special selenocysteine tRNA
(tRNASec). The resulting Ser-tRNASec is not recognized by translation factors, so it is not
used for ribosomal translation. Next, the tRNA-bound seryl residue is converted to a
selenocysteine in the presence of appropriate enzymes and selenium donor molecules.[19]
Alternative translational elongation factors are needed to bring selenocysteine-charged
tRNASec (Sec-tRNASec) into the ribosome for protein synthesis (Figure 1.2B).[17] The
anticodon of tRNASec is UCA, so it can pair with the UGA opal codon. Not all UGA
codons are suppressed, however. The mRNAs of selenocysteine-containing proteins
(selenoproteins) often contain sequences called SECIS (selenocysteine insertion sequence)
elements. The SECIS elements are defined by characteristic nucleotide sequences,
secondary structures and base-pairing patterns. In bacteria, SECIS elements are typically
located immediately after UGA codons in reading frames. In archaea and eukaryotes,
SECIS elements are in the 3’-UTRs (untranslated regions) of mRNAs, and can direct
multiple selenocysteines into a single peptide in response to multiple UGA codons (Figure
1.2B).[20] Sec-tRNASec specific elongation factors can bind SECIS elements, and promote
the delivery of Sec-tRNASec into ribosomes associated with the same mRNA. When cells
are grown in the presence of selenium, corresponding UGA codons are suppressed to
synthesize full-length functional selenoproteins.
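
The statement that a UCA anticodon pairs with the UGA codon follows from antiparallel base pairing, which can be sketched in Python as below. This is a minimal illustration that assumes strict Watson-Crick pairing and ignores wobble and modified bases; the function name is hypothetical.

```python
# Watson-Crick partners for RNA bases (wobble pairing is deliberately ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon: str) -> str:
    """Return the codon (5'->3') that pairs with an anticodon written 5'->3'.

    The two strands pair antiparallel, so the anticodon is read backwards
    and each base is replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


print(codon_for_anticodon("UCA"))  # anticodon of tRNASec given in the text
print(codon_for_anticodon("CUA"))  # anticodon complementary to the amber codon
```

The same relation explains why switching the anticodon of a tRNA is enough to redirect it to a stop codon, provided the rest of the translation machinery accepts the tRNA.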
Figure 1.2. (A) Biological pathways to synthesize selenocysteyl-tRNASec (Sec-
tRNASec). (B) Schematic representation of the mechanism of encoding
selenocysteine in mammalian cells.
Another related unusual case is ribosomal frameshifting during protein synthesis.[21]
Typically, proteins are synthesized based on a template mRNA with every three
consecutive nucleotides being read as an amino acid. However, frameshifting occurs at low
frequency: the ribosome slips by one base in either the 5’ (-1) or 3’ (+1) direction during
translation. Frameshifting is related to nucleotide sequence, secondary structure and
tertiary structure of an mRNA.
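
The effect of a -1 or +1 slip on the downstream reading frame can be illustrated with a short Python sketch. The mRNA sequence and the helper names are hypothetical and chosen only to show how the triplets change; real frameshifting depends on sequence and structural context that is not modeled here.

```python
from typing import List


def read_codons(mrna: str, start: int = 0) -> List[str]:
    """Split an mRNA into consecutive triplets from `start`, dropping any incomplete tail."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


def read_with_frameshift(mrna: str, codons_before_slip: int, shift: int) -> List[str]:
    """Read `codons_before_slip` triplets in frame, then slip by `shift` bases.

    shift = -1 moves the ribosome one base toward the 5' end,
    shift = +1 moves it one base toward the 3' end.
    """
    position = 3 * codons_before_slip
    upstream = read_codons(mrna[:position])
    downstream = read_codons(mrna, start=position + shift)
    return upstream + downstream


mrna = "AUGGCUUUUAAACGU"  # hypothetical sequence for illustration only
print(read_codons(mrna))                    # normal reading frame
print(read_with_frameshift(mrna, 2, -1))    # slip after two codons, -1 frame
print(read_with_frameshift(mrna, 2, +1))    # slip after two codons, +1 frame
```

In this toy sequence the -1 slip brings a UAA triplet into frame, which shows how a single-base shift can change both the encoded residues and the position of termination.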
In the past decade, tremendous effort has been devoted to investigating the molecular
mechanisms of ribosomal protein synthesis. Atomic structures of individual
components involved in ribosomal protein synthesis have been elucidated. The 2009 Nobel
Prize in Chemistry was awarded to Venkatraman Ramakrishnan, Thomas A. Steitz
and Ada E. Yonath for solving the structure of the ribosome using X-ray crystallography.[6-8]
1.1.2 Incorporation of Unnatural Amino Acids
The work to understand how proteins are synthesized has been very fruitful. Meanwhile,
researchers have developed methods to dramatically expand the repertoire of
amino acids used in protein synthesis.[22, 23] Orthogonal tRNAs and aminoacyl-tRNA synthetases
have been engineered to encode unusual amino acids in response to nonsense codons and
4-base codons. Additional translational machinery, including the ribosome and translation
factors, has been mutated to increase the synthesis of unnatural proteins.[24, 25] Structurally
and functionally manipulated proteins have been utilized to study biology and develop new
therapeutics. Recent reviews by us and others have summarized many details of this
technology.[22, 23] Interested readers should refer to those references. Here we
only briefly describe the technology, link it with similar natural systems, focus on the re-
engineering of components other than tRNAs and synthetases, and finally highlight its
applications in therapeutics and vaccines.
Suppressor tRNAs for termination codons have been widely found in nature, so it was quite
straightforward to propose a similar method to incorporate unnatural amino acids.[11]
Initially, this was done in vitro using suppressor tRNAs pre-charged with unnatural amino
acids, and in vivo by directly injecting charged tRNAs.[26, 27] Those charged tRNA
molecules were made through either in vitro enzymatic reactions or methods that include
organic synthesis. Research by Schultz and others established a procedure to genetically
encode most components needed for incorporation of unusual amino acids (Figure 1.3).[28, 29]
The technology is often referred to as “genetic code expansion”, and has been widely
adopted by the research community.
Figure 1.3. Schematic diagram of genetic encoding of unnatural amino acids in
living cells.
In a typical experiment, a pre-engineered orthogonal tRNA with its anticodon
complementary to a stop codon or a 4-base codon, and a pre-engineered aminoacyl-
tRNA synthetase with preference toward the unnatural amino acid, are recombinantly
expressed in cells. The unnatural amino acid is supplemented in the culture media. The
resulting cells are capable of linking the amino acid to the suppressor tRNA and synthesizing
modified proteins containing site-specifically inserted unnatural amino acids. This method
is compatible with living cells, so it has become an indispensable tool for life science
research. It is also an efficient and economical way to produce a large amount of nonnative
proteins. Currently, the technology is available for genetic encoding of more than 90
unnatural amino acids harboring various reactive conjugation handles, photoactive
functional groups, pre-installed post-translational modifications (PTMs), fluorophores,
metal-chelating functional groups and other useful side chains.[22, 23]
It is challenging to identify a pair of tRNA and synthetase orthogonal to cell endogenous
pathways, and engineer them to gain selective activity toward a novel unnatural amino
acid. In practice, orthogonal tRNA/synthetase pairs used in one organism are often derived
from another organism in a different domain of life. For example, the tyrosyl tRNA and
tyrosyl-tRNA synthetase pair from the archaeon Methanocaldococcus jannaschii
(MjTyrRS/MjtRNATyr) can be used in the bacteria E. coli and Mycobacterium tuberculosis
(MTB), while pairs derived from the E. coli tyrosyl tRNA and synthetase
(EcTyrRS/EctRNATyr) have been used for genetic encoding of unnatural amino acids in
eukaryotic cells.[28, 29] Many other important pairs for eukaryotic uses are derived from the
E. coli leucyl tRNA and synthetase (EcLeuRS/EctRNALeu). In addition, pyrrolysyl tRNAs
and pyrrolysyl-tRNA synthetases (PylRS/tRNAPyl) from Methanosarcina barkeri and
Methanosarcina mazei are orthogonal in both prokaryotic and eukaryotic organisms, and
have been engineered to encode many useful amino acids.[30]
The anticodons of these suppressors have been switched so that they can pair with nonsense
or 4-base codons. The first three bases of a 4-base codon need to be a less-used codon in
the target organism (the corresponding endogenous tRNA is less abundant). In addition,
wild-type synthetases have to be mutated to switch their substrate specificity from native
amino acids to unnatural amino acids. Usually, rounds of positive and negative selection
are performed. Briefly, synthetase libraries targeting amino acid-binding residues are
created by molecular biology techniques. Both the tRNA and the synthetase mutants are
introduced into the organism, which is cultured in media supplemented with the unnatural
amino acid. A gene necessary for cell survival under the given selection condition is induced
for expression. However, nonsense or 4-base codons have been pre-inserted into its sequence.
Cells survive only if a synthetase mutant can charge the tRNA with the unnatural amino acid
to suppress the nonsense or 4-base codons. Survivors of the positive selection are then
subjected to a negative selection step, in which a toxic gene containing nonsense or 4-
base codons is expressed. No unnatural amino acid is provided in the negative
selection step. Cells containing any synthetase mutant that charges the tRNA with
endogenous amino acids are killed. The selection is often performed for multiple
cycles to enrich synthetase mutants selective for the corresponding unnatural amino
acid.[23]
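
The logic of one positive/negative selection cycle can be sketched in Python as a pair of filters. This is a deliberately simplified model, not the experimental protocol: each hypothetical mutant is reduced to two yes/no properties, and it is assumed that any charged suppressor tRNA (whether it carries the unnatural or an endogenous amino acid) rescues a cell in the positive step, which is why the negative step is needed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mutant:
    name: str
    charges_unaa: bool      # accepts the supplemented unnatural amino acid
    charges_natural: bool   # accepts an endogenous amino acid


def positive_selection(pool: List[Mutant]) -> List[Mutant]:
    # Unnatural amino acid present: the essential gene is read through
    # whenever the suppressor tRNA is charged with any amino acid.
    return [m for m in pool if m.charges_unaa or m.charges_natural]


def negative_selection(pool: List[Mutant]) -> List[Mutant]:
    # Unnatural amino acid absent: read-through of the toxic gene kills
    # cells whose mutant charges the tRNA with an endogenous amino acid.
    return [m for m in pool if not m.charges_natural]


def selection_round(pool: List[Mutant]) -> List[Mutant]:
    return negative_selection(positive_selection(pool))


# Hypothetical four-member library covering every combination.
library = [
    Mutant("inactive", charges_unaa=False, charges_natural=False),
    Mutant("natural_only", charges_unaa=False, charges_natural=True),
    Mutant("promiscuous", charges_unaa=True, charges_natural=True),
    Mutant("selective", charges_unaa=True, charges_natural=False),
]

survivors = selection_round(library)
print([m.name for m in survivors])
```

In this idealized picture a single cycle is sufficient; in practice selection is leaky and activity is not binary, which is consistent with the multiple cycles described above.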
1.1.3 Engineering of Ribosome and Other Related Components
Suppression of nonsense and four-base codons is not very efficient. Recombinantly
expressed and then charged orthogonal tRNAs have to compete with endogenous cellular factors
(Figure 1.4), i.e. translation termination factors (peptide release factors) or charged
endogenous tRNAs that decode the first three bases of a four-base codon. Therefore, the
yield of full-length proteins containing unnatural amino acids is often low. This problem
is further amplified when multiple unusual codons are present in a single gene. Recent
work has attempted to solve the problem by targeting individual or multiple steps involved
in protein translation. For example, the interaction interface between the suppressor tRNA
derived from MjtRNATyr and the E. coli elongation factor Tu (EF-Tu) has been re-
engineered.[31] The improved tRNAs have been used to construct a series of pEvol plasmids
showing robust amber suppression efficiency in E. coli cells.[32] We and others are currently
performing similar work in yeast and mammalian cells to improve amber suppression in
eukaryotic systems. Besides tRNAs and synthetases, other machinery involved in protein
translation, such as the ribosome and other translational factors, has also been targeted. The
purpose of those studies is to improve the efficiency of nonnative protein production,
and/or enable the incorporation of unusual amino acids whose encoding is otherwise
impossible.
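
Why the yield problem grows with the number of unusual codons can be seen from a rough estimate. The Python sketch below assumes, purely for illustration, that each unusual codon is suppressed independently with the same probability; this simplification and the example efficiency are assumptions and do not come from the cited studies.

```python
def full_length_fraction(per_codon_efficiency: float, n_codons: int) -> float:
    """Fraction of translation events that read through all `n_codons` sites.

    Assumes independent suppression events with identical efficiency,
    so the fractions multiply.
    """
    if not 0.0 <= per_codon_efficiency <= 1.0:
        raise ValueError("per_codon_efficiency must be between 0 and 1")
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return per_codon_efficiency ** n_codons


# Hypothetical 50% suppression efficiency at each site.
for n in (1, 2, 3, 6):
    print(n, full_length_fraction(0.5, n))
```

Under this assumption the full-length fraction falls geometrically with the number of sites, so even modest gains in per-codon efficiency have a large effect on proteins carrying several unnatural residues.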
Figure 1.4. The competition between amber (TAG) codon suppression and RF-1
induced translation termination.
Elongation factors are critical enzymes involved in protein synthesis. Suppressor tRNAs
carrying large nonnative amino acids are bound less tightly by elongation factor Tu (EF-Tu)
than tRNAs carrying natural amino acids. Sisido et al. re-engineered the EF-Tu binding pocket for
aminoacyl moieties of aminoacyl-tRNAs to increase its affinity toward large amino
acids.[33, 34] Several bulky aromatic amino acids, which are hardly or only slightly
incorporated by the wild-type EF-Tu, were successfully incorporated into proteins in the
presence of the EF-Tu mutants.
Bacterial release factors (RFs) 1 and 2 catalyze translation termination at UAG and
UAA, or at UAA and UGA, respectively (Figure 1.4). The large ribosomal subunit protein
L11 is a highly conserved protein containing two domains, an N-terminal domain (L11N)
and a C-terminal domain (L11C). L11 interacts with 23S rRNA and plays an important role
in the RF1-mediated peptide release. L11C alone can also bind 23S rRNA. The ribosome,
in which L11C is used to replace the full-length L11, shows translation efficiency
comparable to the wild-type ribosome, but has lower efficiency in RF1-mediated
termination. Liu and coworkers therefore overexpressed L11C in E. coli cells to reduce
RF1-mediated translation termination and increase amber suppression efficiency.[35] They
demonstrated that three acetyllysine residues could be incorporated into a single peptide in
a reasonable yield.
Sakamoto, Yokoyama and their coworkers engineered an E. coli strain that lacks RF1,
which terminates translation in response to UAG codons.[36] A few genetic modifications were,
however, needed to circumvent the lethality of RF1 deletion. Several genes, which use
UAG as their stop codons, were mutated. In their mutated strain, UAG could be
assigned unambiguously to a natural or non-natural amino acid using different UAG-
decoding tRNAs. They also demonstrated that p-iodophenylalanine could be incorporated
in response to six in-frame amber codons in a model glutathione S-transferase (GST)
protein. Similarly, Wang et al. also reported several RF1-deletion E. coli strains.[37] They
found that RF1 deletion could be tolerated by E. coli, as long as a certain version of RF2 is
expressed in cells.[38] They confirmed that the critical residue in RF2 is Ala246. These
reported E. coli strains are, undoubtedly, valuable tools for expression of proteins
containing multiple unnatural amino acids at different residue sites.
To incorporate multiple chemically distinct unnatural amino acids into a single protein,
mutually orthogonal pairs that are also compatible with cell endogenous tRNAs,
synthetases and amino acids are needed. First, Schultz and others reported the use of an
MjTyrRS/MjtRNATyr derived tRNA/synthetase pair and another pair derived from
Pyrococcus horikoshii lysyl tRNA and synthetase in response to UAG and AGGA codons,
respectively, for insertion of two different unnatural amino acids.[39] In addition, Liu et al.
used MjTyrRS/MjtRNATyr derived tRNA/synthetase pairs and PylRS/tRNAPyl derived
pairs in the same E. coli cells to decode two nonsense codons (UAG and UAA). Chin and
his coworkers, instead, reported the adaptation of two orthogonal pairs directly from
MjTyrRS/MjtRNATyr, one pair responding to UAG and the other responding to
AGGA.[40] Direct use of two nonsense codons, or one nonsense and one four-base codon,
often leads to very low yield of protein production.
An exciting development was made by Chin and co-workers (Figure 1.5).[24] Orthogonal
ribosomes were developed specifically for encoding unnatural amino acids. Briefly, a 16S
rRNA library was built with mutations important for interactions at the ribosomal A site.
The library was screened to identify mutants exhibiting a substantial increase in efficiency
of decoding amber codons. Those mutant 16S rRNAs are likely to reduce the affinity
between RF-1 and the ribosome, so peptide release in response to UAG codons is reduced.
Next, they engineered the ribosomal small subunit so that the mutated ribosome only binds
a mutated SD sequence. These derived ribosomes can only translate exogenously
introduced mRNAs, which harbor the mutated SD sequence. Endogenous mRNAs are
excluded from the mutant ribosome due to the disrupted translation initiation.
Meanwhile, the synthesis of cell endogenous proteins is carried out by natural ribosomes.
More recently, Chin et al. further engineered an orthogonal ribosome for improved
efficiency in decoding 4-base codons.[25] They showed that the mutant ribosome
maintained its enhanced efficiency in decoding in-frame amber codons. Next, they used
this orthogonal ribosome to synthesize proteins containing two different unnatural amino
acids in response to both UAG and AGGA. One tRNA/synthetase pair was derived from
MjTyrRS/MjtRNATyr, and another pair was derived from PylRS/tRNAPyl. They were
able to generate a GST-calmodulin protein containing both azide and alkyne functional
groups. The protein was subjected to click chemistry to build an intramolecular bridge
through Cu(I)-catalyzed azide/alkyne Huisgen cycloaddition. The research represents an
interesting proof of concept that orthogonal ribosomes may be re-engineered to
reassign triplet and quadruplet codons. Research in this direction is likely to establish
biosynthetic pathways for polymers made with artificial building blocks.
Figure 1.5. Protein synthesis in E. coli using (A) a wild-type ribosome and (B) an
engineered orthogonal ribosome.
O-Phosphoserine (Sep) is an abundant posttranslational protein modification. Recently,
Söll and coworkers reported a method to synthesize homogeneous Sep-containing proteins
in genetically modified E. coli.[41] Naturally, in some methanogenic archaea, there is no
cysteinyl-tRNA synthetase. Instead, a Sep specific synthetase (SepRS) catalyzes the
formation of the linkage between the amino acid O-phosphoserine and the corresponding
cysteinyl-tRNA (tRNACys). The O-phosphoserine charged tRNACys has low affinity with
EF-Tu. It is subsequently converted to cysteine by the enzyme SepCysS in the presence of
a sulfide donor. Next, Cys-tRNACys is used by the ribosome for protein synthesis. Söll et al.
engineered a new amber suppressor from tRNACys by converting its anticodon to CUA
(pair with UAG). An additional C20U mutation was made to improve the aminoacylation
efficiency. It is worth noting that SepRS is not cross-reactive with any E. coli endogenous
tRNA and can be overexpressed in E. coli cells. E. coli has a Sep-compatible transporter,
so Sep was directly added to the growth medium. The E. coli endogenous phosphoserine
phosphatase gene, serB, was deleted to maintain adequate intracellular Sep concentration.
Furthermore, a new EF-Tu was engineered and recombinantly expressed to increase its
affinity for Sep-charged tRNA. The engineered strain, which harbors a Sep-accepting transfer RNA, a cognate
Sep-tRNA synthetase (SepRS), and an engineered EF-Tu (EF-Sep), was successfully
utilized to synthesize the phosphorylated active form of human mitogen-activated ERK
activating kinase 1 (MEK1). This research has built a new avenue to biosynthesize
phosphoproteins for detailed studies of their biological properties.
To date, excluding tRNAs and synthetases, efforts to re-engineer protein synthesis-related
components have been limited to E. coli. It remains to be determined whether similar
strategies can be extended to eukaryotic (yeast and mammalian) cells and other industrial
microbial strains for applications in biotechnology and pharmaceuticals.
1.1.4 Future Directions
Biomolecular engineering of protein translation-related machinery has now provided the
ability to genetically encode more than 90 unnatural amino acids. The early research was
inspired directly by natural nonsense suppressors. Identification of orthogonal
tRNA/synthetase pairs, including tyrosyl-pairs and pyrrolysyl pairs, spurred the research
field. Further engineering of the ribosome and translational factors improved and enhanced
the technology for better yields and broader applications. However, most engineering is still
limited to E. coli cells. Further research is needed for yeast and mammalian cells, in which
incorporation efficiency of unnatural amino acids is much lower. In addition,
applications of those unnatural amino acids have not been explored extensively.
Therefore, this thesis presents three projects: the use of photocaged unnatural
amino acids to manipulate living cell systems, an unnatural amino acid based strategy for
new drug development, and a computational method for unnatural amino acid incorporation.
I hope that these three demonstrations will further broaden the utility
of this technology, which is expected to eventually help elucidate new biology and develop
new therapeutics and vaccines.
References:
[1] Crick F. Central Dogma of Molecular Biology. Nature. 1970;227(5258):561-3.
[2] Ramakrishnan V. Ribosome Structure and the Mechanism of Translation. Cell.
2002;108(4):557-72.
[3] Chen H, Bjerknes M, Kumar R, Jay E. Determination of the optimal aligned spacing
between the Shine-Dalgarno sequence and the translation initiation codon of Escherichia
coli mRNAs. Nucleic Acids Res. 1994;22(23):4953-7.
[4] Preiss T, Hentze MW. Dual function of the messenger RNA cap structure in
poly(A)-tail-promoted translation in yeast. Nature. 1998;392(6675):516-20.
[5] Frolova LY, Merkulova TI, Kisselev LL. Translation termination in eukaryotes:
polypeptide release factor eRF1 is composed of functionally and structurally distinct
domains. RNA. 2000;6(3):381-90.
[6] Korostelev A, Noller HF. The ribosome in focus: new structures bring new insights.
Trends Biochem. Sci. 2007;32(9):434-41.
[7] Berk V, Cate JH. Insights into protein biosynthesis from structures of bacterial
ribosomes. Curr. Opin. Struct. Biol. 2007;17(3):302-9.
[8] Schmeing TM, Ramakrishnan V. What recent ribosome structures have revealed
about the mechanism of translation. Nature. 2009;461(7268):1234-42.
[9] Jukes TH, Osawa S. Evolutionary changes in the genetic code. Comp. Biochem.
Physiol. B. 1993;106(3):489-94.
[10] Knight RD, Landweber LF, Yarus M. How mitochondria redefine the code. J. Mol.
Evol. 2001;53(4-5):299-313.
[11] Murgola EJ. tRNA, suppression, and the code. Annu. Rev. Genet. 1985;19:57-80.
[12] Srinivasan G, James CM, Krzycki JA. Pyrrolysine encoded by UAG in Archaea:
charging of a UAG-decoding specialized tRNA. Science. 2002;296(5572):1459-62.
[13] Hao B, Gong W, Ferguson TK, James CM, Krzycki JA, Chan MK. A new UAG-
encoded residue in the structure of a methanogen methyltransferase. Science.
2002;296(5572):1462-6.
[14] Gaston MA, Zhang L, Green-Church KB, Krzycki JA. The complete biosynthesis
of the genetically encoded amino acid pyrrolysine from lysine. Nature.
2011;471(7340):647-50.
[15] Cellitti SE, Ou W, Chiu H-P, Grunewald J, Jones DH, Hao X, et al. D-Ornithine
coopts pyrrolysine biosynthesis to make and insert pyrroline-carboxy-lysine. Nat. Chem.
Biol. 2011;7(8):528-30.
[16] Chambers I, Frampton J, Goldfarb P, Affara N, McBain W, Harrison PR. The
structure of the mouse glutathione peroxidase gene: the selenocysteine in the active site is
encoded by the 'termination' codon, TGA. EMBO J. 1986;5(6):1221-7.
[17] Bock A, Forchhammer K, Heider J, Leinfelder W, Sawers G, Veprek B, et al.
Selenocysteine: the 21st amino acid. Mol. Microbiol. 1991;5(3):515-20.
[18] Copeland PR. Making sense of nonsense: the evolution of selenocysteine usage in
proteins. Genome Biol. 2005;6(6):221.
[19] Yuan J, Palioura S, Salazar JC, Su D, O'Donoghue P, Hohn MJ, et al. RNA-
dependent conversion of phosphoserine forms selenocysteine in eukaryotes and archaea.
Proc. Natl. Acad. Sci. USA. 2006;103(50):18923-7.
[20] Berry MJ, Banu L, Harney JW, Larsen PR. Functional characterization of the
eukaryotic SECIS elements which direct selenocysteine insertion at UGA codons. EMBO
J. 1993;12(8):3315-22.
[21] Farabaugh PJ. Translational frameshifting: implications for the mechanism of
translational frame maintenance. Prog. Nucleic Acid Res. Mol. Biol. 2000;64:131-70.
[22] Ai HW. Biochemical analysis with the expanded genetic lexicon. Anal. Bioanal.
Chem. 2012;403(8):2089-102.
[23] Liu CC, Schultz PG. Adding new chemistries to the genetic code. Annu. Rev.
Biochem. 2010;79:413-44.
[24] Wang K, Neumann H, Peak-Chew SY, Chin JW. Evolved orthogonal ribosomes
enhance the efficiency of synthetic genetic code expansion. Nat. Biotechnol.
2007;25(7):770-7.
[25] Neumann H, Wang K, Davis L, Garcia-Alai M, Chin JW. Encoding multiple
unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature.
2010;464(7287):441-4.
[26] Shimizu Y, Inoue A, Tomari Y, Suzuki T, Yokogawa T, Nishikawa K, et al. Cell-
free translation reconstituted with purified components. Nat. Biotechnol. 2001;19(8):751-5.
[27] Saks ME, Sampson JR, Nowak MW, Kearney PC, Du F, Abelson JN, et al. An
engineered Tetrahymena tRNAGln for in vivo incorporation of unnatural amino acids into
proteins by nonsense suppression. J. Biol. Chem. 1996;271(38):23169-75.
[28] Wang L, Brock A, Herberich B, Schultz PG. Expanding the genetic code of
Escherichia coli. Science. 2001;292(5516):498-500.
[29] Chin JW, Cropp TA, Anderson JC, Mukherji M, Zhang Z, Schultz PG. An
Expanded Eukaryotic Genetic Code. Science. 2003;301(5635):964-7.
[30] Chen PR, Groff D, Guo J, Ou W, Cellitti S, Geierstanger BH, et al. A facile system
for encoding unnatural amino acids in mammalian cells. Angew. Chem. Int. Ed.
2009;48(22):4052-5.
[31] Guo J, Melancon CE 3rd, Lee HS, Groff D, Schultz PG. Evolution of amber
suppressor tRNAs for efficient bacterial production of proteins containing nonnatural
amino acids. Angew. Chem. Int. Ed. 2009;48(48):9148-51.
[32] Young TS, Ahmad I, Yin JA, Schultz PG. An enhanced system for unnatural amino
acid mutagenesis in E. coli. J. Mol. Biol. 2010;395(2):361-74.
[33] Nakata H, Ohtsuki T, Abe R, Hohsaka T, Sisido M. Binding efficiency of
elongation factor Tu to tRNAs charged with nonnatural fluorescent amino acids. Anal.
Biochem. 2006;348(2):321-3.
[34] Doi Y, Ohtsuki T, Shimizu Y, Ueda T, Sisido M. Elongation factor Tu mutants
expand amino acid tolerance of protein biosynthesis system. J. Am. Chem. Soc.
2007;129(46):14458-62.
[35] Huang Y, Russell WK, Wan W, Pai PJ, Russell DH, Liu W. A convenient method
for genetic incorporation of multiple noncanonical amino acids into one protein in
Escherichia coli. Mol. Biosyst. 2010;6(4):683-6.
[36] Mukai T, Hayashi A, Iraha F, Sato A, Ohtake K, Yokoyama S, et al. Codon
reassignment in the Escherichia coli genetic code. Nucleic Acids Res. 2010;38(22):8188-95.
[37] Johnson DB, Xu J, Shen Z, Takimoto JK, Schultz MD, Schmitz RJ, et al. RF1
knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nat.
Chem. Biol. 2011;7(11):779-86.
[38] Johnson DB, Wang C, Xu J, Schultz MD, Schmitz RJ, Ecker JR, et al. Release
Factor One Is Nonessential in Escherichia coli. ACS Chem. Biol. 2012;7(8):1337-44.
[39] Anderson JC, Wu N, Santoro SW, Lakshman V, King DS, Schultz PG. An
expanded genetic code with a functional quadruplet codon. Proc. Natl. Acad. Sci. USA.
2004;101(20):7566-71.
[40] Neumann H, Slusarczyk AL, Chin JW. De novo generation of mutually orthogonal
aminoacyl-tRNA synthetase/tRNA pairs. J. Am. Chem. Soc. 2010;132(7):2142-4.
[41] Park HS, Hohn MJ, Umehara T, Guo LT, Osborne EM, Benner J, et al. Expanding
the genetic code of Escherichia coli with phosphoserine. Science. 2011;333(6046):1151-4.
Chapter 2: Light Activation of Protein
Splicing with a Photocaged Intein
2.1 Introduction
Inteins are protein elements that are capable of excising themselves and subsequently
splicing adjacent N- and C-terminal extein flanks to form a new truncated peptide.[1] These
naturally occurring, self-catalyzing protein-splicing elements have been adapted to achieve
efficient protein purification, ligation, labeling, cyclization, cleavage, and patterning.[2, 3]
In particular, conditional inteins, whose activities are inducible by additional factors, such
as small molecules, light, or changes in temperature, pH, or redox states, have previously
been utilized to regulate protein activities in vitro and in vivo.[4, 5] Photoactivatable inteins
are of particular interest because light-based approaches often have sufficient spatial and
temporal resolution to meet the need of understanding biology at the cellular and
subcellular levels.[6] In a previous work, Noren et al. reported the in vitro preparation of a
photoactivatable Thermococcus litoralis (Tli) Pol-2 intein, using a chemically amino-
acylated suppressor tRNA.[7] Furthermore, chemical synthetic methods have also been
employed to integrate photo-cleavable functional groups into the O-acyl isomer,[8] the
peptide backbone,[9] or the N-terminus[10] of split inteins to achieve photo-controlled
protein splicing. Due to the difficulty of directly delivering proteins or peptides into living
cells, these studies focused on in vitro applications. In another work, two photo-responsive
dimerization domains were each fused to an artificially split intein fragment as a genetically
encoded system to control protein splicing in living Saccharomyces cerevisiae cells, but
the system was not adaptable to mammalian cells.[11] Herein, we report the genetic
encoding of a photoactivatable intein and its applications in directly controlling primary
structures of proteins and therefore their functions, in living mammalian cells.
The Nostoc punctiforme (Npu) DnaE intein is among the most well-characterized and
efficient inteins, with a splicing reaction half-life of ∼60 s at 37 °C.[12, 13] The Npu DnaE
intein is also compatible with a myriad of flanking extein sequences.[14] All these features
make the Npu DnaE intein an ideal research tool, especially for mammalian studies.
Mutagenesis of the first catalytic cysteine residue within the Npu DnaE intein to alanine
(Cys/Ala) abrogates protein splicing and auto-cleavage at both intein domain ends.[12, 15]
This property is different from that of some other recently reported fast inteins, whose
Cys/Ala mutants are efficient in undergoing the C-terminal cleavage reaction.[16]
The genetic code expansion technology is capable of introducing site-specific photocaged
lysine, tyrosine, serine, and cysteine residues into proteins of interest in living systems,
including bacterial, yeast, and mammalian cells.[17-21] Previously, optical control of
enzymatic activities[22-24], ion channels[25], gene expression and silencing[26], and protein
translocation[27, 28] have been demonstrated by replacing critical protein residues with
photocaged unnatural amino acids (UAAs). In this study, we show that a genetically
encoded photoactivatable intein can be readily derived by replacing the Cys1 residue of
Npu DnaE intein with a photocaged cysteine, and it is highly effective in directly
modulating primary protein structures, thereby rendering a general approach for
controlling protein activities in living cells.
2.2 Materials and Methods
2.2.1 Materials
All chemicals were purchased from Sigma-Aldrich (St. Louis, MO) or Alfa Aesar (Ward
Hill, MA). Synthetic DNA oligonucleotides were purchased from Integrated DNA
Technologies (IDT; San Diego, CA). Restriction endonucleases were purchased from New
England Biolabs (Ipswich, MA) or Thermo Fisher Scientific Fermentas (Vilnius,
Lithuania). PCR and restriction digest products were purified by gel electrophoresis and
extracted using the Syd Labs Gel Extraction kit (Malden, MA). Syd Labs Mini-prep kit
was used for plasmid purification. DNA sequence analysis was performed by the Genomics
Core at the University of California, Riverside (UCR; Riverside, California). Protein mass
spectrometry was performed at the UCR High Resolution Mass Spectrometry Facility.
Plasmids encoding the Npu DnaE intein (Addgene # 41684) and Src (Addgene # 23934)
were purchased from Addgene (Cambridge, MA). The Src kinase sensor was a gift from
Prof. Yingxiao Wang at the University of California, San Diego (San Diego, California).
2.2.2 Chemical Preparation of Photocaged Cysteines
Scheme 2.1. Synthetic route to prepare photocaged cysteine (2).
2.2.2.1 Chemical Preparation of (R,S) 1-(1-Bromoethyl)-4,5-
dimethoxy-2-nitrobenzene (6)
Compound 4 (900 mg, 4 mmol) in Scheme 2.1, prepared from compound 3 according to the
literature, was dissolved in THF/EtOH (1:1, 15 mL) at room temperature, followed by
intermittent addition of NaBH4 (152 mg, 4 mmol) over 20 min. After stirring the reaction
mixture for another 3 hours, dilute HCl (1 mol/L, 4 mL) was added to neutralize excess
NaBH4. The solvent was then removed in vacuo, and H2O (10 mL) was subsequently
added to the residue. The mixture was extracted three times with CH2Cl2 (10 mL). The
combined organic layer was dried over anhydrous Na2SO4 and further concentrated to
afford crude 5 as a yellow solid, which was then used directly without further purification.
Compound 5 dissolved in CH2Cl2 (20 mL) was cooled in an ice bath. PBr3 (475 µL, 5 mmol)
was introduced dropwise. The reaction mixture was stirred for another 3 hours before
saturated NaHCO3 aqueous solution (15 mL) was added. The organic layer was separated,
washed twice with H2O (10 mL), and further dried over anhydrous Na2SO4. The solvent
was removed in vacuo to afford crude compound 6 as a yellow oil. The crude product was
purified by silica chromatography (EtOAc/Hexane 1:4) to obtain pure compound 6 as
a yellow oil (810 mg, 2.79 mmol). The yield was 69% over two steps.
2.2.2.2 Chemical Preparation of N-(tert-butoxycarbonyl)-S-[(R,S)-
1-{4',5'-dimethoxy-2'-nitrophenyl}ethyl]-L-cysteine (7)
L-Cysteine (0.36 g, 3 mmol) was dissolved in 5 mL of deionized water and then
neutralized by triethylamine (405 µL, 2.8 mmol). The solution was cooled in an ice/water
bath. Next, compound 6 (2.79 mmol in 5 mL of methanol) was added dropwise over 15
min. The reaction mixture was stirred overnight. The yellow precipitate was collected.
The filtrate was washed twice with CH2Cl2 (10 mL). The aqueous layer and the yellow
precipitate were combined, followed by addition of saturated NaHCO3 aqueous solution
(2 mL) and (Boc)2O (654 mg, 3 mmol). The reaction mixture was allowed to stir for
another 3 hours. Next, it was acidified with HCl (1 mol/L, 5 mL) and extracted with CH2Cl2
(10 mL) three times. The organic layers were combined and dried over anhydrous Na2SO4.
The solvent was removed in vacuo to yield crude compound 7 as a yellow oil. The crude
product was purified by silica chromatography (EtOAc/Hexane 2:1) to obtain pure
compound 7 as a yellow oil (620 mg, 1.44 mmol). The yield was 52%.
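The reported yield can be checked from the molar quantities given in the procedure. The following is a minimal sketch; the function name is illustrative, and it assumes that compound 6 (2.79 mmol) is the limiting reagent, since L-cysteine was used in slight excess (3 mmol).

```python
def percent_yield(product_mmol: float, limiting_mmol: float) -> float:
    """Isolated yield in percent, relative to the limiting reagent."""
    if limiting_mmol <= 0:
        raise ValueError("limiting_mmol must be positive")
    return 100.0 * product_mmol / limiting_mmol


if __name__ == "__main__":
    # Compound 7: 1.44 mmol isolated from 2.79 mmol of compound 6 (values from the text).
    print(f"Yield of compound 7: {percent_yield(1.44, 2.79):.0f}%")
```

The same calculation applies to the other steps of the synthetic route once the limiting reagent of each step is identified.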
2.2.2.3 Chemical Preparation of S-[(R,S)-1-{4',5'-Dimethoxy-2'-
nitrophenyl}ethyl]-L-cysteine (2)
Compound 7 (142 mg, 0.33 mmol) was dissolved in dioxane (3 mL), and
concentrated HCl (1 mL) was then introduced. The solution was stirred for 2 hours at room
temperature. The solvent was removed in vacuo to afford compound 2 quantitatively as a
yellow solid.
2.2.3 Plasmid Constructions
In order to achieve the genetic encoding of photocaged cysteines, a plasmid
pMAH2CagCys was constructed for the mammalian expression of the corresponding
tRNA and aminoacyl-tRNA synthetase. The gene encoding the aminoacyl-tRNA
synthetase (E. coli leucyl-tRNA synthetase with M40G, L41Q, Y499L, Y527G, H537F
mutations) was codon-optimized for mammalian expression and chemically synthesized
by IDT. The gene fragment encoding an H1 promoter and the tRNA was also chemically
synthesized. One copy of the synthetase gene was amplified with oligonucleotides
CAGCYS-F and CAGCYS-R, digested with Hind III and Apa I, and inserted into a
previously reported pMAH plasmid. A successful clone identified by DNA sequencing
served as the PCR template in a reaction using oligonucleotides pMAH-tRNA1-F and
pMAH-tRNA2-R. The PCR reaction amplified the whole plasmid and appended Spe I and
Xho I restriction sites to the ends of the DNA product. Next, the gene fragment encoding
the H1 promoter and the tRNA was amplified by oligonucleotides tRNA-F and tRNA-R.
tRNA-F and tRNA-R installed Spe I and Sal I restriction sites to the ends of the DNA
product. The above two DNA fragments were digested with Spe I and Xho I, and Spe I and
Sal I, respectively. Since Xho I and Sal I generate compatible ends, the above two
fragments were ligated to afford a complete plasmid. An additional Xho I site was designed
upstream of the H1 promoter. Thus, the resulting plasmid could be re-digested with
Spe I and Xho I to insert the second H1-tRNA fragment. This procedure was repeated to
generate a pMAH2-CageCys plasmid containing 3 copies of H1-tRNA and 1 copy of the
synthetase.
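The iterative cloning strategy relies on Xho I and Sal I leaving identical overhangs while producing a hybrid junction that neither enzyme recognizes. A minimal sketch of this logic follows; the recognition sites and cut positions are the standard ones for these enzymes, whereas the helper names are illustrative and not part of any cloning software.

```python
# Recognition sites (top strand, 5'->3') and the top-strand cut position.
SITES = {
    "XhoI": ("CTCGAG", 1),  # C^TCGAG
    "SalI": ("GTCGAC", 1),  # G^TCGAC
    "SpeI": ("ACTAGT", 1),  # A^CTAGT
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic cutter with a symmetric staggered cut."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def hybrid_junction(left_enzyme: str, right_enzyme: str) -> str:
    """Sequence formed when the left end of one cut site is ligated to the right end of another."""
    left_site, left_cut = SITES[left_enzyme]
    right_site, right_cut = SITES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


def recutting_enzymes(junction: str) -> list:
    """Enzymes in SITES whose recognition site matches the junction exactly."""
    return [name for name, (site, _) in SITES.items() if site == junction]


if __name__ == "__main__":
    print(overhang("XhoI"), overhang("SalI"))  # identical overhangs allow ligation
    for pair in (("XhoI", "SalI"), ("SalI", "XhoI")):
        junction = hybrid_junction(*pair)
        print(pair, junction, recutting_enzymes(junction))
```

Because the ligated junction is no longer a site for either enzyme, the Xho I site designed upstream of the H1 promoter remains the only Xho I site available for the next round of insertion.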
Figure 2.1. Plasmid map of pMAH2-CageCys
To construct the intein/mCherry fusion, oligonucleotides IC1 and IC2 were used to amplify
the N-terminal portion of mCherry. IC3 and IC4 were used to amplify the Npu DnaE intein
from the plasmid pSKDuet16 (Addgene # 41684) and mutate the codon of Cys1 to TAG.
IC5 and IC6 were used to amplify the C-terminal portion of mCherry. The three pieces
were fused together by overlap extension PCR using IC1 and IC6. The product was
digested with Hind III and Xho I and inserted into a pre-digested compatible pcDNA3
plasmid.
Figure 2.2. (a) X-ray crystal structure of mCherry (redrawn from PDB 2H5Q). The
chromophore (magenta) and residues 138 and 139 are shown as ball
representations. (b) The primary sequence of the photocaged intein/mCherry
chimeric protein. The asterisk (*) represents the UAA 2 incorporation site. The
photo-activated protein splicing product is expected to be mCherry, containing two
mutations at residues 138 and 139.
To construct the intein/Src fusions, a similar overlap extension PCR strategy was utilized.
The three fused DNA fragments were digested with Hind III and EcoR I and inserted into
a pre-digested compatible pcDNA3 plasmid. In addition, the full-length mCherry was
amplified with oligonucleotides ECORI-RFP-F and IC6, treated with appropriate
restriction enzymes, and inserted between EcoR I and Xho I restriction sites of the
pcDNA3-derived plasmids. Constructed plasmids were confirmed by DNA sequencing.
Figure 2.3. (a) X-ray crystal structure of the human Src kinase catalytic domain
(redrawn from PDB 1FMK). Residues 277, 342 and 400 are shown as ball
representations. (b) The primary sequence of the Src kinase catalytic domain fused
to mCherry. Residues 277, 342 and 400 are colored in magenta. The photocaged
intein was inserted upstream of these residues. Ser342 was mutated to cysteine,
since the Npu DnaE intein requires a +1 site cysteine for efficient protein splicing.
Oligonucleotides used for plasmid construction are listed below:
CAGCYS-F: CACATGAAGCTTGCCACCATGCAAG
CAGCYS-R: TAATATGGGCCCTTAGCCCACGAC
pMAH-tRNA1-F: TTATTGACTAGTTATTAATAGTAATCAATTACGGGGTC
pMAH-tRNA2-R: ATAACTCGAGTCGGGGAAATGTGC
tRNA-F: GCCATCACTAGTCAATAATCAATGC
tRNA-R: ACTCGTGTCGACCTCGACTCAAAAAAAGGACTACCCGGAGCGGGA
IC1: TACTAAGCTTGCCACCATGGTGAGCAAGGGCGAG
IC2: ATAGCTTAACTACTGCATTACGGGGCCGTCGGA
IC3: GTAATGCAGTAGTTAAGCTATGAAACGGAAATA
IC4: GGTCATACAATTAGAAGCTATGAAGCCATT
IC5: ATAGCTTCTAATTGTATGACCATGGGCTGGGAGGCC
IC6: ATTCCTCGAGTTAATGGTGGTGATGGTGGTGCTTGTACAGCTCGTCCAT
SRC-F: CTGTAAGCTTGCCACCATGTCCAAACACGCCGATGGCCTG
IS-1-1-F: GTCAAGCTGGGCCAGGGCTAGTTAAGCTATGAAACGGAA
IS-1-1-R: TTCCGTTTCATAGCTTAACTAGCCCTGGCCCAGCTTGAC
IS-1-2-F: TTCATAGCTTCTAATTGCTTTGGCGAGGTGTGG
IS-1-2-R: CCACACCTCGCCAAAGCAATTAGAAGCTATGAA
IS-2-1-F: ATCGTCACGGAGTACATGTAGTTAAGCTATGAAACGGAA
IS-2-1-R: TTCCGTTTCATAGCTTAACTACATGTACTCCGTGACGAT
IS-2-2-F: TTCATAGCTTCTAATTGCAAGGGGAGTTTGCTGGAC
IS-2-2-R: GTCCAGCAAACTCCCCTTGCAATTAGAAGCTATGAA
IS-3-1-F: GTGGGAGAGAACCTGGTGTAGTTAAGCTATGAAACGGAA
IS-3-1-R: TTCCGTTTCATAGCTTAACTACACCAGGTTCTCTCCCAC
IS-3-2-F: TTCATAGCTTCTAATTGCAAAGTGGCCGACTTT
IS-3-2-R: AAAGTCGGCCACTTTGCAATTAGAAGCTATGAA
SRC-R: TTTTGAATTCGAGGTTCTCCCCGGGCTGGTACTG
ECORI-RFP-F: ATAAGAATTCGTGAGCAAGGGCGAGGAGGAT
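The primers listed above can be checked for the complementarity that overlap extension PCR requires. The sketch below is only an illustration with hypothetical helper names. It tests whether a forward and reverse primer pair are exact reverse complements, and how many bases the fragment ending at IC2 shares with the fragment starting at IC3.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(upstream: str, downstream: str) -> int:
    """Length of the longest suffix of `upstream` that is also a prefix of `downstream`."""
    for k in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream.endswith(downstream[:k]):
            return k
    return 0


# Sequences copied from the oligonucleotide list.
IS_1_1_F = "GTCAAGCTGGGCCAGGGCTAGTTAAGCTATGAAACGGAA"
IS_1_1_R = "TTCCGTTTCATAGCTTAACTAGCCCTGGCCCAGCTTGAC"
IC2 = "ATAGCTTAACTACTGCATTACGGGGCCGTCGGA"
IC3 = "GTAATGCAGTAGTTAAGCTATGAAACGGAAATA"

if __name__ == "__main__":
    print("IS-1-1 pair complementary:", reverse_complement(IS_1_1_F) == IS_1_1_R)
    print("IC2/IC3 shared bases:", overlap_length(reverse_complement(IC2), IC3))
    print("IC3 carries a TAG triplet:", "TAG" in IC3)
```

A check of this kind is a quick way to catch transcription errors in a primer list before ordering oligonucleotides.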
2.2.4 Mammalian Cell Culture and Transfection
HEK 293T cells were maintained in T25 flasks with 5 mL Dulbecco’s Modified Eagle’s
Medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and incubated at
37°C with 5% CO2 in humidified air. Cells at 80% confluence were passaged into 35-mm
or 100-mm culture dishes at a ratio of 1:10 or 1:20 for subsequent transfection. The next
day, transfection complexes were prepared by mixing DNA and PEI (polyethylenimine,
linear, 25 kDa) (DNA:PEI (w/w) = 1:2.5) in Opti-MEM. For 35-mm culture dishes, 10 µL PEI
(1 µg/µL) was used to prepare 500 µL transfection media. For 100-mm culture dishes, 60
µL PEI (1 µg/µL) was added to 2 mL Opti-MEM. To express the intein/mCherry fusion,
pcDNA3 and pMAH2-CagCys were used in a 1:1 ratio. To express intein/Src fusions,
pcDNA3, pMAH2-CagCys and the KRas Src sensor were used in a 1:1:0.25 ratio. After
preparing transfection complexes, cells were incubated with transfection media for 2 hours.
Next, pre-warmed fresh culture media were added to replace the transfection media. For
positive samples, all transfected cells were cultured in media containing 1 mM of the
photocaged cysteine 2, while no UAA was used for negative control samples.
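The DNA amounts implied by the stated PEI volumes and the DNA:PEI ratio can be worked out as follows. This is a sketch only: the function names are illustrative, and the plasmid ratios are assumed here to be mass ratios, which the text does not state explicitly.

```python
def dna_mass_ug(pei_volume_ul: float,
                pei_conc_ug_per_ul: float = 1.0,
                pei_per_dna: float = 2.5) -> float:
    """Total DNA mass (µg) for a given PEI volume at DNA:PEI (w/w) = 1:pei_per_dna."""
    return pei_volume_ul * pei_conc_ug_per_ul / pei_per_dna


def split_by_ratio(total_ug: float, ratio: list) -> list:
    """Divide a total DNA mass among plasmids according to a ratio (assumed w/w)."""
    total_parts = sum(ratio)
    return [total_ug * part / total_parts for part in ratio]


if __name__ == "__main__":
    small_dish = dna_mass_ug(10)   # 35-mm dish, 10 µL PEI
    large_dish = dna_mass_ug(60)   # 100-mm dish, 60 µL PEI
    print("35-mm dish total DNA (µg):", small_dish)
    print("100-mm dish total DNA (µg):", large_dish)
    # intein/mCherry: pcDNA3 construct and pMAH2-CagCys at 1:1
    print(split_by_ratio(small_dish, [1, 1]))
    # intein/Src: pcDNA3 construct, pMAH2-CagCys and the Src sensor at 1:1:0.25
    print([round(x, 2) for x in split_by_ratio(small_dish, [1, 1, 0.25])])
```

If the ratios were instead intended as molar ratios, the masses would need to be scaled by plasmid size.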
2.2.5 Analysis of Intein-Mediated Splicing of mCherry
After transfection, cells were cultured for another 4 days. Fresh media were added every 2
days. After removing the culture media, cells in culture dishes sitting on ice were directly
illuminated with UVA light (365 nm radiation of 600 µW/cm2, Black Ray Lamp, Model
XX-20BLB, VWR, cat. no. 21474-676) for 10 min. Cells were left in the dark in DMEM
containing 10% FBS at 37°C for 1 hour for protein splicing and mCherry chromophore
maturation. Cells were imaged under a Leica SP5 confocal fluorescence microscope. The
excitation laser was set at 488 nm, and emission was collected from 500 nm to 550 nm.
To analyze proteins with SDS-PAGE, cells were collected and lysed in RIPA (radio-
immunoprecipitation assay) buffer directly after the 10-min irradiation. The mixtures were
sonicated for 5 seconds. Cell lysates were centrifuged at 13,000 × g for 5 min at 4°C. The
supernatants were collected for 6xHis-tagged protein purification. Ni-NTA agarose
(Qiagen) was used, according to the protocol provided by the manufacturer for native
conditions. The components of Wash Buffer are 30 mM imidazole, 150 mM NaCl and 50
mM NaH2PO4 with pH adjusted to 8. The components of Elution Buffer are 300 mM
imidazole, 150 mM NaCl and 50 mM NaH2PO4 with pH adjusted to 8. Purified proteins
were analyzed on a 15% SDS-PAGE gel. The control protein sample was prepared in
parallel from the same number of cells, which were treated identically except that UV
irradiation was omitted.
2.2.6 Analysis of Intein-Mediated Splicing of Src
After transfection, cells were cultured for 4 days. Fresh media were added every 2 days.
After removing the culture media, cells in culture dishes sitting on ice were directly
illuminated with UVA light (365 nm radiation of 600 µW/cm2, Black Ray Lamp, Model
XX-20BLB, VWR, cat. no. 21474-676) for 10 min. Cells were collected and lysed
immediately in RIPA buffer. The mixtures were sonicated for 5 seconds. Cell lysates were
centrifuged at 13,000 × g for 5 min at 4°C. The supernatants were directly used for
fluorescence measurements. A monochromator-based Synergy Mx Microplate Reader
(BioTek, Winooski, VT) was used to record all spectra. To record the fluorescence
emission spectra, the excitation wavelength was set at 430 nm, and the emission scanned
from 450 nm to 600 nm. The Förster resonance energy transfer (FRET) ratio was calculated
by dividing the emission at 530 nm by the emission at 480 nm.
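The ratio calculation described above can be sketched as follows. The spectrum is represented as a mapping from emission wavelength (nm) to intensity; the function names and the example intensities are hypothetical and are not measured data.

```python
def intensity_at(spectrum: dict, wavelength_nm: float) -> float:
    """Intensity at the recorded wavelength closest to wavelength_nm."""
    nearest = min(spectrum, key=lambda w: abs(w - wavelength_nm))
    return spectrum[nearest]


def fret_ratio(spectrum: dict,
               numerator_nm: float = 530.0,
               denominator_nm: float = 480.0) -> float:
    """Emission at 530 nm divided by emission at 480 nm, as defined in the text."""
    denominator = intensity_at(spectrum, denominator_nm)
    if denominator == 0:
        raise ValueError("emission at the denominator wavelength is zero")
    return intensity_at(spectrum, numerator_nm) / denominator


if __name__ == "__main__":
    # Made-up intensities for illustration only (arbitrary units).
    example = {450: 10.0, 480: 100.0, 505: 90.0, 530: 150.0, 600: 5.0}
    print("FRET ratio:", fret_ratio(example))
```

Looking up the nearest recorded wavelength keeps the calculation usable when the scan step does not land exactly on 480 nm or 530 nm.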
To inhibit protein synthesis during and after UV illumination in our control experiments,
cycloheximide (100 µg/ml) was added into cell culture media 1 h before the light treatment,
and also into the RIPA buffer. Cells were otherwise treated identically, and the same
experimental procedure was used to quantitatively measure fluorescence ratios.
2.2.7 Photoactivation of Src and Fluorescence Microscopic Imaging
After transfection, cells were cultured for 3 days. Before imaging, the cells were switched
into Dulbecco’s Phosphate Buffered Saline (DPBS) containing 1 mM Ca2+ and 1 mM
Mg2+. The experiments were performed on a Motic AE31 inverted epi-fluorescence
microscope with home-built FRET imaging capability. Photoactivation was carried out with a
DAPI excitation filter (377 nm/50 nm, Iridian Part # FEX000003). Regions of interest were
illuminated for 2 min (~ 4 mW/cm2). Next, time-lapse imaging was performed for 30 min.
The excitation filter was 436 nm/20 nm. The emission filters were 480 nm/40 nm and 535
nm/50 nm. The imaging results were analyzed using ImageJ according to a protocol
published previously.
2.2.8 Mass Spectrometry Analysis of Proteins
Proteins (40 µg) were precipitated in methanol/chloroform. The pellet was dissolved in
acetonitrile and ddH2O (1:1) mixture (30 µL) containing 1% formic acid. A direct infusion
mode was used to record mass spectra on an Agilent ESI-TOF instrument at the Analytical
Chemistry Instrumentation Facility of UCR.
2.3 Results
Previous efforts have utilized mutant pairs of pyrrolysyl tRNA synthetase
(PylRS)/tRNA[29, 30] and Escherichia coli leucyl tRNA synthetase (EcLeuRS)/tRNA[25] in
mammalian cells for the genetic encoding of unnatural cysteine derivatives that can be
decaged with long-wavelength UVA radiation. In particular, an orthogonal
EcLeuRS/tRNA pair originally engineered for the encoding of a photocaged serine in
yeast[19] was found to be capable of encoding a photocaged cysteine (1 in Figure 2.4a) in
mammalian cells.[25] Based on these results, we modified our pMAH mammalian
expression plasmid[31] to express the mutant EcLeuRS and tRNA genes. Expression of the
full-length GFP protein in Human Embryonic Kidney (HEK) 293T cells bearing EGFP-
Tyr39TAG (a gene for enhanced green fluorescent protein with an amber codon at residue
39) was observed to be dependent on 1 (Figure 2.4b). Photolysis of 1 is expected to generate
an aldehyde byproduct, which may further react with free cellular amines to inadvertently
promote cell toxicity (Figure 2.5a).[32] Therefore, we also prepared a new UAA, 2 (Figure
2.4a), photolysis of which yields a cysteine and a less reactive ketone byproduct (Figure
2.5b and c). Since 2 is structurally similar to 1, we also tested 2 for amber suppression in the
presence of the mutant EcLeuRS/tRNA pair. We achieved an appreciable yield of full-
length GFP from HEK 293T cells, as observed by SDS-PAGE analysis and fluorescence
microscopic imaging (Figure 2.4b and c). Electrospray ionization mass spectrometry
(ESI-MS) further confirmed the genetic incorporation of 2 in the recombinantly expressed
EGFP (Figure 2.6).
Figure 2.4. Genetic encoding of photocaged cysteines in HEK 293T cells. (a)
Chemical structures of two photocaged cysteines, 1 and 2. (b) SDS-PAGE analysis
of Ni-NTA-purified EGFP, containing 1 or 2, expressed in HEK 293T cells. (c)
Microscopic imaging of EGFP expressing HEK 293T cells in the absence (left
column) or presence (right column) of 2 (scale bar: 50 μm).
Figure 2.5. Photolysis of photocaged cysteines, 1 and 2, yields a cysteine and either
(a) an aldehyde, or (b) a ketone by-product. (c) Electrospray ionization (ESI) mass
spectrum of 2 briefly exposed to long-wavelength UVA light, showing the formation
of a ketone byproduct.
Figure 2.6. ESI mass spectrometry analysis of intact proteins. (a) Mass spectrum of
EGFP, containing 1 at residue 39 (calculated mass: 28817, observed mass: 29818).
(b) Mass spectrum of EGFP containing 2 at residue 39 (calculated mass: 28831,
observed mass: 29832). The differences between the observed and calculated masses
are within the expected error range of the instrument.
To determine whether 2 can be utilized to photocontrol the protein splicing activity of the
Npu DnaE intein, we inserted a full-length Npu DnaE intein sequence into mCherry (Figure
2.7a). Residue 138, on a long loop between β-strands 6 and 7 of mCherry, was chosen
as the insertion site (Figure 2.2).[33] Moreover, the codon of the Cys1 residue of Npu DnaE
intein was mutated to an amber codon (TAG) for UAA incorporation. The chimeric
construct was subsequently expressed in HEK 293T cells, with cell culture media
containing 2. Almost no fluorescence was observed prior to UVA treatment (Figure 2.7b),
suggesting that the intein insertion disrupted the fluorescence of mCherry. Next, we used
a UVA lamp to directly illuminate cells in cell culture dishes, and strong red fluorescence
was observed within 1 h after irradiation (Figure 2.7b). This rate of developing red fluorescence
in cells was comparable to the rate of chromophore maturation of mCherry.[34] These
results indicate that the caged intein was photoactivated to undergo protein splicing and
form a highly fluorescent reconstituted mCherry. Since the construct was 6xHis-tagged at
the C-terminal end, Ni-NTA agarose beads were utilized to purify proteins from untreated
or UVA-treated cells. SDS-PAGE analysis of the proteins confirmed the highly efficient,
light-induced protein splicing: upon UVA-treatment, nearly all of the chimeric protein was
converted to the spliced product (Figure 2.7c).
Figure 2.7. Photoactivation of mCherry. (a) Primary structures of the
intein/mCherry chimeric protein and its photo-converted product after UV-induced
protein splicing. The red portion of the bar represents the mCherry sequence. The
asterisk (*) represents the Cys1 residue for UAA incorporation. The “CM” region
represents two extein residues (+1 and +2). (b) Microscopic imaging of HEK 293T cells
expressing the construct treated with or without UV irradiation (scale bar: 50 μm).
(c) SDS-PAGE analysis of the Ni-NTA-purified proteins from HEK 293T cells, with
or without UV irradiation.
We next explored the use of the photocaged intein in controlling enzymatic activities. We
inserted the photocaged intein into the catalytic domain of Src, a human tyrosine kinase.
The kinase catalytic domain has eight cysteine residues and 12 serine residues. We
designed chimeric proteins by randomly and individually inserting the intein into three sites
in Src (Figure 2.8a and Figure 2.3). First, we inserted the intein between Gly276 and
Cys277, or Val399 and Cys400 of Src (F1 and F2 in Figure 2.8a). For these two constructs,
protein splicing is expected to generate a product identical to the wildtype Src kinase
catalytic domain. We also built the third construct, F3, in which the intein was placed
downstream of Met341 (Figure 2.8a). Because the Npu DnaE intein requires a cysteine
residue at the +1 site for efficient protein splicing,[12] we also mutated Ser342 to cysteine,
followed by the native Src sequence from residue 343 to residue 533. The
splicing product of F3 is expected to be different from the wild-type protein by a single
Ser342Cys mutation. It is worth noting that a serine-to-cysteine mutation is tolerated in many
cases without dramatically affecting protein activities.[36] We also fused mCherry at the C-
terminal end as an expression indicator of the UAA-containing full-length proteins. Next,
we used a KRas-Src sensor,[37] based on Förster resonance energy transfer (FRET) between
ECFP and YPet, to evaluate the activities of F1, F2, and F3 in the presence or absence of
UVA irradiation. This sensor was well-validated in previous studies, and Src kinase
activity is known to decrease the intensity ratio (YPet/ECFP) of the sensitized YPet
fluorescence emission to the direct ECFP donor emission.[37] HEK 293T cells containing
each of the three constructs and the KRas-Src sensor were treated with UVA light and then
lysed for fluorescence quantification with a plate reader (Figure 2.8b). All three of our
constructs were inactive prior to UVA irradiation, while UVA light was able to activate
them, leading to the decrease of the FRET ratios of the sensor. A reduced FRET ratio was
also observed for cells co-expressing a wild-type Src kinase and the Src sensor.
Furthermore, negative control experiments were performed with HEK 293T cells
containing each of the three constructs but cultured in the absence of 2. Cells in the negative
control groups were also subjected to the identical UVA treatment, so that the partial
photobleaching of the Src sensor did not mask the FRET changes caused by the
photoactivation of the Src kinase activity. Moreover, we utilized fluorescence microscopy
to closely monitor the process (Figure 2.8c). HEK 293T cells coexpressing the Src sensor
and the chimeric F1 construct were irradiated on an epi-fluorescence microscope equipped
with a DAPI excitation filter. Next, we carried out time-lapse, two-channel FRET imaging
of ECFP and YPet. The FRET ratios of the Src sensor gradually decreased in the monitored
30 min period. In contrast, the UVA-treated control cells cultured in the absence of 2
showed no obvious change in FRET ratios during the imaging period (Figure 2.8d and
Figure 2.9). It was noted that considerable Src-induced FRET changes occurred during the
2 min of UVA illumination. Analysis of single cells showed that the average FRET ratio
(YPet/ECFP) at 0 min, when time-lapse FRET imaging started, was 2.11 ± 0.08 for cells
containing the photo-activated Src. In comparison, negative control cells identically treated with
UVA radiation had an average FRET ratio of 2.35 ± 0.03. This is not surprising,
considering the fast kinetics of the Npu DnaE intein. The UVA illumination condition did
not affect cell viability[38] but effectively activated the photocaged intein to promote the
formation of Src via protein splicing. These data indicate that the photocaged Npu DnaE
intein is an effective tool for the control of enzyme activities.
UV radiation may also decage the charged unnatural aminoacyl tRNA, which may be
further utilized by cellular ribosomes to synthesize proteins. When we added cycloheximide (100
μg/mL) to block ribosomal protein synthesis during and after irradiation, the
photoactivation of Src kinase was not affected (Figure 2.8b). In addition, the activation of
Src was observed right after UV irradiation (Figure 2.8d), when ribosomal protein
synthesis from the decaged aminoacyl tRNA was unlikely to be achieved in this short time
frame. These results suggest that the direct decaging of the accumulated chimeric proteins
in cells was the major pathway in our experiments.
Figure 2.8. Photoactivation of Src kinase. (a) Primary structures of the chimeric
proteins tested in this study. The gray portion of the bars represents the sequence of
the human Src kinase between the indicated residues. The asterisk (*) indicates the
Cys1 residue for UAA incorporation; “M” is methionine, as the translational start
site; and “C” is cysteine, used to replace residue 342 of Src. (b) Activity of the
chimeric proteins before and after UVA irradiation, as measured from FRET ratios
of a KRas-Src sensor. In the absence of 2, the full length proteins were not
synthesized and are thus used as negative controls. A wild-type Src was also
prepared as a positive control. To block ribosomal protein synthesis during and
after UVA irradiation, cycloheximide (CHX) was also added to a control group. (c)
Pseudo-colored ratio images of representative UVA-treated HEK 293T cells
expressing the F1 construct in the presence of 2 at the indicated post-treatment time
(in minutes). The color bar represents fluorescence ratio (YPet/ECFP) (scale bar:
25 μm). (d) FRET ratios plotted versus time for HEK 293T cells. Color symbols are
for individual cells in panel c, marked at 0 min by arrows in the same colors. The
FRET ratios of an identically treated control cell cultured in the absence of 2 (see
Figure 2.9) are shown as open black circles.
Figure 2.9. Pseudocolored ratio FRET images of representative UVA-treated HEK
293T cells harboring the F1 construct, but cultured in the absence of 2 at the
indicated posttreatment time (in minutes). The color scale indicates the fluorescence
ratio (YPet/ECFP), and the scale bar is 20 µm.
2.4 Conclusions
In summary, we have engineered the first genetically encoded photoactivatable intein
compatible with living mammalian cells, in which a photocaged cysteine is used to
genetically replace the Cys1 residue of a highly efficient Npu DnaE intein. By
incorporating the photo-caging group, the protein splicing activity of the intein was
effectively and efficiently inhibited, and the activity was only observed after a brief
exposure to long wavelength UVA light. The resulting photocaged intein was inserted into
other proteins to directly control their primary structures. Because the Npu DnaE intein is
compatible with a myriad of extein sequences, such manipulation should be quite versatile.
A downstream C-extein Cys+1 residue is required for protein splicing, but cysteine can be
found in many proteins. In addition, a single cysteine mutation may be tolerated by many
proteins. Thus, the approach described here may be applied to a large percentage of
proteins. We acknowledge that additional N- and C-terminal extein sequences might affect
the kinetics of protein splicing. This issue can be addressed by using evolved inteins that
splice with higher efficiency at various splice junctions.[39] One might also prepare several
chimeric constructs at different splice sites to screen for variants retaining excellent
expression, stability, and post-photoactivation splicing kinetics. The use of the
photoactivatable inteins to control protein activity is highly attractive, because it requires
little information on the biochemistry or 3D structures of the proteins of interest. The
photoactivatable intein reported here is a new and powerful addition to the mammalian
opto-chemical genetic toolbox, permitting the modulation of proteins directly at the amino
acid sequence level.
References:
[1] Hirata R, Ohsumi Y, Nakano A, Kawasaki H, Suzuki K, Anraku Y. Molecular
structure of a gene, VMA1, encoding the catalytic subunit of H(+)-translocating adenosine
triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. Journal of
Biological Chemistry. 1990; 265(12):6726-33.
[2] Shah N, Muir T. Inteins: nature's gift to protein chemists. Chemical Science. 2014;
5(1):446-461.
[3] Topilina N, Mills K. Recent advances in in vivo applications of intein-mediated
protein splicing. Mobile DNA. 2014; 5(1):5.
[4] Mootz H. Split inteins as versatile tools for protein semisynthesis. Chembiochem.
2009; 10(16):2579-89.
[5] Peck S, Chen I, Liu D. Directed evolution of a small-molecule-triggered intein with
improved splicing properties in mammalian cells. Chem. Biol. 2011; 18(5):619-30.
[6] Toettcher J, Voigt C, Weiner O, Lim W. The promise of optogenetics in cell
biology: interrogating molecular circuits in space and time. Nat. Methods. 2011; 8(1):35-8.
[7] Cook S, Jack W, Xiong X, Danley L, Ellman J, Schultz P, Noren C. Photochemically
initiated protein splicing. Angew. Chem., Int. Ed. 1995; 34:1629-1630.
[8] Vila-Perello M, Hori Y, Ribo M, Muir T. Activation of protein splicing by protease-
or light-triggered O to N acyl migration. Angew. Chem., Int. Ed. 2008; 47(40):7764-7.
[9] Berrade L, Kwon Y, Camarero J. Photomodulation of protein trans-splicing through
backbone photocaging of the DnaE split intein. Chembiochem. 2010; 11(10):1368-72.
[10] Binschik J, Zettler J, Mootz H. Photocontrol of protein activity mediated by the
cleavage reaction of a split intein. Angew. Chem., Int. Ed. 2011; 50(14):3249-52.
[11] Tyszkiewicz A, Muir T. Activation of protein splicing with light in yeast. Nat.
Methods 2008; 5(4):303-5.
[12] Zettler J, Schutz V, Mootz H. The naturally split Npu DnaE intein exhibits an
extraordinarily high rate in the protein trans-splicing reaction. FEBS Lett. 2009; 583(5):
909-14.
[13] Ellila S, Jurvansuu J, Iwai H. Evaluation and comparison of protein splicing by
exogenous inteins with foreign exteins in Escherichia coli. FEBS Lett. 2011;
585(21):3471-7.
[14] Cheriyan M, Pedamallu CS, Tori K, Perler F. Faster protein splicing with the Nostoc
punctiforme DnaE intein using non-native extein residues. J. Biol. Chem. 2013;
288(9):6202-11.
[15] Ramirez M, Valdes N, Guan D, Chen Z. Engineering split intein DnaE from Nostoc
punctiforme for rapid protein purification. Protein Eng. Des. Sel. 2013; 26(3):215-23.
[16] Carvajal-Vallejos P, Pallisse R, Mootz HD, Schmidt S. Unprecedented rates and
efficiencies revealed for new natural split inteins from metagenomic sources. J. Biol. Chem.
2012; 287(34):28686-96.
[17] Wu N, Deiters A, Cropp TA, King D, Schultz P. A genetically encoded photocaged
amino acid. J. Am. Chem. Soc. 2004; 126(44):14306-7.
[18] Chen P, Groff D, Guo J, Ou W, Cellitti S, Geierstanger BH, Schultz P. A facile
system for encoding unnatural amino acids in mammalian cells. Angew. Chem., Int. Ed.
2009; 48(22):4052-5.
[19] Lemke E, Summerer D, Geierstanger B, Brittain S, Schultz P. Control of protein
phosphorylation with a genetically encoded photocaged amino acid. Nat. Chem. Biol. 2007;
3(12):769-72.
[20] Liu CC, Schultz P. Adding new chemistries to the genetic code. Annu. Rev.
Biochem. 2010; 79:413-44.
[21] Deiters A, Groff D, Ryu Y, Xie J, Schultz P. A genetically encoded photocaged
tyrosine. Angew. Chem., Int. Ed. 2006; 45(17):2728-31.
[22] Zhao J, Lin S, Huang Y, Zhao J, Chen PR. Mechanism-based design of a
photoactivatable firefly luciferase. J. Am. Chem. Soc. 2013; 135(20):7410-3.
[23] Gautier A, Deiters A, Chin JW. Light-activated kinases enable temporal dissection
of signaling networks in living cells. J. Am. Chem. Soc. 2011; 133(7):2124-7.
[24] Groff D, Wang F, Jockusch S, Turro NJ, Schultz P. A new strategy to photoactivate
green fluorescent protein. Angew. Chem., Int. Ed. 2010; 49(42):7677-9.
[25] Kang JY, Kawaguchi D, Coin I, Xiang Z, O'Leary DD, Slesinger PA, Wang L. In
vivo expression of a light-activatable potassium channel using unnatural amino acids.
Neuron. 2013; 80(2):358-70.
[26] Hemphill J, Chou C, Chin JW, Deiters A. Genetically encoded light-activated
transcription for spatiotemporal control of gene expression and gene silencing in
mammalian cells. J. Am. Chem. Soc. 2013; 135(36):13433-9.
[27] Gautier A, Nguyen DP, Lusic H, An W, Deiters A, Chin JW. Genetically encoded
photocontrol of protein localization in mammalian cells. J. Am. Chem. Soc. 2010;
132(12):4086-8.
[28] Baker AS, Deiters A. Optical control of protein function through unnatural amino
acid mutagenesis and other optogenetic approaches. ACS Chem. Biol. 2014; 9(7):1398-407.
[29] Nguyen DP, Mahesh M, Elsasser SJ, Hancock SM, Uttamapinant C, Chin JW.
Genetic encoding of photocaged cysteine allows photoactivation of TEV protease in live
mammalian cells. J. Am. Chem. Soc. 2014; 136(6):2240-3.
[30] Uprety R, Luo J, Liu J, Naro Y, Samanta S, Deiters A. Genetic encoding of caged
cysteine and caged homocysteine in bacterial and mammalian cells. ChemBioChem. 2014;
15(12):1793-9.
[31] Chen S, Chen ZJ, Ren W, Ai HW. Reaction-based genetically encoded fluorescent
hydrogen sulfide sensors. J. Am. Chem. Soc. 2012; 134(23):9589-92.
[32] Bochet CG. Photolabile protecting groups and linkers. J. Chem. Soc., Perkin Trans.
1 2002; 125-142.
[33] Li Y, Sierra AM, Ai HW, Campbell RE. Identification of sites within a monomeric
red fluorescent protein that tolerate peptide insertion and testing of corresponding circular
permutations. Photochem. Photobiol. 2008; 84(1):111-9.
[34] Macdonald PJ, Chen Y, Mueller JD. Chromophore maturation and fluorescence
fluctuation spectroscopy of fluorescent proteins in a cell-free expression system. Anal.
Biochem. 2012; 421(1):291-8.
[35] Johannessen CM, Boehm JS, Kim SY, Thomas SR, Wardwell L, Johnson LA,
Emery CM, Stransky N, Cogdill AP, Barretina J, Caponigro G, Hieronymus H, Murray
RR, Salehi-Ashtiani K, Hill DE, Vidal M, Zhao JJ, Yang X, Alkan O, Kim S, Harris JL,
Wilson CJ, Myer VE, Finan PM, Root DE, Roberts TM, Golub T, Flaherty KT, Dummer
R, Weber BL, Sellers WR, Schlegel R, Wargo JA, Hahn WC, Garraway LA. COT drives
resistance to RAF inhibition through MAP kinase pathway reactivation. Nature. 2010;
468(7326):968-72.
[36] Wang X, Pineau C, Gu S, Guschinskaya N, Pickersgill RW, Shevchik VE. Cysteine
scanning mutagenesis and disulfide mapping analysis of arrangement of GspC and GspD
protomers within the type 2 secretion system. J. Biol. Chem. 2012; 287(23): 19082-93.
[37] Seong J, Lu S, Ouyang M, Huang H, Zhang J, Frame MC, Wang Y. Visualization
of Src activity at different compartments of the plasma membrane by FRET imaging. Chem.
Biol. 2009; 16(1):48-57.
[38] Hemphill J, Govan J, Uprety R, Tsang M, Deiters A. Site-specific promoter caging
enables optochemical gene activation in cells and animals. J. Am. Chem. Soc. 2014;
136(19):7152-8.
[39] Lockless SW, Muir TW. Traceless protein splicing utilizing evolved split inteins.
Proc. Natl. Acad. Sci. U.S.A. 2009; 106(27):10999-1004.
Chapter 3: Expanding the Genetic Code for a
Dinitrophenyl Hapten
3.1 Introduction
Haptens are small molecules that induce strong immune responses when attached to
proteins or peptides.[1] Although they cannot trigger immune responses alone, these small
moieties contain antigenic determinants that can bind to pre-existing antibodies.[1] Due to
their high affinity and specificity, antibody-hapten interactions have been exploited for
diverse applications, such as affinity chromatography, immunohistochemistry, in situ
hybridization, and enzyme-linked immunoassay (ELISA).[2-4] DNP is one of the most
common haptens.[4-5] Polyclonal and monoclonal anti-DNP antibodies, as well as single
chain variable fragments (scFv) against DNP, are readily accessible reagents.[6] Therefore,
the ability to introduce DNP into proteins is important for the applications of DNP and
anti-DNP antibodies in separation and detection (Fig. 3.1).[4, 7-8] Moreover, DNP-
containing proteins and peptides can induce immunological hypersensitivity, and they have
been commonly used to probe the biology of immune systems.[9-12] In addition, because
about one percent of the circulating human antibodies can naturally bind to DNP[13-14], DNP
has been utilized to label disease-causing cancer cells and bacterial cells to initiate
antibody-mediated immune responses and trigger cytotoxicity and phagocytosis.[15-16]
Furthermore, self-antigens or weakly immunogenic antigens may be modified with DNP
to break the immune tolerance of the hosts and generate antibodies that are cross-reactive
to the self or weak antigens.[17-18] This immunotherapy strategy seems to be quite promising
for a variety of human diseases.[19]
Figure 3.1. Applications of DNP-labeled proteins.
Despite their potential for broad applications, the current methods for preparing DNP-labeled
proteins and peptides have significant limitations. For example, standard solid phase
peptide synthesis can only produce short DNP-containing peptides, whereas protein
labeling via reactive amino acid residues (e.g. cysteine and lysine) often lacks site-
specificity.[21] Expanding the genetic code of living cells and organisms is a popular
method for preparing proteins containing unnatural functional groups.[22-23] This method
has now enabled the site-specific incorporation of > 100 UAAs containing diverse side-
chain functional groups into biosynthesized proteins, but the genetic encoding of DNP-
containing UAAs has not yet been achieved. Herein, we describe our recent effort in
genetically encoding N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK, Scheme 3.1
compound 3) for the biological preparation of proteins containing site-specific DNP.
3.2 Materials and Methods
3.2.1 Chemical Synthesis of N6-(2-(2,4-dinitrophenyl)acetyl)lysine
(DnpK, 3)
Scheme 3.1. Synthetic route to prepare DnpK.
All chemicals were purchased from Sigma-Aldrich (St. Louis, MO) or Fisher Scientific
(Waltham, MA). N,N'-Dicyclohexylcarbodiimide (DCC, 1.13 g, 5.5 mmol) and N-
hydroxysuccinimide (NHS, 575 mg, 5 mmol) were added into 2,4-dinitrophenylacetic
acid (1, 1.13 g, 5 mmol) dissolved in CH2Cl2 (30 mL). The mixture was stirred at
room temperature for 18 h, followed by gravity filtration. Next, the filtrate was
concentrated in vacuo, and the residue was re-dissolved in THF (5 mL) and introduced
into an aqueous solution (30 mL) of Nα-(tert-butoxycarbonyl)-L-lysine (Boc-Lys-OH)
(1.23 g, 5 mmol) and NaHCO3 (840 mg, 10 mmol). The resulting mixture was stirred
at room temperature overnight, acidified with dilute HCl (1 M, 10 mL), and extracted
with ethyl acetate (20 mL) three times. Organic layers were combined and
concentrated in vacuo to yield a crude product, which was further purified using silica
gel column chromatography (EtOAc/Hexane = 3:1) to afford 2 as a light yellow oil (1.18
g, 2.6 mmol). The overall yield was 52%.
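The stoichiometry and yield quoted above can be checked with a short calculation. The following is a minimal sketch: the helper names are illustrative, and the molar mass of 2,4-dinitrophenylacetic acid (C8H6N2O6, about 226.14 g/mol) is computed from standard atomic masses rather than taken from the text.

```python
def mmol_from_mass(mass_mg, molar_mass_g_per_mol):
    """Convert a mass in mg to an amount in mmol."""
    return mass_mg / molar_mass_g_per_mol


def percent_yield(product_mmol, limiting_reagent_mmol):
    """Isolated yield as a percentage of the limiting reagent."""
    return 100.0 * product_mmol / limiting_reagent_mmol


if __name__ == "__main__":
    # 1.13 g of 2,4-dinitrophenylacetic acid; molar mass assumed to be 226.14 g/mol.
    acid_mmol = mmol_from_mass(1130.0, 226.14)
    print(f"acid: {acid_mmol:.2f} mmol")
    # 2.6 mmol of isolated intermediate 2 from 5 mmol of starting acid.
    print(f"yield: {percent_yield(2.6, 5.0):.0f}%")
```

With the amounts stated in the procedure, 2.6 mmol of product from 5 mmol of starting acid corresponds to the reported 52%.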
3.2.2 Chemical Synthesis of N6-(2-(2-nitrophenyl)acetyl)lysine
(2-NPK) and N6-(2-(4-nitrophenyl)acetyl)lysine (4-NPK)
Scheme 3.2. Synthetic route to prepare 2-NPK and 4-NPK.
2-(2-Nitrophenyl)-acetic acid or 2-(4-nitrophenyl)-acetic acid (1 mmol, 181 mg) was
dissolved in CH2Cl2 (10 mL) in an ice-water bath. Next, NHS (1 mmol, 115 mg) and
DCC (1.1 mmol, 226 mg) were added. The mixture was stirred at room temperature
for 8 h, followed by gravity filtration. Next, the filtrate was concentrated in vacuo, and
the residue was re-dissolved in THF (5 mL) and introduced into an aqueous solution
(30 mL) of Boc-Lys-OH (1 mmol, 246 mg) and Na2CO3 (1 mmol, 106 mg). The
resulting solution was stirred at room temperature overnight, acidified with dilute HCl
(0.5 N, 4 mL), and extracted with EtOAc (10 mL) three times. Organic layers were
combined, dried over Na2SO4, and concentrated in vacuo to yield a crude product,
which was further purified using silica gel column chromatography (EtOAc/Hexane =
9:1) to yield a light yellow solid (0.58 mmol, 240 mg). Next, TFA/CH2Cl2 (1:2) was
added to remove the protecting group to afford the final product. The overall yields
were 58% and 70% for 2-NPK and 4-NPK, respectively.
3.2.3 Evolution of a Mutant Aminoacyl-tRNA Synthetase
We followed a previous procedure[28] to construct an MbPylRS active site library, based on
overlap extension PCR with synthetic degenerate oligonucleotides (Integrated DNA
Technologies). The library was inserted into a pBK plasmid. pRep-tRNAPyl and pNeg-
tRNAPyl plasmids were used for positive and negative selection, respectively.[28] During
positive selection, the pBK-PylRS plasmids encoding the MbPylRS library were used to
transform E. coli DH10B competent cells harboring pRep-tRNAPyl. Cells were plated on
LB agar plates containing tetracycline (Tet; 25 µg/mL), kanamycin (Kan; 50 µg/mL),
chloramphenicol (Cm; 70 µg/mL), and DnpK (1 mM) and were incubated at 37˚C for 48
h. Colonies on the plates were pooled, and total plasmids were mini-prepped. pBK-PylRS
plasmids were separated from pRep-tRNAPyl by agarose gel electrophoresis. Extracted
pBK-PylRS plasmids from the positive selection were introduced into DH10B containing
pNeg-tRNAPyl. Cells were next plated on LB agar containing 50 µg/mL Kan, 100 µg/mL
ampicillin (Amp), and 0.2% L-arabinose. Plates were incubated at 37˚C for 16 h. Cells
were pooled, and the pBK-PylRS plasmids were again separated and extracted. After two
alternating rounds of positive and negative selection, the MbPylRS mutants were subjected
to a third round of positive selection. To further validate surviving clones from the third
positive selection, individual pBK-MbPylRS mutants were prepared and used to
co-transform DH10B electrocompetent cells containing another plasmid, pBAD-sfGFP-
Y39TAG. Fluorescence intensities of bacterial cells, in the presence or absence of 1 mM
DnpK, were quantified. The mutant leading to the largest fluorescence intensity difference
under the two conditions was named DnpKRS.
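The clone-ranking step described above amounts to choosing the mutant with the largest fluorescence difference between the two growth conditions. The following is a minimal sketch of that comparison; the clone names and fluorescence readings are hypothetical placeholders, not data from this work.

```python
def fluorescence_difference(with_uaa, without_uaa):
    """Difference in fluorescence between cultures grown with and without the UAA."""
    return with_uaa - without_uaa


def pick_best_clone(readings):
    """Return the clone with the largest (+UAA minus -UAA) fluorescence difference.

    readings maps a clone name to a (with_uaa, without_uaa) pair.
    """
    return max(readings, key=lambda clone: fluorescence_difference(*readings[clone]))


if __name__ == "__main__":
    # Hypothetical plate-reader values in arbitrary units.
    readings = {
        "clone_A": (5200.0, 4100.0),  # bright, but high background without the UAA
        "clone_B": (4800.0, 600.0),   # strong UAA-dependent signal
        "clone_C": (900.0, 500.0),    # weak suppression
    }
    print(pick_best_clone(readings))
```

Ranking by the difference rather than by the raw signal penalizes clones that also suppress the amber codon without the unnatural amino acid, such as the hypothetical clone_A here.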
3.2.4 Computational Modeling of the DnpK/DnpKRS Complex
Structure
The mutant protein structure was modeled with SWISS-MODEL[33], based on the Protein
Data Bank (PDB) structure 2Q7H.[34] The ligand was edited in PyMOL.[35] The complex
structure was energy-minimized by using the YASARA energy-minimization server.
3.2.5 Protein Expression and Purification from E. coli
The gene in pBK-DnpKRS was amplified by PCR and inserted into a new pEAH plasmid
(KanR), which contains a tRNAPyl expression gene cassette driven by a proK promoter and
a synthetase expression gene cassette driven by a pBAD promoter. A pBAD plasmid
(AmpR) encoding sfGFP-Y39TAG, T4L-K65TAG, or Z-domain-K7TAG was used to co-
transform DH10B or a nfsA/nfsB double-deletion K12 strain[29], along with the pEAH-
DnpK plasmid. A single colony was used to inoculate 2YT medium [100 mL, containing
L-arabinose (0.2 %), ampicillin (100 µg/mL), and kanamycin (50 µg/mL)] in the presence
or absence of DnpK (1 mM) at 30˚C for 24 h. Cells were harvested by centrifugation
and lysed with B-PER II protein extraction reagent (Pierce). His6-tagged protein was
purified with Ni-NTA agarose beads (Qiagen) under native conditions according to the
manufacturer’s instructions.
3.2.6 Protein Expression and Purification from HEK293T Cells
The mammalian expression vector pCMV-DnpK was created by replacing the synthetase
in a previous pCMV-AbK plasmid.[37] This plasmid also contains a copy of the tRNAPyl
gene under the control of a human U6 promoter. HEK293T cells were grown in DMEM
supplemented with 10% fetal bovine serum (FBS). Cells at 70% confluency were
transfected with mixtures of the corresponding plasmids by using linear polyethylenimine
(PEI, MW = 25,000). The culture medium was further supplemented with DnpK (1 mM) as
appropriate. When expressing EGFP in HEK293T cells, pCMV-DnpK (12 µg) and
pEGFP-Y39TAG (12 µg) were mixed with PEI (60 µg) to transfect cells in 100 mm
diameter cell culture dishes. Cells were harvested 72 h after transfection, washed with
PBS (3 × 8 mL), and then collected and lysed with radio-immunoprecipitation assay
(RIPA) buffer on ice for 10 min. Lysates were cleared with a benchtop centrifuge at 5000 × g
for 2 min and were used directly for western blotting or purified by Ni-NTA agarose beads
(Qiagen).
3.2.7 Protein Electrospray Mass Spectrometry
Proteins were precipitated with methanol/chloroform and dissolved in formic acid/water
(1:100) solution for mass spectrometry characterization. Mass spectra were recorded on an
Agilent ESI-TOF instrument by direct infusion of proteins. Observed spectra were
deconvoluted to derive protein masses by using the Agilent LC/MSD Deconvolution package
provided with the instrument. The instrument detects protein masses within an expected
mass error of ±0.01%.
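The ±0.01% criterion can be applied as a simple relative-error check between an expected and an observed protein mass. The following is a minimal sketch; the function names are illustrative and the masses in the example are hypothetical values chosen only to show one match and one mismatch.

```python
def mass_error_percent(expected_da, observed_da):
    """Signed relative mass error, as a percentage of the expected mass."""
    return 100.0 * (observed_da - expected_da) / expected_da


def within_tolerance(expected_da, observed_da, tol_percent=0.01):
    """True if the observed mass lies within +/- tol_percent of the expected mass."""
    return abs(mass_error_percent(expected_da, observed_da)) <= tol_percent


if __name__ == "__main__":
    # Hypothetical masses in Da, for illustration only.
    expected = 28000.0
    for observed in (28002.0, 28031.0):
        error = mass_error_percent(expected, observed)
        verdict = "match" if within_tolerance(expected, observed) else "mismatch"
        print(f"observed {observed:.0f} Da: error {error:+.4f}% -> {verdict}")
```

For a protein of roughly 28 kDa, a 0.01% tolerance corresponds to about 2.8 Da, so a deviation of a few tens of daltons is well outside the instrument error.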
3.2.8 Western Blotting
PVDF membranes with blotted proteins were first blocked with 1% BSA for 1 h and then
incubated with HRP-conjugated anti-DNP antibody (cat. no. FP1129, PerkinElmer) at
a 1:500 dilution at 4˚C for 14 h. A colorimetric One-Component TMB Membrane
Peroxidase Substrate (cat. no. 50–77–18, Kirkegaard & Perry Laboratories, Gaithersburg,
MD) was used to directly visualize the immobilized antibody.
3.3 Results
The amino acid DnpK was prepared from Nα-(tert-butoxycarbonyl)-L-lysine (Boc-Lys-OH)
and 2,4-dinitrophenylacetic acid in 52% overall yield in three steps (Fig. 3.2A).
Previous studies have genetically encoded a large number of lysine-derived UAAs using
mutants of pyrrolysyl-tRNA synthetase/pyrrolysyl tRNA (PylRS/tRNAPyl) pairs. Along
this line, we screened an M. barkeri PylRS (MbPylRS) library with complete randomization
at residues L270, Y271, L274, and C313 (and an additional Y349F mutation to enhance
tRNA aminoacylation[24]) for the capability of suppressing amber (TAG) codons in the
presence of DnpK. We performed multiple cycles of positive and negative selections in E.
coli strain DH10B, as previously described.[28] We identified an MbPylRS mutant with
Y271M, L274T, C313A, and Y349F mutations (DnpKRS) that survived in the third round
of positive selection. These mutated residues form an enlarged cavity to accommodate the
nonnative DNP functional group, as shown in a modeled structure of the DnpK/DnpKRS
complex (Fig. 3.2B).
Figure 3.2. (A) Chemical Structure of N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK).
(B) Computationally modeled structure of DnpKRS bound with DnpK. (C) SDS-
PAGE of Ni-NTA purified sfGFP. Proteins were expressed in the presence or
absence of 1 mM DnpK in E. coli cells containing tRNA. (D) ESI-MS analysis of the
intact sfGFP protein expressed in E. coli in the presence of DnpK.
We next introduced the genes for DnpKRS, the corresponding suppressor tRNA, and
sfGFP-Y39TAG (His6-tagged superfolder GFP containing a TAG codon for residue 39)
into DH10B E. coli cells. The full-length protein was produced in good yield in the
presence of 1 mM DnpK (4.4±1.5 mg per liter of culture), while full-length sfGFP was not
observed in the absence of DnpK (Fig. 3.2C). The resulting protein was characterized by
direct-infusion electrospray ionization mass spectrometry (ESI-MS). To our surprise, the
observed molecular mass did not match the molecular mass of sfGFP containing a DnpK
residue (Fig. 3.2D). Our spectrometer has a mass accuracy of 0.01%. The difference between the
expected and observed molecular masses (31 Da) indicates that the nitro group(s) of the DnpK
residue was likely reduced in E. coli, although the exact chemical form of the reduced
species could not be determined from this MS experiment. To investigate whether the
problem was protein-specific, we also expressed T4 lysozyme and the Staphylococcal
protein A (SpA) Z-domain, each containing a TAG codon. The mismatch between the
expected and observed molecular masses still existed (Figure 3.3). The observation of
multiple reduction states for the small Z-domain protein further supports our assumption
that bacterial nitroreductases were problematic for expressing DnpK-containing proteins.
We next utilized a special E. coli strain[29], in which the nfsA and nfsB nitroreductase genes
were double deleted, to express sfGFP and the Z-domain. Unfortunately, this new strain
did not solve our problem (Figure 3.3), likely due to the presence of other nitroreductases
in E. coli. To explore which of the two nitro groups in DnpK is more amenable to reduction
and which state they were reduced to, we further synthesized two compounds containing a
single nitro group, N6-(2-(2-nitrophenyl)acetyl)lysine (2-NPK) and N6-(2-(4-
nitrophenyl)acetyl)lysine (4-NPK; Scheme 3.2). When either 2-NPK or 4-NPK was added
to the medium to culture DH10B cells containing DnpKRS, the suppressor tRNA, and
sfGFP-Y39TAG, full-length sfGFP was produced. ESI-MS analysis showed that the nitro
group at the para position of 4-NPK, but not the one at the ortho position of 2-NPK, was
reduced to amine (Figure 3.4). These results strongly suggest that the MS peaks showing
~30 Da shifts for the abovementioned sfGFP, T4 lysozyme, and Z-domain proteins (Figure
3.3) were likely due to reduction of the para nitro of DnpK to an amine. It is worthwhile to
note that the Z-domain protein, prepared from either DH10B or the nfsA/nfsB double-
deletion strain, also showed minor MS peaks corresponding to the unreduced protein or the
protein with both nitro groups reduced to amines (Figure 3.3D and 3.3E). This indicates
that, although the para nitro group is preferentially reduced, the ortho nitro group of DnpK might also
be reduced; the extent of reduction appears to be dependent on protein context. As DnpKRS
is a promiscuous enzyme that uses different amino acid substrates, we also attempted to
investigate whether the nitro groups were reduced before or after being incorporated into
proteins. We incubated the free amino acid, DnpK (1 mM), with DH10B cells for 20 h
before analyzing the cell lysate with ESI-MS. We did not observe any peak corresponding
to the reduced forms of DnpK (Figure 3.5). Despite the possibility that our analytical
method has different sensitivities toward DnpK and its reduced forms, this result suggests
that a large portion of the free DnpK amino acid remained intact in E. coli. Moreover, the distribution
of reduced forms of proteins seemed to be protein-dependent (Figure 3.4). We later
confirmed the activity of DnpKRS toward DnpK in mammalian cells, so we deduce that
the reduction proceeds after DnpK is incorporated into proteins. However, we cannot rule
out the possibility that reduced DnpK exists in E. coli but that we could not detect it owing
to its low abundance; reduced DnpK could be utilized by the promiscuous DnpKRS to also
form proteins containing reduced DnpK. Therefore, we attempted to mutate the residues
inside the β-barrels of fluorescent proteins. We used two constructs, pBAD-hsGFP
(66TAG) and pBAD-mApple (72TAG)[31], along with a TAG suppression plasmid
expressing DnpKRS and the suppressor tRNA. Unfortunately, we were unable to prepare
any full-length, folded, UAA-containing protein from either construct, possibly owing to
the steric hindrance of DnpK (or its potentially reduced forms), which may destabilize the
protein fold. Further research is needed to clarify how bacterial nitroreductases interact with
nitro-containing small molecules and proteins.
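The mass shifts discussed above can be checked with simple arithmetic: reducing one nitro group (NO2) to an amine (NH2) removes two oxygen atoms and adds two hydrogen atoms. The sketch below is a minimal illustration using monoisotopic masses; the molecular formula of DnpK (C14H18N4O7) is derived here from its chemical name and should be treated as an assumption, as are the function names.

```python
# Monoisotopic atomic masses in Da (standard values)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # mass of a proton in Da


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


def reduce_nitro(formula, n_groups=1):
    """Formula after reducing n nitro groups to amines (each: -2 O, +2 H)."""
    out = dict(formula)
    out["O"] -= 2 * n_groups
    out["H"] += 2 * n_groups
    return out


# DnpK, C14H18N4O7 (formula derived from the name; an assumption of this sketch)
DNPK = {"C": 14, "H": 18, "N": 4, "O": 7}

if __name__ == "__main__":
    for n in (0, 1, 2):
        mh = mono_mass(reduce_nitro(DNPK, n)) + PROTON
        print(f"{n} nitro group(s) reduced: [M+H]+ = {mh:.2f}")
```

Each reduction lowers the mass by roughly 30 Da, which is consistent with the ~30 Da shifts seen for the intact proteins and with the expected small-molecule peaks near 325 and 295 quoted for the reduced forms of DnpK.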
Figure 3.3. Mass spectrometry analysis of the indicated proteins purified from
DH10B or the nfsA/nfsB double deletion strain, suggesting a reduced DNP
group in these proteins.
Figure 3.4. Mass spectrometry analysis of the indicated proteins purified from
DH10B in the presence of (A) 2-NPK or (B) 4-NPK. The data suggest that the
para nitro of 4-NPK was reduced to an amine.
Figure 3.5. Direct ESI-MS analysis (positive mode) of the lysate of DH10B cells
incubated with 1 mM DnpK for 20 h, showing the [M+H]+ peak (355.12) for
DnpK but no peak for reduced forms of DnpK (expected: 325 and 295). Other
peaks in this figure were likely caused by additional molecules in the cell lysate.
Figure 3.6. SDS-PAGE and Western blot of DnpK-containing EGFP and the
wild-type EGFP, purified from HEK 293T cells.
Figure 3.7. (A) Fluorescence imaging of HEK 293T cells containing genes for
pEGFP-Tyr39TAG, DnpKRS, and the corresponding suppressor tRNA, in the
presence or absence of DnpK (1 mM). (B) ESI-MS analysis of the intact EGFP
protein expressed in HEK 293T in the presence of DnpK. (C) SDS-PAGE of HEK
293T cell lysates and the purified EGFP protein. (D) Anti-DNP Western blot of
HEK 293T cell lysates and the purified EGFP protein. A colorimetric detection
method was used to locate anti-DNP antibodies. Also shown are bands from a
prestained protein marker.
Mammalian cells typically have lower nitroreductase activity than E. coli.[32] We therefore used
human embryonic kidney (HEK) 293T cells to express DnpK-containing proteins. The
genes encoding DnpKRS, the corresponding suppressor tRNA, and EGFP-Y39TAG (His6-tagged
enhanced GFP containing a TAG codon for residue 39) were co-expressed in
HEK293T cells in culture medium supplemented with 1 mM DnpK. Suppression of the
amber codon was verified by observing strong green fluorescence for cells cultured with
DnpK but not for cells cultured in the absence of DnpK (Figure 3.7A). We purified the
protein from cells cultured with DnpK and analyzed it with direct infusion ESI-MS. The
observed and expected molecular masses were well-matched (Figure 3.7B), indicating that
DnpK was site-specifically incorporated into EGFP and not further reduced in HEK293T
cells. To test the use of anti-DNP antibodies to recognize DnpK-containing proteins, we
performed SDS polyacrylamide gel electrophoresis (SDS-PAGE) and western blot
analysis of HEK293T cell lysates, in addition to the EGFP protein affinity-purified with
Ni-NTA agarose beads (Figure 3.7C and D). We observed distinct bands on the western
blot, resulting from the selective interaction between DnpK-containing EGFP and anti-DNP
antibodies. The bands were clear for both the lysate derived from the cells
cultured with DnpK and the pure DnpK-containing EGFP protein. We also performed a
control experiment to ensure no interactions occurred between anti-DNP antibodies and
the wild-type EGFP (Figure 3.6). These results further support that the DNP hapten was
site-specifically introduced into proteins in living HEK293T cells.
3.4 Conclusions
In summary, we have engineered mbPylRS to genetically encode a small-molecule hapten
moiety. Although the DNP moiety was unstable in E. coli, we found that its stability was
enhanced in mammalian HEK293T cells. This small hapten moiety was able to induce
selective interactions with anti-DNP antibodies, as shown in our western blot experiments.
The capability of genetically introducing DNP into proteins is expected to find broad
applications in biosensing and bioseparation, immunology, and therapeutics.
References:
[1] Chipinda I, Hettick JM, Siegel PD. Haptenation: chemical reactivity and protein
binding. J. Allergy 2011, 839682.
[2] Chan CP, Cheung YC, Renneberg R, Seydack M. New trends in immunoassays.
Adv. Biochem. Eng. Biotechnol. 2008; 109:123–154.
[3] Mondal K, Gupta MN, Roy I. Affinity-based strategies for protein purification.
Anal. Chem. 2006; 78(11):3499–3504.
[4] Jasani B, Thomas ND, Navabi H, Millar DM, Newman GR, Gee J, Williams ED.
Dinitrophenyl (DNP) hapten sandwich staining (DHSS) procedure. A 10 year review of its
principle reagents and applications. J. Immunol. Methods. 1992; 150(1-2):193–198.
[5] Shreder K. Synthetic haptens as probes of antibody response and
immunorecognition. Methods. 2000; 20(3):372–379.
[6] Varga JM, Klein GF, Fritsch P. Binding of a mouse monoclonal IgE (anti-DNP)
antibody to radio-derivatized polystyrene-DNP complexes. FASEB J. 1990; 4(9):2678–
2683.
[7] Perrone JC. Separation of amino-acids as dinitrophenyl derivatives. Nature
1951; 167(4248):513–515.
[8] Hawthorne SJ, Pagano M, Harriott P, Halton DW, Walker B. The synthesis and
utilization of 2,4-dinitrophenyl-labeled irreversible peptidyl diazomethyl ketone inhibitors.
Anal. Biochem. 1998; 261(2):131–138.
[9] Mallone R, Nepom GT. Targeting T lymphocytes for immune monitoring and
intervention in autoimmune diabetes. Am. J. Ther. 2005; 12(6):534–550.
[10] Nakamura K, Mimura Y, Tanaka T, Fujikura Y, Takeo K. Affinity maturation of
anti-hapten antibodies in a single mouse analyzed by two-dimensional affinity
electrophoresis. Electrophoresis. 1993; 14(12):1338–1340.
[11] Eisen HN, Chakraborty AK. Immunopaleontology reveals how affinity
enhancement is achieved during affinity maturation of antibodies to influenza virus. Proc.
Natl. Acad. Sci. USA 2013; 110(1):7–8.
[12] Manne J, Mastrangelo MJ, Sato T, Berd D. TCR rearrangement in lymphocytes
infiltrating melanoma metastases after administration of autologous dinitrophenyl-
modified vaccine. J. Immunol. 2002; 169(6):3407–3412.
[13] Farah FS. Natural antibodies specific to the 2,4-dinitrophenyl group. Immunology.
1973; 25(2):217–226.
[14] Karjalainen K, Mäkelä O. Concentrations of three hapten-binding
immunoglobulins in pooled normal human serum. Eur. J. Immunol. 1976; 6(2):88–93.
[15] McEnaney PJ, Parker CG, Zhang AX, Spiegel DA. Antibody-recruiting molecules:
an emerging paradigm for engaging immune function in treating human disease. ACS Chem.
Biol. 2012; 7(7), 1139–1151.
[16] Fura JM, Sabulski MJ, Pires MM. D-amino acid mediated recruitment of
endogenous antibodies to bacterial surfaces. ACS Chem. Biol. 2014; 9(7):1480–1489.
[17] Grünewald J, Tsao ML, Perera R, Dong L, Niessen F, Wen BG, Kubitz DM, Smider
VV, Ruf W, Nasoff M, Lerner RA, Schultz PG. Immunochemical termination of self-
tolerance. Proc. Natl. Acad. Sci. USA. 2008; 105(32):11276–11280.
[18] Grünewald J, Hunt GS, Dong L, Niessen F, Wen BG, Tsao ML, Perera R, Kang M,
Laffitte BA, Azarian S, Ruf W, Nasoff M, Lerner RA, Schultz PG, Smider VV.
Mechanistic studies of the immunochemical termination of self-tolerance with unnatural
amino acids. Proc. Natl. Acad. Sci. USA. 2009; 106(11):4337–4342.
[19] Erkes DA, Selvan SR. Hapten-induced contact hypersensitivity, autoimmune
reactions, and tumor regression: plausibility of mediating antitumor immunity. J. Immunol.
Res. 2014, 175265.
[20] Amblard M, Fehrentz JA, Martinez J, Subra G. Methods and protocols of modern
solid phase Peptide synthesis. Mol. Biotechnol. 2006; 33(3):239–254.
[21] Sletten EM, Bertozzi CR. Bioorthogonal chemistry: fishing for selectivity in a sea
of functionality. Angew. Chem. Int. Ed. 2009; 48(38):6974–6998; Angew. Chem. 2009, 121,
7108–7133.
[22] Ai HW. Biochemical analysis with the expanded genetic lexicon. Anal. Bioanal.
Chem. 2012; 403(8):2089–2102.
[23] Liu CC, Schultz PG. Adding new chemistries to the genetic code. Annu. Rev.
Biochem. 2010; 79:413–444.
[24] Yanagisawa T, Ishii R, Fukunaga R, Kobayashi T, Sakamoto K, Yokoyama S.
Multistep engineering of pyrrolysyl-tRNA synthetase to genetically encode N(epsilon)-(o-
azidobenzyloxycarbonyl) lysine for site-specific protein modification. Chem. Biol. 2008;
15(11), 1187–1197.
[25] Ai HW, Lee JW, Schultz PG. A method to site-specifically introduce methyllysine
into proteins in E. coli. Chem. Commun. 2010; 46(30):5506–5508.
[26] Wang YS, Russell WK, Wang Z, Wan W, Dodd LE, Pai PJ, Russell DH, Liu WR.
The de novo engineering of pyrrolysyl-tRNA synthetase for genetic incorporation of L-
phenylalanine and its derivatives. Mol. BioSyst. 2011; 7(3):714–717.
[27] Arbely E, Torres-Kolbus J, Deiters A, Chin JW. Photocontrol of tyrosine
phosphorylation in mammalian cells via genetic encoding of photocaged tyrosine. J. Am.
Chem. Soc. 2012; 134(29):11912–11915.
[28] Chen PR, Groff D, Guo J, Ou W, Cellitti S, Geierstanger BH, Schultz PG. A facile
system for encoding unnatural amino acids in mammalian cells. Angew. Chem. Int. Ed.
2009; 48(22):4052–4055; Angew. Chem. 2009, 121, 4112–4115.
[29] Valle A, Le Borgne S, Bolívar J, Cabrera G, Cantero D. Study of the role played
by NfsA, NfsB nitroreductase and NemA flavin reductase from Escherichia coli in the
conversion of ethyl 2-(2'-nitrophenoxy)acetate to 4-hydroxy-(2H)-1,4-benzoxazin-3(4H)-
one (D-DIBOA), a benzohydroxamic acid with interesting biological properties. Appl.
Microbiol. Biotechnol. 2012; 94(1):163–171.
[30] Chen ZJ, Ai HW. A highly responsive and selective fluorescent probe for imaging
physiological hydrogen sulfide. Biochemistry. 2014; 53(37):5966–5974.
[31] Shaner NC, Campbell RE, Steinbach PA, Giepmans BN, Palmer AE, Tsien RY.
Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma
sp. red fluorescent protein. Nat. Biotechnol. 2004; 22(12):1567–1572.
[32] Vass SO, Jarrom D, Wilson WR, Hyde EI, Searle PF. E. coli NfsA: an alternative
nitroreductase for prodrug activation gene therapy in combination with CB1954. Br. J.
Cancer. 2009; 100(12):1903–1911.
[33] Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F,
Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T. SWISS-MODEL: modelling protein
tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014,
42, W252–W258.
[34] Kavran JM, Gundllapalli S, O'Donoghue P, Englert M, Söll D, Steitz TA.
Structure of pyrrolysyl-tRNA synthetase, an archaeal enzyme for genetic code innovation.
Proc. Natl. Acad. Sci. USA 2007; 104(27):11268–11273.
[35] DeLano WL. The PyMOL User’s Manual, DeLano Scientific, San Carlos,
USA, 2002.
[36] Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, Tyka M, Baker D, Karplus
K. Improving physical realism, stereochemistry, and side-chain accuracy in homology
modeling: Four approaches that performed well in CASP8. Proteins Struct. Funct. Bioinf.
2009, 77, Suppl. 9, 114–122.
[37] Ai HW, Shen W, Sagi A, Chen PR, Schultz PG. Probing protein-protein
interactions with a genetically encoded photo-crosslinking amino acid. ChemBioChem
2011; 12(12):1854–1857.
Chapter 4: Study of the Binding Energies
between Unnatural Amino Acids and
Engineered Orthogonal Tyrosyl-tRNA
Synthetases
4.1 Introduction
In most organisms, 61 trinucleotide codons encode for the 20 canonical amino acids.[1] An
additional three codons (UAA, UAG, and UGA) are nonsense “stop” codons that trigger
the termination of ribosomal protein synthesis.[2] Proteins can undergo posttranslational
modifications to further derivatize the canonical 20 amino acids, thereby increasing
complexity and biological control to achieve sophisticated cellular functions.[3] In the
1960s, nonsense suppression bacterial strains were discovered to express additional tRNAs
that recognize the stop codons UAG (amber suppressor), UAA (ochre suppressor), or UGA
(opal suppressor).[1,4] There are also exceptional examples in which uncommon amino
acids, such as pyrrolysine or selenocysteine, can be prompted for ribosomal peptide
synthesis in response to UAG or UGA codons, respectively.[5-7] Furthermore, in the past
few decades, research has expanded the possibilities of protein structures and functions
through an expansion of the genetic codes.[8, 9] Additional pairs of aminoacyl-tRNA
synthetases (aaRSes) and the corresponding tRNAs that do not cross-react with
endogenous tRNAs, aaRSes, and amino acids, have been engineered and expressed in
living bacteria, yeast, mammalian cells and several model multicellular organisms.[8-11]
These aaRSes and tRNAs have been pre-engineered to use unnatural amino acids (unAAs)
as their substrates, thereby affording the genetic encoding of unAAs in living cells and
organisms. This genetic code expansion technology has since produced modified proteins
with site-specific incorporation of a large array of unAAs[8], such as fluorescent amino
acids[12, 13], biophysical probes[14-16], photocrosslinkers[17-19], reactive chemical handles for
bioorthogonal reactions[20-22], photocaged amino acids[23-28], and amino acids identical to
or mimicking post-translational modifications[29-32]. This method has now been broadly
utilized not only to create proteins with enhanced and novel properties, but also to develop
novel therapeutics and investigate protein structure and function.[8]
Engineered orthogonal tRNAs are typically adapted into an organism of interest from
evolutionarily distant species to lower the likelihood of cross-recognition by endogenous
aaRSes.[8] Even so, the corresponding aaRSes, with some exceptions involving pyrrolysyl-
tRNA synthetases[33], would still have to undergo extensive protein engineering to switch
their substrate preference from a native amino acid to an unAA. This bioengineering
procedure, which typically involves several rounds of positive and negative selection, is
laborious and time-consuming, and requires considerable expertise[8]. The number of
residues that can be simultaneously mutated is often limited to 5 or 6, due to technical
limitations with the molecular biology[34]. Hence, it is often not trivial to derive orthogonal
aaRSes for unAA substrates that are very different from the enzymes’ native substrates.
New strategies are currently being explored to circumvent this challenging positive and
negative selection process. For example, some existing orthogonal aaRSes can use several
different unAAs as their substrates, so their polyspecificity has been exploited for the
genetic encoding of new unAAs.[34-36] This substrate promiscuity is not problematic,
however, because the experimenter controls the unAA(s) supplemented into the culture
medium for any given experiment.
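To give a sense of why the number of simultaneously randomized residues is limited, the sketch below computes the theoretical library diversity as a function of the number of fully randomized positions. It assumes NNK degenerate codons (32 codons per position), which is a common choice but is not stated in the text; the function name is hypothetical.

```python
def library_size(n_positions, codons_per_position=32, amino_acids=20):
    """Theoretical DNA- and protein-level diversity for fully randomized positions.

    codons_per_position=32 assumes NNK degenerate codons (an assumption;
    the randomization scheme is not specified in the text).
    """
    return codons_per_position ** n_positions, amino_acids ** n_positions


if __name__ == "__main__":
    for n in range(4, 8):
        dna, protein = library_size(n)
        print(f"{n} positions: {dna:.2e} codon combinations, "
              f"{protein:.2e} protein variants")
```

Because the diversity grows exponentially with each added position, covering a library by transformation and selection quickly becomes impractical beyond a handful of residues.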
It may also be prudent to computationally design aaRSes for unAAs because computational
methods have been routinely utilized to study enzyme-substrate interactions.[37] Previously,
Wang et al. and Datta et al. computationally estimated the binding energies of natural E.
coli phenylalanyl- and methionyl-tRNA synthetases to several unnatural phenylalanine or
methionine analogues, respectively, and compared these values to the experimental
activities of these enzyme-substrate pairs.[38-39] Using this method, they were able to
genetically incorporate several unAAs into proteins using auxotrophic E. coli, but could
not achieve site-specificity due to cross-reactivity issues. Computational studies on
engineered orthogonal aaRSes and unAAs are scarce. In a previous work, Zhang et al.
reported a clash opportunity progressive (COP) method to identify a possible mutant of M.
jannaschii tyrosyl-tRNA synthetase (MjTyrRS) for preferential binding to O-methyl-L-
tyrosine over Tyr.[40] In another work, Sun et al. docked p-acetyl-L-phenylalanine (AcF)
into 60 different MjTyrRS mutants to identify possible mutations benefiting enzyme-
substrate binding.[41] The full capability of the methods by Zhang et al. and Sun et al. in
designing orthogonal aaRSes for unAAs is still unclear. Each study focused on only one
unAA, and the mutant aaRSes derived from their computational studies were not
experimentally tested, although their computationally derived sequences showed some
similarities to orthogonal aaRSes previously derived from experimental studies by Schultz
et al.[10, 42]
Aminoacylation of tRNAs by aaRSes is a complex, multi-step process.[43] In general, aaRS
is bound by a specific amino acid substrate, which is subsequently activated through an
adenylation reaction with ATP. The activated amino acid is then transferred to the 3′ end
of an aaRS-bound tRNA by releasing the attached AMP molecule, consequently producing
a charged tRNA. A computational model depicting this entire process is very difficult to
produce. As in many other enzyme-substrate studies[37], we presume that the ability of an
amino acid to bind a particular aaRS is very important for establishing their enzyme-substrate
relationship. To achieve the goal of computationally designing orthogonal
aaRSes for unAAs, it might not be necessary to accurately estimate the absolute binding
energy and binding affinity of an aaRS/unAA pair; however, it is critical to identify
computational parameters that can group favorable and unfavorable aaRS-amino acid
complexes. Herein, we report our evaluation of several computational methods for scoring
binding energies of a number of aaRS-amino acid complexes. These benchmark
experiments were performed with complexes of orthogonal MjTyrRS or EcTyrRS mutants
bound to their experimentally verified unAA substrates, and compared to Tyr—the natural
substrate for wild-type MjTyrRS and EcTyrRS. We compared the results of several popular
computational methods, including AutoDock Vina[44], ROSETTA[45], and Molecular
Mechanics/Poisson–Boltzmann Surface Area (MM/PBSA)[46]. We performed MM/PBSA
binding energy scoring based on a 10-ns Molecular Dynamics (MD) simulation, and direct
MM/PBSA scoring based on single energy-minimized structures. The tested methods,
which required varying amounts of computational resources, differed in their ability
to group favorable and unfavorable aaRS-amino acid interactions. Moreover, we
analyzed the factors contributing to amino acid recognition. In particular, a polyspecific
EcTyrRS mutant was studied for its capacity to utilize several different unAAs as its
enzymatic substrates.[36, 47]
4.2 Methods
4.2.1 Preparation of aaRS-Amino Acid Complexes
We computationally studied seven aaRS-amino acid complexes. The following two X-ray
crystal structures were downloaded from Protein Data Bank (PDB): MjTyrRS-derived p-
acetylphenylalanyl-tRNA synthetase (MjAcFRS) bound with AcF (PDB 1ZH6)[48], and
MjTyrRS-derived 3-iodotyrosyl-tRNA synthetase (MjIoYRS) bound with 3-iodo-L-
tyrosine (IoY) (PDB 2ZP1)[49]. These two complexes were cleaned by removing water
molecules, co-crystallized ions, and non-amino acid ligands. Hydrogen atoms of the amino
acid ligands were added in VEGA ZZ.[50] To derive the complex structures of proteins
bound with Tyr, the side chains of the unAA ligands in the above two complexes were
manually edited in VEGA ZZ and combined with the corresponding protein coordinates
by matching the coordinates of the unchanged ligand atoms. This process generated two
additional aaRS-amino acid complexes: MjAcFRS bound with Tyr, and MjIoYRS bound
with Tyr. No X-ray crystal structure is available for the polyspecific synthetase (EcPolyRS)
derived from EcTyrRS. Based on the X-ray crystal structure of the wild-type EcTyrRS
(PDB 1X8X), we used SWISS-MODEL[52] to perform homology modeling of the
EcPolyRS structure. The coordinates of Tyr in 1X8X were combined with the modeled
protein coordinates to derive an EcPolyRS-Tyr complex. We also manually edited Tyr in
VEGA ZZ to derive coordinates for the unAAs p-iodophenylalanine (IoF) and AcF. These
were combined with the modeled EcPolyRS coordinates to derive two additional
complexes: EcPolyRS bound with IoF and EcPolyRS bound with AcF.
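The cleaning step described above (removing water molecules, co-crystallized ions, and non-amino acid ligands) can be sketched as a simple filter over fixed-width PDB records. This is only an illustration, not the procedure actually used: it assumes the amino acid ligand is stored as HETATM records under a single residue name, and the residue name "LIG" and the helper names are hypothetical.

```python
def clean_complex(pdb_lines, ligand_resname):
    """Keep protein ATOM records and the HETATM records of one named ligand.

    Waters, ions, and other hetero groups are dropped. Residue names are
    read from columns 18-20 of each fixed-width PDB record.
    """
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        resname = line[17:20].strip()
        if record == "ATOM":
            kept.append(line)
        elif record == "HETATM" and resname == ligand_resname:
            kept.append(line)
        elif record in ("TER", "END"):
            kept.append(line)
    return kept


def pdb_line(record, serial, atom, resname, resseq):
    """Build the leading fixed-width columns of a PDB record (demo helper)."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} A{resseq:>4}"


if __name__ == "__main__":
    demo = [
        pdb_line("ATOM", 1, "N", "MET", 1),
        pdb_line("HETATM", 900, "O", "HOH", 301),   # water: removed
        pdb_line("HETATM", 901, "C1", "LIG", 401),  # hypothetical ligand: kept
        pdb_line("HETATM", 902, "MG", "MG", 402),   # ion: removed
        "END",
    ]
    for line in clean_complex(demo, "LIG"):
        print(line)
```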
Figure 4.1. Chemical structures of natural and unnatural amino acids used in this
study (1: p-acetyl-L-phenylalanine, AcF; 2: 3-iodo-L-tyrosine, IoY; 3: p-iodo-L-
phenylalanine, IoF; and 4: L-tyrosine, Tyr).
Further relaxation of these complexes was achieved with GROMACS-4.6.5.[53, 54] The
force field for proteins was set to AMBER99SB.[55] ACPYPE[56] was used to treat ligands
based on Generalized Amber Force Field (GAFF).[57, 58] The complexes were immersed in
a dodecahedron box of SPC/E water molecules. The water box was extended 1 nm from
solute atoms in all directions. Counter ions, such as Na+ and Cl–, were added to neutralize
the systems. Particle mesh Ewald (PME) was used to treat the long-range electrostatic
interactions in molecular mechanics (MM) energy minimization. The systems were
minimized by using the steepest descent algorithm. The minimization was stopped either
at 50,000 steps or when the maximum force fell below 10.0 kJ·mol−1·nm−1.
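The minimization settings above map onto a small set of GROMACS .mdp options. The sketch below renders them as a parameter-file fragment; it covers only the settings stated in the text, leaves everything else at GROMACS defaults, and is not the input file actually used in this work. The helper name `render_mdp` is hypothetical.

```python
def render_mdp(params):
    """Render a dict of GROMACS .mdp options as 'key = value' lines."""
    return "\n".join(f"{key:<12} = {value}" for key, value in params.items()) + "\n"


# Settings stated in the text; anything not stated there is left at defaults.
MINIMIZATION = {
    "integrator": "steep",   # steepest descent
    "nsteps": 50000,         # stop after at most 50,000 steps ...
    "emtol": 10.0,           # ... or once the max force is below this (kJ mol^-1 nm^-1)
    "coulombtype": "PME",    # particle mesh Ewald for long-range electrostatics
}

if __name__ == "__main__":
    print(render_mdp(MINIMIZATION), end="")
```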
4.2.2 Binding Energy Scoring with AutoDock Vina and ROSETTA
The energy scoring function embedded in AutoDock Vina 1.1.2[44] was used to assess the
binding free energies of all complexes. Coordinates of proteins and ligands were separated
in PyMOL, and the pdbqt files of proteins and ligands were prepared in AutoDockTools[59]
from the above-mentioned complexes. Polar hydrogens were added and the binding free
energies were calculated using the embedded “score only” option in AutoDock Vina.
We followed a previously reported procedure to score aaRS-amino acid complexes using
ROSETTA 3.5[45]. The interface energy term was used in this study to evaluate the binding.
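For readers who wish to script the "score only" step, the following is a minimal sketch that builds the AutoDock Vina command line and parses the reported affinity. The file names are placeholders, and the parser assumes an output line of the form `Affinity: -7.09 (kcal/mol)`; the exact output text may differ between Vina versions, so the pattern should be checked against the installed program.

```python
import re


def vina_score_command(receptor_pdbqt, ligand_pdbqt, vina_exe="vina"):
    """Build the command line for single-pose scoring with AutoDock Vina."""
    return [vina_exe, "--receptor", receptor_pdbqt,
            "--ligand", ligand_pdbqt, "--score_only"]


def parse_affinity(vina_stdout):
    """Extract the affinity in kcal/mol from score-only output.

    Assumes a line of the form 'Affinity: -7.09 (kcal/mol)'; adjust the
    pattern if the installed Vina version prints something different.
    """
    match = re.search(r"Affinity:\s*(-?\d+(?:\.\d+)?)\s*\(kcal/mol\)", vina_stdout)
    if match is None:
        raise ValueError("no affinity line found in Vina output")
    return float(match.group(1))


if __name__ == "__main__":
    # Placeholder file names; pass cmd to subprocess.run(..., capture_output=True, text=True)
    cmd = vina_score_command("receptor.pdbqt", "ligand.pdbqt")
    print(" ".join(cmd))
    print(parse_affinity("Affinity: -7.09000 (kcal/mol)"))
```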
Protein     Amino acid   AutoDock Vina (kcal/mol)   ROSETTA (REU)
MjAcFRS     AcF          −7.09 [1.15]a              −10.72 [1.20]a
MjAcFRS     Tyr          −6.15                      −8.94
MjIoYRS     IoY          −6.85 [1.05]a              −16.12 [1.44]a
MjIoYRS     Tyr          −6.54                      −11.21
EcPolyRS    IoF          −5.95 [1.05]a              −13.22 [1.19]a
EcPolyRS    AcF          −6.56 [1.15]a              −13.27 [1.20]a
EcPolyRS    Tyr          −5.68                      −11.08

Table 4.1. Estimated binding free energies using AutoDock Vina and ROSETTA for the
seven tested aaRS-amino acid complexes. a Ratios of the estimated binding free energies
for the indicated aaRS-unAA complexes to the binding free energies of the corresponding
aaRS-Tyr complexes.
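The bracketed ratios in Table 4.1 follow directly from the tabulated scores. The short sketch below recomputes them from the values transcribed from the table; the dictionary layout and function name are illustrative choices.

```python
# Scores transcribed from Table 4.1: (unAA complex score, matching Tyr complex score)
VINA = {
    "MjAcFRS-AcF": (-7.09, -6.15),
    "MjIoYRS-IoY": (-6.85, -6.54),
    "EcPolyRS-IoF": (-5.95, -5.68),
    "EcPolyRS-AcF": (-6.56, -5.68),
}
ROSETTA = {
    "MjAcFRS-AcF": (-10.72, -8.94),
    "MjIoYRS-IoY": (-16.12, -11.21),
    "EcPolyRS-IoF": (-13.22, -11.08),
    "EcPolyRS-AcF": (-13.27, -11.08),
}


def ratios(scores):
    """Ratio of each aaRS-unAA score to the corresponding aaRS-Tyr score."""
    return {name: round(unaa / tyr, 2) for name, (unaa, tyr) in scores.items()}


if __name__ == "__main__":
    for label, table in (("AutoDock Vina", VINA), ("ROSETTA", ROSETTA)):
        print(label, ratios(table))
```

A ratio of 1.05 corresponds to a 5% difference between the unAA and Tyr complexes, and 1.44 to a 44% difference.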
Figure 4.2. The RMSD values in the MD trajectories of the seven studied aaRS-
amino acid complexes.
4.2.3 Molecular Dynamics Simulations
Molecular Dynamics (MD) simulations were performed in GROMACS-4.6.5.[53, 54] The
solvated and MM-energy-minimized ligand-protein complexes were heated to 300 K
during a 100 ps constant volume simulation with 2 fs time step. The pressure was then
equilibrated to 1 atm during a 100 ps isothermal-isobaric NPT simulation with 2 fs time
step. All heavy atoms were position-restrained with a force constant of 1000
kJ•mol−1•nm−2. Simulations were performed for 10 ns with a time step of 2 fs. The
temperature and pressure were maintained at 300 K and 1 atm using the V-rescale
temperature and Parrinello-Rahman pressure coupling methods, respectively. The time
constants for the temperature and pressure coupling were set at 0.1 ps and 2 ps, respectively.
Short-range, non-bonded interactions were computed for atom pairs within a 9 Å
cutoff. Long-range electrostatic interactions were calculated using the PME summation
method with fourth-order cubic interpolation and 1.6 Å grid spacing. All bonds were
constrained using the parallel LINCS method. Xmgrace was used to plot the data and
graphs generated from GROMACS.
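The simulation settings above translate into GROMACS .mdp options after two unit conversions: run lengths become step counts, and distances quoted in ångströms become nanometres. The sketch below shows that mapping for the production run. It is a fragment covering only the values stated in the text (for example, temperature-coupling groups are omitted), not the input file actually used, and the helper names are hypothetical.

```python
def n_steps(duration_ps, dt_fs=2.0):
    """Number of integration steps for a run of duration_ps at a time step of dt_fs."""
    return round(duration_ps * 1000.0 / dt_fs)


def angstrom_to_nm(value):
    """GROMACS uses nm for lengths; the text quotes distances in angstroms."""
    return round(value / 10.0, 4)


def production_mdp():
    """Production-run options implied by the text, keyed by GROMACS .mdp names."""
    return {
        "integrator": "md",
        "dt": 0.002,                          # 2 fs, expressed in ps
        "nsteps": n_steps(10_000),            # 10 ns
        "tcoupl": "V-rescale",
        "tau_t": 0.1,                         # ps
        "ref_t": 300,                         # K
        "pcoupl": "Parrinello-Rahman",
        "tau_p": 2.0,                         # ps
        "ref_p": 1.01325,                     # bar (1 atm)
        "rcoulomb": angstrom_to_nm(9.0),      # nm
        "rvdw": angstrom_to_nm(9.0),          # nm
        "coulombtype": "PME",
        "pme_order": 4,                       # fourth-order (cubic) interpolation
        "fourierspacing": angstrom_to_nm(1.6),
        "constraints": "all-bonds",
        "constraint_algorithm": "lincs",
    }


if __name__ == "__main__":
    print("equilibration steps (100 ps):", n_steps(100))
    for key, value in production_mdp().items():
        print(f"{key:<22} = {value}")
```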
4.2.4 MM/PBSA Binding Energy Calculation
We used g_mmpbsa to estimate MM/PBSA binding energies.[60] The average binding
energy was calculated from 100 snapshots extracted every 50 ps from the MD trajectories
between 5 and 10 ns. The non-polar solvation energy was calculated based on the SASA
model. The vacuum and solvent dielectric constants were set at 1 and 80, respectively. The
solute dielectric constant was set at 2. The entropy term was not included in our binding
energy calculation. A bootstrap analysis was performed to obtain standard errors. To
calculate the binding energy based on single snapshots, we followed the same
procedure, except that the MM-energy-minimized aaRS-unAA complexes
were used directly for energy calculations without any MD treatment.
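The sampling and error estimate described above can be illustrated with a few lines of standard-library Python. The sketch selects 100 snapshot times at 50 ps intervals from the 5 to 10 ns window and estimates the standard error of the mean by bootstrap resampling. It is not the g_mmpbsa implementation; whether the window includes its first or last frame is an assumption, and the energies in the demo are random placeholders rather than data from this study.

```python
import random
import statistics


def snapshot_times(start_ps=5000, stop_ps=10000, step_ps=50):
    """Frame times sampled every step_ps in the half-open window [start, stop)."""
    return list(range(start_ps, stop_ps, step_ps))


def bootstrap_se(values, n_resamples=2000, seed=0):
    """Standard error of the mean, estimated by resampling with replacement."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    times = snapshot_times()
    # Placeholder per-snapshot binding energies (kJ/mol); NOT data from this study
    rng = random.Random(1)
    energies = [rng.gauss(-100.0, 10.0) for _ in times]
    print(len(times), "snapshots")
    print(f"mean = {statistics.fmean(energies):.1f}, "
          f"bootstrap SE = {bootstrap_se(energies):.2f}")
```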
4.3 Results and Discussion
4.3.1 Selection and Preparation of aaRS-Amino Acid Complexes
A very large number of orthogonal aaRSes have been derived from MjTyrRS and
EcTyrRS[8], which are currently widely utilized for the genetic encoding of unAAs in
bacterial and eukaryotic cells, respectively. We examined available co-crystal structures of
MjTyrRS mutants with unAAs and decided to use two complexes in our study: MjAcFRS
bound with AcF, and MjIoYRS bound with IoY. AcF has a side-chain carbonyl group for
H-bond formation with the corresponding aaRS, whereas the IoY side chain can interact
with the aaRS through both H-bonding and non-H-bonding van der Waals interactions
(Figure 4.1). For our study, we also selected an EcTyrRS-derived polyspecific synthetase,
EcPolyRS, which was originally engineered for the genetic encoding of IoF. In addition to
IoF, we later found that EcPolyRS was also capable of using several other unAAs,
including AcF, as its substrate.[36, 47] Because no X-ray crystal structure of EcPolyRS is
available, we used the wild-type EcTyrRS 3D-structure as the template for homology
modeling of EcPolyRS. The manually edited coordinates of unAAs, IoF and AcF (Figure
4.1), were next combined with the modeled protein structure. The side chain of IoF is
expected to interact with EcPolyRS mainly through non-H-bonding van der Waals
interactions, while the carbonyl group of AcF would act as an excellent H-bond acceptor. We
also modeled Tyr into these aforementioned aaRS structures in order to computationally
evaluate and compare the binding energies of these aaRSes to Tyr. All three selected
aaRSes discriminate their unAA substrates from Tyr very effectively: in previous protein
expression experiments, none of them used Tyr as an enzymatic substrate
at physiological concentrations.[11, 42, 49] Because our ultimate goal is to computationally
design orthogonal aaRSes for unAAs, and currently, it is difficult to model water molecules
at the protein/ligand interfaces to effectively mediate interactions, we removed water
molecules from these complex structures. All abovementioned amino acid-aaRS
complexes were subjected to relaxation through a standard MM energy minimization
process.
4.3.2 Binding Energy Scoring with AutoDock Vina and ROSETTA
In order to achieve the computational design of orthogonal aaRSes for unAAs, it is crucial
to predict their interaction modes and ultimately differentiate the interactions of aaRSes
with different amino acid ligands. In the present study, we do not evaluate strategies
for protein randomization and binding pose searching. Instead, we mainly focus on
approaches to evaluate binding affinities of aaRSes and amino acids. Energy scoring
functions implemented in docking programs are usually designed to minimize computing
costs, and thus, they can be utilized to evaluate large numbers of protein–ligand
complexes.[44, 61] We first utilized AutoDock Vina, a popular molecular docking suite that
includes an Amber-force-field-based scoring function, to evaluate the interactions of our
selected aaRSes with their corresponding unAAs and Tyr. The estimated free energies of
binding were all within the range of −5.68 to −7.09 kJ/mol (Table 4.1). Although the
estimated binding free energies between aaRSes and unAAs were typically lower than those
between aaRSes and Tyr, the differences were minimal. The binding free energies scored
with AutoDock Vina for both MjIoYRS-IoY and EcPolyRS-IoF differed from those of
their corresponding aaRS-Tyr complexes by only 5%. Larger differences (~15%) were
observed for MjAcFRS-AcF and EcPolyRS-AcF. The binding free energies for the
four favorable aaRS-unAA complexes examined were −6.61 ± 0.49 kJ/mol, whereas the
binding energies for the three unfavorable aaRS-Tyr complexes were −6.09 ± 0.39 kJ/mol.
This method failed to confidently distinguish favorable interactions from unfavorable
ones. It is worthwhile to note that the numbers in Table 4.1 were derived by scoring
single poses from X-ray crystal structures or homology models, and the gaps were not
improved by performing protein-ligand docking with flexible aaRS side chains. We next
turned to ROSETTA, another popular suite of programs widely used for protein structure
prediction, protein design, and protein-protein and protein-ligand docking.[62] We scored
the interface energies of the aaRS-amino acid complexes. The estimated interface
energies in ROSETTA energy units (REU) are shown in Table 4.1. This method was
generally capable of identifying the binding energy differences of aaRSes to their real
unAA substrates and Tyr; these differences were within the range of 19% to 44%.
However, when all estimated interface energies were analyzed together, there was no
obvious threshold to differentiate between favorable and unfavorable interactions, as
defined by wet lab results. For example, the interface energy score for the unfavorable
MjIoYRS-Tyr complex (−11.21 REU) was even lower than that for the favorable
MjAcFRS-AcF complex (−10.72 REU). Our data indicate that ROSETTA might not be
very reliable for predicting whether an amino acid is a true substrate of a particular aaRS.
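The "no obvious threshold" criterion used above can be stated concretely: a cutoff exists only if every favorable complex scores lower than every unfavorable one. The following is a minimal sketch of that check, using only the two ROSETTA scores quoted in the text (the remaining Table 4.1 values are not reproduced here). The function name separating_threshold is introduced for illustration and is not part of any docking package.

```python
def separating_threshold(favorable, unfavorable):
    """Return a cutoff that separates the two groups, or None if they overlap.

    Scores are energies, so a more negative value means stronger predicted binding.
    """
    weakest_favorable = max(favorable)        # least negative favorable score
    strongest_unfavorable = min(unfavorable)  # most negative unfavorable score
    if weakest_favorable < strongest_unfavorable:
        # Any value between the two groups works; the midpoint is returned.
        return (weakest_favorable + strongest_unfavorable) / 2
    return None


# The two ROSETTA interface energies (REU) quoted in the text.
rosetta_favorable = {"MjAcFRS-AcF": -10.72}
rosetta_unfavorable = {"MjIoYRS-Tyr": -11.21}

cutoff = separating_threshold(rosetta_favorable.values(),
                              rosetta_unfavorable.values())
print("separating cutoff:", cutoff)
```

For this pair the function returns None, because the unfavorable MjIoYRS-Tyr complex scores lower than the favorable MjAcFRS-AcF complex.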
4.3.3 Binding Energy Estimation by MD-MM/PBSA or Direct MM/PBSA
Compared to energy scoring functions implemented in docking programs, free-energy
simulation techniques, such as MD-MM/PBSA, are known to have better accuracy for
binding energy ranking. However, this gain is accompanied by a much higher
computational cost. We performed 10-ns MD simulations for each protein-ligand complex.
We monitored the root-mean-square deviation (RMSD) values of the whole complexes and
observed that they typically reached a plateau after the first 3–4 ns (Figure 4.2). We next
selected 100 equal-interval snapshots between 5 ns and 10 ns of each simulation to estimate
binding free energies for each aaRS-amino acid complex. Considering that the aaRS-amino
acid interfaces are moderately charged, we used a dielectric constant of 2 to estimate the
energy values.[63] Previous studies also showed that the conformational entropy was only
important for predicting absolute binding free energies but not important for ranking the
binding affinities of similar substrates.[64] Hence, in order to minimize computational costs,
we did not include the entropy term in our calculations. The estimated binding energies for
the aaRSes and their favorable unAA substrates were within the range of −15.35 to −19.16
kcal/mol, whereas the estimated binding energies for the aaRSes and their unfavorable Tyr
substrate were within the range of −9.82 to −10.56 kcal/mol (Table 4.2), illustrating a
distinct gap between these two groups of values. The average binding energy for the former
group was −16.61 ± 1.76 kcal/mol, whereas that for the latter group was −10.19 ± 0.37 kcal/mol.
Subjecting these two groups to a two-tailed test yields a p-value of 0.004, indicating a
significant difference. It is well accepted that MD simulation improves energy calculations
through conformational sampling, but at the cost of significant computational
resources, which makes MD-MM/PBSA evaluation of a large number of aaRS-amino
acid complexes infeasible.[61] We next utilized MM/PBSA to directly score single
energy-minimized structures of the seven aaRS-amino acid complexes.[65] The results (Table 4.2),
obtained at a much-reduced computing cost, were slightly different from the energy values
derived from sampling MD trajectories, but still useful in separating favorable aaRS-amino acid
complexes from unfavorable ones. The estimated binding energies for the favorable
complexes were within the range of −16.48 to −21.87 kcal/mol, whereas those for
the unfavorable ones were within the range of −9.33 to −11.13 kcal/mol. The average
value for the former group was −18.83 ± 2.36 kcal/mol, whereas that for the latter group was
−10.14 ± 0.91 kcal/mol. A two-tailed test still showed a significant difference (p = 0.002)
between the two groups. Scoring functions of docking software use various
approximations to increase computational efficiency.[61] These methods are designed for
screening a large number of mutants at reasonable speeds, but at the cost of accuracy.
MM/PBSA uses a more rigorous scoring function, generally leading to better prediction
accuracy.[65] Considering this and based on our results, we suggest using direct MM/PBSA
scoring to re-evaluate top hits of orthogonal aaRS designs from docking programs, such as
AutoDock Vina and ROSETTA. Moreover, for the few top-ranked candidates in
single-structure MM/PBSA scoring experiments, it may be desirable to perform MD and
MM/PBSA rescoring based on snapshots of MD trajectories to increase the accuracy. This
combined approach, which balances computational costs and prediction accuracy, has
the potential to accelerate the engineering of orthogonal aaRSes for the genetic encoding of
unAAs.
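The tiered strategy suggested above (fast docking scores first, then direct MM/PBSA on the top hits, then MD-MM/PBSA on the few best) can be sketched as a simple filtering funnel. This is only an illustration under stated assumptions: tiered_rescoring, the design names, and all numerical scores below are hypothetical placeholders, and in practice each scorer would wrap a call to the corresponding program.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# A scorer maps a candidate identifier to an energy (lower = better).
Scorer = Callable[[str], float]


def tiered_rescoring(candidates: Iterable[str],
                     tiers: Sequence[Tuple[Scorer, int]]) -> List[str]:
    """Pass candidates through successively costlier scorers.

    After each tier only the `keep` lowest-energy candidates survive, so the
    expensive scorers are run on progressively fewer designs.
    """
    survivors = list(candidates)
    for scorer, keep in tiers:
        survivors = sorted(survivors, key=scorer)[:keep]
    return survivors


# Placeholder scores, made up purely to exercise the function. They are NOT
# results from this study.
docking = {"design_A": -6.9, "design_B": -6.5, "design_C": -6.8, "design_D": -5.9}
direct_mmpbsa = {"design_A": -12.0, "design_B": -18.5, "design_C": -17.0}
md_mmpbsa = {"design_B": -16.0, "design_C": -17.5}

top = tiered_rescoring(
    docking,
    [
        (docking.__getitem__, 3),        # cheap: keep the three best docking hits
        (direct_mmpbsa.__getitem__, 2),  # moderate: keep the two best by direct MM/PBSA
        (md_mmpbsa.__getitem__, 1),      # expensive: keep the single best by MD-MM/PBSA
    ],
)
print(top)
```

The design choice is that a candidate is only ever scored by an expensive method if it survived all cheaper ones, which is what keeps the total cost manageable.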
Complex          Method            ∆Evdw a          ∆Eele a          ∆Gps a          ∆GSASA a        ∆Gtotal a,b
MjAcFRS + AcF    MD-MM/PBSA        −30.56 ± 0.27    −26.12 ± 0.30    44.45 ± 0.22    −3.11 ± 0.01    −15.35 ± 0.22
                 direct MM/PBSA    −30.08           −27.73           43.49           −3.24           −17.56
MjAcFRS + Tyr    MD-MM/PBSA        −25.61 ± 0.25    −23.46 ± 0.30    41.68 ± 0.21    −2.79 ± 0.01    −10.17 ± 0.23
                 direct MM/PBSA    −23.72           −25.99           43.31           −2.92           −9.33
MjIoYRS + IoY    MD-MM/PBSA        −30.08 ± 0.29    −26.49 ± 0.33    43.09 ± 0.31    −2.97 ± 0.01    −16.45 ± 0.26
                 direct MM/PBSA    −30.15           −31.61           42.71           −2.81           −21.87
MjIoYRS + Tyr    MD-MM/PBSA        −24.77 ± 0.25    −25.48 ± 0.31    43.18 ± 0.24    −2.73 ± 0.01    −9.82 ± 0.25
                 direct MM/PBSA    −23.54           −31.54           46.61           −2.66           −11.13
EcPolyRS + IoF   MD-MM/PBSA        −26.98 ± 0.28    −30.01 ± 0.31    40.76 ± 0.38    −2.93 ± 0.01    −19.16 ± 0.32
                 direct MM/PBSA    −30.19           −43.17           56.82           −2.88           −19.41
EcPolyRS + AcF   MD-MM/PBSA        −30.07 ± 0.24    −31.87 ± 0.28    49.55 ± 0.32    −3.10 ± 0.01    −15.49 ± 0.25
                 direct MM/PBSA    −29.76           −47.76           64.09           −3.05           −16.48
EcPolyRS + Tyr   MD-MM/PBSA        −23.64 ± 0.32    −31.41 ± 0.35    47.17 ± 0.28    −2.68 ± 0.01    −10.56 ± 0.24
                 direct MM/PBSA    −25.99           −41.69           60.54           −2.61           −9.75
Table 4.2. Calculated binding energies using MD-MM/PBSA or direct MM/PBSA
for the seven aaRS-amino acid complexes. a All values are given in kcal/mol, and
MD-MM/PBSA values are given as average ± S.D. b The total of the van der Waals
interaction energy (∆Evdw), electrostatic energy (∆Eele), and polar (∆Gps)
and nonpolar (∆GSASA) solvation energies.
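The group statistics quoted in Section 4.3.3 for the MD-MM/PBSA results can be re-derived from the ∆Gtotal column of Table 4.2. The sketch below uses only the Python standard library. It assumes that the "two-tailed test" was Welch's unequal-variance t-test, which the text does not specify; under that assumption the p-value comes out close to the reported 0.004, and the means and standard deviations agree with the quoted values to within rounding. The helper names are introduced here for illustration.

```python
import math
from statistics import mean, stdev

# MD-MM/PBSA ∆Gtotal values (kcal/mol) copied from Table 4.2.
favorable = [-15.35, -16.45, -19.16, -15.49]  # aaRS-unAA complexes
unfavorable = [-10.17, -9.82, -10.56]         # aaRS-Tyr complexes


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    nu = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, nu


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    norm = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return norm * (1 + x * x / nu) ** (-(nu + 1) / 2)


def two_tailed_p(t, nu, steps=2000):
    """Two-tailed p-value by Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 1.0 - 2.0 * (total * h / 3.0)


t_stat, dof = welch_t(favorable, unfavorable)
p_value = two_tailed_p(t_stat, dof)

print(f"favorable:   {mean(favorable):.2f} ± {stdev(favorable):.2f} kcal/mol")
print(f"unfavorable: {mean(unfavorable):.2f} ± {stdev(unfavorable):.2f} kcal/mol")
print(f"Welch t = {t_stat:.2f}, df = {dof:.2f}, two-tailed p = {p_value:.4f}")
```

Welch's test was chosen for this sketch because the two groups have visibly different spreads; a pooled-variance Student's t-test would give a somewhat smaller p-value for the same data.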
Figure 4.3. The contributions of individual amino acid residues of aaRSes to the
total binding energies, shown as the energy contribution differences between the
indicated aaRS-unAA complexes and aaRS-Tyr complexes. Negative values indicate
a stabilization effect for aaRS-unAA interactions or a destabilization effect for
aaRS-Tyr interactions, whereas positive values indicate a destabilization effect for
aaRS-unAA interactions or a stabilization effect for aaRS-Tyr interactions.
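The sign convention in the caption of Figure 4.3 can be written out explicitly. As a sketch, assuming each plotted value is the contribution of residue i in the aaRS-unAA complex minus its contribution in the corresponding aaRS-Tyr complex (the symbols below are introduced here for illustration and do not appear in the original figure):

```latex
\Delta\Delta E_i \;=\; \Delta E_i^{\,\text{aaRS-unAA}} \;-\; \Delta E_i^{\,\text{aaRS-Tyr}}
```

Under this assumed definition, a negative value means that residue i favors the unAA over Tyr, either by stabilizing the unAA complex or by destabilizing the Tyr complex, and a positive value means the opposite.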
4.3.4 Binding Modes of aaRS-unAA Complexes
The first step of tRNA aminoacylation involves the binding of an amino acid substrate
to the aaRS. This step is often the initial focus when engineering orthogonal aaRSes, because the
potential interaction of the aaRS with the natural Tyr substrate has to be minimized. Compared to
co-crystal structures or structures derived from molecular modeling, MD-MM/PBSA studies
can provide information on the dynamics and energy contributions for aaRS-amino acid
recognition. We analyzed the contributions of individual amino acid residues of aaRSes to
the total binding energies of all studied aaRS-amino acid complexes (Figure 4.3). We also
averaged structures from the MD trajectories to derive aaRS-amino acid complex
structures (Figure 4.4). We identified His70, Gln109, Gln155, Gly158, and Cys159 to be
important for favoring the interaction of MjAcFRS with AcF over Tyr (Figures 4.3A
and 4.4A). Gln109 forms an H-bond to the carbonyl group of AcF, but not to Tyr. Gly158
and Cys159 form non-H-bond van der Waals packing interactions with the methyl group
of AcF. His70 and Gln155 interact with residues 109, 158, and 159 to further stabilize the
MjAcFRS-AcF complex. Similarly, we found that Met154, Gln155, and Thr158 are critical
for establishing packing of MjIoYRS with IoY versus Tyr (Figures 4.3B and 4.4B). Three
residues in the amino acid-binding pocket of EcPolyRS are different from the
corresponding residues of wild-type EcTyrRS: Ile37, Ser182, and Met183. Surprisingly,
no strong interaction is conferred by these mutations to differentiate AcF and IoF from Tyr.
Our MD study suggests that the relative position of Tyr in EcPolyRS is slightly
different from that of AcF or IoF. The hydroxyl group of Tyr points toward Gln195 to
form an H-bond, which contributes to the stabilization of the EcPolyRS-Tyr complex
(Figures 4.3C,D and 4.4C,D). However, this twist significantly destabilizes the interaction
with Asp81 and Asp41, consequently yielding an energetically disfavored complex as a whole.
In comparison, AcF and IoF occupy the binding pocket in a way similar to Tyr in
wild-type EcTyrRS. They have more favorable interactions with Asp81 and Asp41. This
phenomenon might explain the polyspecificity of EcPolyRS, which has been shown to use
at least 14 different unAAs as enzymatic substrates. These unAAs likely interact with
EcPolyRS in an orientation similar to that of AcF and IoF, but not to that of Tyr, and no strong
side-chain recognition is required to stabilize these EcPolyRS-unAA complexes. We also
observed an H-bond between Asn126 and the carbonyl group of AcF, but such stabilization
does not exist in the EcPolyRS-IoF complex, and it likely does not exist in many other
EcPolyRS-unAA complexes considering the structural diversity of these 14 different
unAAs.[36, 47]
Figure 4.4. MD-averaged structures showing the active sites of the studied aaRS-unAA
complexes. (A): MjAcFRS bound with AcF; (B): MjIoYRS bound with
IoY; (C): EcPolyRS bound with AcF; and (D): EcPolyRS bound with IoF. Ligands
are shown as cyan sticks. Residues important for substrate specificity are shown as
green sticks. In panels C and D, Tyr ligands are shown as magenta sticks for
comparison. Ile37, Ser182, and Met183 of EcPolyRS are shown as gray balls.
4.4 Conclusions
We performed computational studies to evaluate the binding energies of several aaRS-amino
acid complexes. Using orthogonal aaRS-unAA pairs whose strong interactions have
been previously reported in experimental studies, we compared the accuracy of AutoDock
Vina, ROSETTA, MM/PBSA, and MD-MM/PBSA in terms of grouping favorable and
unfavorable interactions based on estimated binding free energies. We found that the most
accurate grouping was derived from MM/PBSA based on either 10-ns MD trajectories or
single energy-minimized structures. As such, we suggest using MM/PBSA to re-score
top-hit poses produced by other faster, but less accurate, programs in future aaRS design
experiments. We also compared the binding modes of the studied aaRSes with unnatural and
natural amino acids. In general, the aaRSes established new H-bonds, or non-H-bond van
der Waals interactions, to stabilize their unAA substrates. Moreover, they may adopt
conformations that largely destabilize their interactions with the native Tyr substrate, as shown
by the twisted interactions between EcPolyRS and Tyr. We hope to use these results to
guide the future design and development of new aaRSes, and to extend the capability of the
genetic code expansion technology to many new unAAs.
References:
[1] Murgola, E. J. tRNA, suppression, and the code. Annu. Rev. Genet. 19, 57–80
(1985).
[2] Jukes, T. H. & Osawa, S. Evolutionary changes in the genetic code. Comp.
Biochem. Physiol. B 106, 489–94 (1993).
[3] Walsh, C. T., Garneau-Tsodikova, S. & Gatto, G. J., Jr. Protein posttranslational
modifications: the chemistry of proteome diversifications. Angew. Chem. Int. Ed. 44,
7342–72 (2005).
[4] Goodman, H. M., Abelson, J., Landy, A., Brenner, S. & Smith, J. D. Amber
suppression: a nucleotide change in the anticodon of a tyrosine transfer RNA. Nature 217,
1019–24 (1968).
[5] Brown, C. M., Dalphin, M. E., Stockwell, P. A. & Tate, W. P. The translational
termination signal database. Nucleic Acids Res. 21, 3119–23 (1993).
[6] Bock, A. et al. Selenocysteine: the 21st amino acid. Mol. Microbiol. 5, 515–20
(1991).
[7] Krzycki, J. A. The direct genetic encoding of pyrrolysine. Curr. Opin. Microbiol.
8, 706–12 (2005).
[8] Liu, C. C. & Schultz, P. G. Adding new chemistries to the genetic code. Annu. Rev.
Biochem. 79, 413–44 (2010).
[9] Ai, H. W. Biochemical analysis with the expanded genetic lexicon. Anal. Bioanal.
Chem. 403, 2089–2102 (2012).
[10] Wang, L., Brock, A., Herberich, B. & Schultz, P. G. Expanding the genetic code of
Escherichia coli. Science 292, 498–500 (2001).
[11] Chin, J. W. et al. An Expanded Eukaryotic Genetic Code. Science 301, 964–967
(2003).
[12] Summerer, D. et al. A genetically encoded fluorescent amino acid. Proc. Natl.
Acad. Sci. USA. 103, 9785–9789 (2006).
[13] Wang, J., Xie, J. & Schultz, P. G. A genetically encoded fluorescent amino acid. J.
Am. Chem. Soc. 128, 8738–9 (2006).
[14] Lee, H. S., Spraggon, G., Schultz, P. G. & Wang, F. Genetic incorporation of a
metal-ion chelating amino acid into proteins as a biophysical probe. J. Am. Chem.
Soc. 131, 2481–3 (2009).
[15] Smith, E. E., Linderman, B. Y., Luskin, A. C. & Brewer, S. H. Probing Local
Environments with the Infrared Probe: l-4-Nitrophenylalanine. J. Phys. Chem. B 115,
2380–2385 (2011).
[16] Cellitti, S. E. et al. In vivo incorporation of unnatural amino acids to probe structure,
dynamics, and ligand binding in a large protein by nuclear magnetic resonance
spectroscopy. J. Am. Chem. Soc. 130, 9268–81 (2008).
[17] Chin, J. W., Martin, A. B., King, D. S., Wang, L. & Schultz, P. G. Addition of a
photocrosslinking amino acid to the genetic code of Escherichia coli. Proc. Natl. Acad. Sci.
USA. 99, 11020–4 (2002).
[18] Ai, H. W., Shen, W., Sagi, A., Chen, P. R. & Schultz, P. G. Probing protein-protein
interactions with a genetically encoded photo-crosslinking amino acid. Chembiochem 12,
1854–1857 (2011).
[19] Zhang, M. et al. A genetically incorporated crosslinker reveals chaperone
cooperation in acid resistance. Nat. Chem. Biol. 7, 671–7 (2011).
[20] Chin, J. W. et al. Addition of p-azido-L-phenylalanine to the genetic code of
Escherichia coli. J. Am. Chem. Soc. 124, 9026–7 (2002).
[21] Lang, K. et al. Genetically encoded norbornene directs site-specific cellular protein
labelling via a rapid bioorthogonal reaction. Nat. Chem. 4, 298–304 (2012).
[22] Lang, K. et al. Genetic Encoding of bicyclononynes and trans-cyclooctenes for site-
specific protein labeling in vitro and in live mammalian cells via rapid fluorogenic Diels-
Alder reactions. J. Am. Chem. Soc. 134, 10317–20 (2012).
[23] Deiters, A., Groff, D., Ryu, Y., Xie, J. & Schultz, P. G. A genetically encoded
photocaged tyrosine. Angew. Chem. Int. Ed. 45, 2728–31 (2006).
[24] Chen, P. R. et al. A facile system for encoding unnatural amino acids in mammalian
cells. Angew. Chem. Int. Ed. 48, 4052–5 (2009).
[25] Baker, A. S. & Deiters, A. Optical Control of Protein Function through Unnatural
Amino Acid Mutagenesis and Other Optogenetic Approaches. ACS Chem. Biol. 9, 1398–
407 (2014).
[26] Wu, N., Deiters, A., Cropp, T. A., King, D. & Schultz, P. G. A genetically encoded
photocaged amino acid. J. Am. Chem. Soc. 126, 14306–7 (2004).
[27] Arbely, E., Torres-Kolbus, J., Deiters, A. & Chin, J. W. Photocontrol of tyrosine
phosphorylation in mammalian cells via genetic encoding of photocaged tyrosine. J. Am.
Chem. Soc. 134, 11912–5 (2012).
[28] Ren, W., Ji, A. & Ai, H. W. Light activation of protein splicing with a photocaged
fast intein. J. Am. Chem. Soc. 137, 2155–8 (2015).
[29] Wang, Y. S. et al. A genetically encoded photocaged N-methyl-L-lysine. Molecular
BioSystems 6, 1557–1560 (2010).
[30] Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Genetically encoding N-epsilon-
acetyllysine in recombinant proteins. Nat. Chem. Biol. 4, 232–234 (2008).
[31] Park, H. S. et al. Expanding the genetic code of Escherichia coli with
phosphoserine. Science 333, 1151–4 (2011).
[32] Ai, H. W., Lee, J. W. & Schultz, P. G. A method to site-specifically introduce
methyllysine into proteins in E. coli. Chem. Commun. 46, 5506–8 (2010).
[33] Neumann, H., Wang, K., Davis, L., Garcia-Alai, M. & Chin, J. W. Encoding
multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature
464, 441–4 (2010).
[34] Wang, Y. S., Fang, X., Wallace, A. L., Wu, B. & Liu, W. R. A rationally designed
pyrrolysyl-tRNA synthetase mutant with a broad substrate spectrum. J. Am. Chem. Soc.
134, 2950–3 (2012).
[35] Young, D. D. et al. An evolved aminoacyl-tRNA synthetase with atypical
polysubstrate specificity. Biochemistry 50, 1894–900 (2011).
[36] Chatterjee, A., Xiao, H., Bollong, M., Ai, H. W. & Schultz, P. G. Efficient viral
delivery system for unnatural amino acid mutagenesis in mammalian cells. Proc. Natl.
Acad. Sci. USA. 110, 11803–8 (2013).
[37] Linder, M. Computational Enzyme Design: Advances, hurdles and possible ways
forward. Comput. Struct. Biotechnol. J. 2, e201209009 (2012).
[38] Wang, P., Vaidehi, N., Tirrell, D. A. & Goddard, W. A., 3rd. Virtual screening for
binding of phenylalanine analogues to phenylalanyl-tRNA synthetase. J. Am. Chem. Soc.
124, 14442–9 (2002).
[39] Datta, D., Vaidehi, N., Zhang, D. & Goddard, W. A., 3rd. Selectivity and specificity
of substrate binding in methionyl-tRNA synthetase. Protein Sci. 13, 2693–705 (2004).
[40] Zhang, D., Vaidehi, N., Goddard, W. A., 3rd, Danzer, J. F. & Debe, D. Structure-
based design of mutant Methanococcus jannaschii tyrosyl-tRNA synthetase for
incorporation of O-methyl-L-tyrosine. Proc. Natl. Acad. Sci. USA. 99, 6579–84 (2002).
[41] Sun, R., Zheng, H., Fang, Z. & Yao, W. Rational design of aminoacyl-tRNA
synthetase specific for p-acetyl-L-phenylalanine. Biochem. Biophys. Res. Commun. 391,
709–15 (2010).
[42] Wang, L., Zhang, Z., Brock, A. & Schultz, P. G. Addition of the keto functional
group to the genetic code of Escherichia coli. Proc. Natl. Acad. Sci. USA. 100, 56–61
(2003).
[43] Ibba, M. & Soll, D. Aminoacyl-tRNAs: setting the limits of the genetic code. Genes
Dev. 18, 731–8 (2004).
[44] Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of
docking with a new scoring function, efficient optimization, and multithreading. J.
Comput. Chem. 31, 455–61 (2010).
[45] Meiler, J. & Baker, D. ROSETTALIGAND: protein-small molecule docking with
full side-chain flexibility. Proteins 65, 538–48 (2006).
[46] Kollman, P. A. et al. Calculating structures and free energies of complex molecules:
combining molecular mechanics and continuum models. Acc. Chem. Res. 33, 889–97
(2000).
[47] Chen, Z. J., Ren, W., Wright, Q. E. & Ai, H. W. Genetically encoded fluorescent
probe for the selective detection of peroxynitrite. J. Am. Chem. Soc. 135, 14940–3 (2013).
[48] Turner, J. M., Graziano, J., Spraggon, G. & Schultz, P. G. Structural
characterization of a p-acetylphenylalanyl aminoacyl-tRNA synthetase. J. Am. Chem. Soc.
127, 14976–7 (2005).
[49] Sakamoto, K. et al. Genetic encoding of 3-iodo-L-tyrosine in Escherichia coli for
single-wavelength anomalous dispersion phasing in protein crystallography. Structure 17,
335–44 (2009).
[50] Pedretti, A., Villa, L. & Vistoli, G. VEGA-an open platform to develop chemo-bio-
informatics applications, using plug-in architecture and script programming. J. Comput.
Aided Mol. Des. 18, 167–73 (2004).
[51] Kobayashi, T. et al. Structural snapshots of the KMSKS loop rearrangement for
amino acid activation by bacterial tyrosyl-tRNA synthetase. J. Mol. Biol. 346, 105–17
(2005).
[52] Biasini, M. et al. SWISS-MODEL: modelling protein tertiary and quaternary
structure using evolutionary information. Nucleic Acids Res. 42, W252–8 (2014).
[53] Hess, B., Kutzner, C., van der Spoel, D. & Lindahl, E. GROMACS 4: Algorithms
for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory
Comp. 4, 435–447 (2008).
[54] Pronk, S. et al. GROMACS 4.5: a high-throughput and highly parallel open source
molecular simulation toolkit. Bioinformatics 29, 845–854 (2013).
[55] Hornak, V. et al. Comparison of multiple Amber force fields and development of
improved protein backbone parameters. Proteins 65, 712–25 (2006).
[56] Sousa da Silva, A. & Vranken, W. ACPYPE - AnteChamber PYthon Parser
interfacE. BMC Research Notes 5, 367 (2012).
[57] Wang, J., Wang, W., Kollman, P. A. & Case, D. A. Automatic atom type and bond
type perception in molecular mechanical calculations. J. Mol. Graph. Model. 25, 247–60
(2006).
[58] Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development
and testing of a general amber force field. J. Comput. Chem. 25, 1157–74 (2004).
[59] Morris, G. M. et al. AutoDock4 and AutoDockTools4: Automated docking with
selective receptor flexibility. J. Comput. Chem. 30, 2785–91 (2009).
[60] Kumari, R., Kumar, R. & Lynn, A. g_mmpbsa—A GROMACS Tool for High-
Throughput MM-PBSA Calculations. J. Chem. Inf. Model. 54, 1951–1962 (2014).
[61] Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in
virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov. 3,
935–49 (2004).
[62] Liu, Y. & Kuhlman, B. RosettaDesign server for protein design. Nucleic Acids Res.
34, W235–8 (2006).
[63] Sun, H. et al. Assessing the performance of MM/PBSA and MM/GBSA methods.
5. Improved docking performance using high solute dielectric constant MM/GBSA and
MM/PBSA rescoring. Phys. Chem. Chem. Phys. 16, 22035–22045 (2014).
[64] Hou, T., Wang, J., Li, Y. & Wang, W. Assessing the performance of the MM/PBSA
and MM/GBSA methods. 1. The accuracy of binding free energy calculations based on
molecular dynamics simulations. J. Chem. Inf. Model. 51, 69–82 (2011).
[65] Rastelli, G., Del Rio, A., Degliesposti, G. & Sgobba, M. Fast and accurate
predictions of binding free energies using MM-PBSA and MM-GBSA. J. Comput. Chem.
31, 797–810 (2010).
Chapter 5: Summary
In this thesis, a method for light-activated protein splicing based on a photocaged fast intein was
developed. A photocaged cysteine was genetically introduced into a highly efficient Nostoc
punctiforme (Npu) DnaE intein. The resulting photocaged intein was inserted into the red
fluorescent protein (RFP) mCherry and a human Src tyrosine kinase to create inactive
chimeric proteins. A light-induced photochemical reaction reactivated the intein
and triggered protein splicing. Active mCherry and Src were formed, as observed by direct
fluorescence imaging or by imaging of a Src kinase sensor in mammalian cells. The
genetically encoded photocaged intein is a general optogenetic tool, allowing effective
photocontrol of the primary structures and functions of proteins. In the future, this method could
be applied to various disease-related cysteine proteases (e.g., cathepsins) for medical
systems biology studies. Ideally, scientists would be able to uncover the dynamics of the cell
system that are induced by the activities of those enzymes.
In the third chapter, a dinitrophenyl hapten unnatural amino acid was genetically
incorporated into proteins in cells. On the one hand, this offers the drug industry an
alternative way to label protein drugs; on the other hand, it provides an approach to
study the immunostimulatory activity of various haptenated proteins.
Additionally, we evaluated the binding affinities between unnatural amino acids and
engineered orthogonal tyrosyl-tRNA synthetases through computational approaches. In this
study, we observed a higher binding affinity between unAAs and their cognate
engineered tyrosyl-tRNA synthetases than between tyrosine and the same synthetases.
More importantly, the evaluation method could be implemented in our future
computational synthetase design workflow. In the coming stage, I am particularly interested in
unnatural amino acids that mimic the functionality of post-translationally
modified residues. Ideally, orthogonal synthetases for unnatural amino
acids could be designed by computer instead of through tedious experimental screening.
Such unnatural amino acids could also serve as useful tools for studying PTM-related
systems biology.