UNIVERSITY OF CALIFORNIA

RIVERSIDE

Rewiring Translation for Photocontrol and Haptens, and Computational Analysis

A Dissertation submitted in partial satisfaction

of the requirements for the degree of

Doctor of Philosophy

in

Chemistry

by

Wei Ren

June 2016

Dissertation Committee:

Dr. Huiwang Ai, Chairperson

Dr. Ashok Mulchandani

Dr. Wenwan Zhong

Copyright by

Wei Ren

2016

The Dissertation of Wei Ren is approved:

Committee Chairperson

University of California, Riverside

ACKNOWLEDGEMENT

This dissertation has used paragraphs, sentences, figures and tables from four published

articles by Wei Ren and Dr. Huiwang Ai. Published articles are listed below:

1. Wei Ren, Huiwang Ai. 2012. Ribosomal incorporation of unnatural amino acids:

learning from mother nature. Nova Publishers.

2. Wei Ren, Ao Ji, Huiwang Ai. 2015. Light activation of protein splicing with a

photocaged fast intein. Journal of the American Chemical Society. 137(6), 2155-2158.

3. Wei Ren, Ao Ji, Michael X. Wang, Huiwang Ai. 2015. Expanding the genetic code

for a dinitrophenyl hapten. ChemBioChem. 16(14), 2007-2010.

4. Wei Ren, Tan Truong, Huiwang Ai. 2015. Study of the binding energies between

unnatural amino acids and engineered orthogonal tyrosyl-tRNA synthetases.

Scientific Reports. 5, 12632.

To my father, Mr. Qilin Ren;

To my advisor, Dr. Huiwang Ai;

To the scientists who influenced me (Dr. Alan Turing and Dr. Nicholas Metropolis).

ABSTRACT OF THE DISSERTATION

Rewiring Translation for Photocontrol and Haptens, and Computational Analysis

by

Wei Ren

Doctor of Philosophy, Chemistry

University of California, Riverside, June 2016

Dr. Huiwang Ai, Chairperson

The objective of my Ph.D. study is to expand the unnatural amino acid (unAA) toolbox to

genetically encode additional photocaging functional groups to achieve a precise control

of proteins with light, to site-specifically label proteins with hapten moieties, and to further

explore computational methods with an ultimate goal of using computers to design specific

orthogonal aminoacyl-tRNA synthetases (aaRSes) for given unAAs.

In this thesis, we show that cellular biochemical processes can be spatiotemporally

manipulated by light-activatable protein-splicing inteins. We genetically encoded a

photocaged cysteine and introduced the photocaged cysteine into a highly efficient Nostoc

punctiforme (Npu) DnaE intein, which is capable of excising itself and subsequently

splicing adjacent N- and C-terminal extein flanks to form a new truncated peptide. The

resulting photocaged intein was inserted into a red fluorescent protein (RFP) mCherry and

a human Src tyrosine kinase, and a light-induced photochemical reaction was able to

reactivate the intein and trigger protein splicing. The genetically encoded photocaged intein

is a general optogenetic tool, allowing effective photocontrol of primary structures and

functions of proteins.

Haptens, such as dinitrophenyl (DNP), are small molecules that induce strong immune

responses when attached to proteins or peptides and, as such, have been exploited for

diverse applications. In this thesis, we engineered a Methanosarcina barkeri pyrrolysyl-

tRNA synthetase (mbPylRS) to genetically encode a DNP-containing unAA, N6-(2-(2,4-

dinitrophenyl)acetyl)lysine (DnpK). This technique is a promising strategy for biological

preparation of proteins containing site-specific DNP. This new capability is expected to

find broad applications in biosensing, immunology, and therapeutics.

The experimental procedure to derive orthogonal aaRSes/aminoacyl tRNAs, which

typically involves several rounds of positive and negative selection, is laborious and time-

consuming, and requires considerable expertise. It is often not trivial to derive orthogonal

aaRSes for unAA substrates that are very different from the enzymes’ native substrates. In

this thesis, we compared several computational algorithms to evaluate the binding energies

between unAAs and previously developed orthogonal aaRSes. We hope to use these results to

guide the future design and development of new aaRSes, and to extend the capability of the

genetic code expansion technology to many new unAAs.

TABLE OF CONTENTS

SIGNATURE PAGE

ACKNOWLEDGEMENT

DEDICATIONS

ABSTRACT

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF SCHEMES

LIST OF TABLES

Chapter 1: Introduction

1.1 Genetic Encoding of Unnatural Amino Acids

1.1.1 Ribosomal Protein Synthesis

1.1.2 Incorporation of Unnatural Amino Acids

1.1.3 Engineering of Ribosome and Other Related Components

1.1.4 Further Directions

References

Chapter 2: Light Activation of Protein Splicing with a Photocaged Intein

2.1 Introduction

2.2 Materials and Methods

2.2.1 Materials

2.2.2 Chemical Preparation of Photocaged Cysteines

2.2.3 Plasmid Constructions

2.2.4 Mammalian Cell Culture and Transfection

2.2.5 Analysis of Intein-Mediated Splicing of mCherry

2.2.6 Analysis of Intein-Mediated Splicing of Src

2.2.7 Photoactivation of Src and Fluorescence Microscopic Imaging

2.2.8 Mass Spectrometry Analysis of Proteins

2.3 Results

2.4 Conclusions

References

Chapter 3: Expanding the Genetic Code for a Dinitrophenyl Hapten

3.1 Introduction

3.2 Materials and Methods

3.2.1 Chemical Synthesis of N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK, 3)

3.2.2 Chemical Synthesis of N6-(2-(2-nitrophenyl)acetyl)lysine (2-NPK) and N6-(2-(4-nitrophenyl)acetyl)lysine (4-NPK)

3.2.3 Evolution of a Mutant Aminoacyl-tRNA Synthetase

3.2.4 Computational Modeling of the DnpK/DnpKRS Complex Structure

3.2.5 Protein Expression and Purification from E. coli

3.2.6 Protein Expression and Purification from HEK293T Cells

3.2.7 Protein Electrospray Mass Spectrometry

3.2.8 Western Blotting

3.3 Results

3.4 Conclusions

References

Chapter 4: Study of the Binding Energies between Unnatural Amino Acids and Engineered Orthogonal Tyrosyl-tRNA Synthetases

4.1 Introduction

4.2 Methods

4.2.1 Preparation of aaRS-Amino Acid Complexes

4.2.2 Binding Energy Scoring with AutoDock Vina and ROSETTA

4.2.3 Molecular Dynamics Simulations

4.2.4 MM/PBSA Binding Energy Calculation

4.3 Results and Discussion

4.3.1 Selection and Preparation of aaRS-Amino Acid Complexes

4.3.2 Binding Energy Scoring with AutoDock Vina and ROSETTA

4.3.3 Binding Energy Estimation by MD-MM/PBSA or Direct MM/PBSA

4.3.4 Binding Modes of aaRS-unAA Complexes

4.4 Conclusions

References

Chapter 5: Summary

LIST OF FIGURES

Figure 1.1 Chemical structures of pyrrolysine and selenocysteine.

Figure 1.2 Biological pathways to synthesize selenocysteyl-tRNASec (Sec-tRNASec); Schematic representation of the mechanism of encoding selenocysteine in mammalian cells.

Figure 1.3 Schematic diagram of genetic encoding of unnatural amino acids in living cells.

Figure 1.4 The competition between amber (TAG) codon suppression and RF-1 induced translation termination.

Figure 1.5 Protein synthesis in E. coli using a wild-type ribosome and an engineered orthogonal ribosome.

Figure 2.1 Plasmid map of pMAH2-CageCys.

Figure 2.2 X-ray crystal structure of mCherry (redrawn from PDB 2H5Q).

Figure 2.3 X-ray crystal structure of the human Src kinase catalytic domain (redrawn from PDB 1FMK).

Figure 2.4 Genetic encoding of photocaged cysteines in HEK 293T cells.

Figure 2.5 Photolysis of photocaged cysteines.

Figure 2.6 ESI mass spectrometry analysis of intact proteins.

Figure 2.7 Photoactivation of mCherry.

Figure 2.8 Photoactivation of Src kinase.

Figure 2.9 Pseudocolored ratio FRET images of representative UVA-treated HEK 293T cells harboring the F1 construct.

Figure 3.1 Applications of DNP-labeled proteins.

Figure 3.2 Chemical Structure of N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK).

Figure 3.3 Mass spectrometry analysis of the indicated proteins purified from DH10B or the nfsA/nfsB double deletion strain, suggesting a reduced DNP group in these proteins.

Figure 3.4 Mass spectrometry analysis of the indicated proteins purified from DH10B in the presence of 2-NPK or 4-NPK.

Figure 3.5 Direct ESI-MS analysis (positive mode) of the lysate of DH10B cells incubated with 1 mM DnpK.

Figure 3.6 SDS-PAGE and Western blot of DnpK-containing EGFP and the wild-type EGFP, purified from HEK 293T cells.

Figure 3.7 Fluorescence imaging of HEK 293T cells containing genes for pEGFP-Tyr39TAG, DnpKRS, and the corresponding suppressor tRNA, in the presence or absence of DnpK (1 mM).

Figure 4.1 Chemical structures of natural and unnatural amino acids used in this study (1: p-acetyl-L-phenylalanine, AcF; 2: 3-iodo-L-tyrosine, IoY; 3: p-iodo-L-phenylalanine, IoF; and 4: L-tyrosine, Tyr).

Figure 4.2 The RMSD values in the MD trajectories of the seven studied aaRS-amino acid complexes.

Figure 4.3 The contributions of individual amino acid residues of aaRSes to the total binding energies.

Figure 4.4 MD-averaged structures showing the active sites of the studied aaRSes and unAA complexes.

LIST OF SCHEMES

Scheme 2.1. Synthetic route to prepare photocaged cysteine.

Scheme 3.1. Synthetic route to prepare DnpK.

Scheme 3.2. Synthetic route to prepare 2-NPK and 4-NPK.

LIST OF TABLES

Table 4.1 Estimated binding free energies using AutoDock Vina and ROSETTA for the seven tested aaRS-amino acid complexes.

Table 4.2 Calculated binding energies using MD-MM/PBSA or direct MM/PBSA for the seven aaRS-amino acid complexes.

Chapter 1: Introduction

1.1 Genetic Encoding of Unnatural Amino Acids

1.1.1 Ribosomal Protein Synthesis

Genetic information is mainly stored in cells as sequences of nucleotides.[1] Each nucleotide

is composed of a pentose (5-carbon carbohydrate), a phosphate group extending from the 5’

(or 3’) position of the pentose, and one of four types of nucleobases. In

deoxyribonucleic acid (DNA), 2-deoxyribose is the pentose, and adenine (A), guanine (G),

thymine (T) and cytosine (C) are the four types of bases. Prokaryotic and eukaryotic cells

use regions of DNA sequences as templates to synthesize strands of ribonucleic acid

(RNA). Sequences of RNA strands are copied from DNA strands, except that ribose

replaces 2-deoxyribose and uracil (U) replaces thymine as one of the four RNA bases. This

process is termed “transcription”.[1] Next, proteins are synthesized from transcribed

messenger RNAs (mRNA): every three bases in an mRNA open reading frame are

“translated” into a single amino acid residue. An important group of enzymes, aminoacyl

transfer RNA (tRNA) synthetases, catalyze the linkage between amino acids and tRNAs.

Every tRNA has a 3-base anticodon in its anticodon loop to pair with mRNA during

ribosomal protein synthesis.
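
The copying and pairing rules described above can be illustrated with a minimal Python sketch. It is not part of the original work. The function names (transcribe, codons, anticodon) are hypothetical, the input is assumed to be the DNA coding (sense) strand, and only strict Watson-Crick pairing is modeled, so wobble pairing and modified bases are ignored.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding (sense) strand: same sequence, T -> U."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("DNA sequence may contain only A, C, G and T")
    return dna.replace("T", "U")


def codons(mrna: str) -> list[str]:
    """Split a reading frame into consecutive, complete 3-base codons."""
    usable = len(mrna) - len(mrna) % 3  # drop an incomplete trailing codon
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def anticodon(codon: str) -> str:
    """Return the strictly complementary anticodon, written 5'->3'.

    Assumption: Watson-Crick pairing only (no wobble, no modified bases).
    """
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(codon))


mrna = transcribe("ATGGCTTAG")
print(mrna, codons(mrna), anticodon("UAG"))
```

Under this simplification, an amber (UAG) suppressor tRNA carries the anticodon CUA, and a tRNA reading the UGA codon carries UCA.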

Ribosomes are large RNA- and protein-containing machines (up to several million Da)

that catalyze the formation of peptides from individual amino acids.[2] Ribosomes exist in all

archaeal, eubacterial and eukaryotic cells. Although differing in size and in detailed

composition, each ribosome has two subunits: a large subunit catalyzing the peptidyl transfer

reaction and a small subunit critical for translation initiation.[2] Translation initiation factors

assemble the small subunit and the mRNA to start the formation of a translation complex.

The Shine-Dalgarno (SD) sequence of prokaryotic mRNA and the 5’ cap of eukaryotic mRNA

are very important for initiation.[3, 4] The nearby AUG codon is then identified by the

ribosome and decoded as an N-terminal N-formylmethionine (fMet) in prokaryotes or

methionine (Met) in eukaryotes. After the ribosome is fully assembled at the initiation

AUG site, it contains three tRNA-binding sites, designated the A, P and E sites. Elongation

starts when the fMet-tRNA (or Met-tRNA in eukaryotes) enters the P site, resulting in a

conformational change which opens the A site for another aminoacyl-tRNA to enter.

Peptide bond formation is catalyzed by the ribosomal RNA in the large subunit. After the bond

is formed, the A site contains the newly extended peptidyl-tRNA, while the P site contains an

uncharged tRNA. The ribosome moves along the mRNA, so the uncharged tRNA enters

the E site and then exits from the ribosome. The peptidyl-tRNA enters the P site and opens

the A site for the next round of coupling. Elongation factors are needed in this process, for

example, to facilitate the entry of aminoacyl tRNA into the A site. When the ribosome

reaches one of the three termination codons (UAA, UAG and UGA), release factors

(proteins) enter the A site and trigger the hydrolysis of the ester bond in the

peptidyl-tRNA at the P site.[5] After releasing the peptide, the whole complex is disassembled with

the aid of several protein factors to recycle translation components. A more detailed

description of ribosomal protein synthesis can be found in recently published review

articles.[6-8]

Under most circumstances, every three consecutive bases following the starting AUG codon

are translated into one amino acid. The four types of nucleobases can form 64 (4^3) codons. With

three exceptions (UAA, UAG and UGA as stop codons), each codon encodes one of the

20 common natural amino acids. The genetic code is therefore degenerate: most of the 20 amino

acids are encoded by more than one codon. The correspondence between codons and

amino acids or translational termination signals is nearly universal among all domains

of life.[9]
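
The codon arithmetic above can be checked with a short Python sketch. It is an illustration only and not part of the original work. It builds the standard genetic code from the conventional compact 64-letter representation (bases ordered U, C, A, G; one-letter amino acid codes; "*" for stop). The names BASES, AMINO_ACIDS and CODON_TABLE are chosen here for illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# product() varies the third base fastest, matching the order of the string.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE), "codons;", "stops:", stop_codons)
print(len(degeneracy), "amino acids encoded by", sum(degeneracy.values()), "sense codons")
print("single-codon amino acids:", sorted(aa for aa, n in degeneracy.items() if n == 1))
```

In this table, 61 sense codons are shared among 20 amino acids. Only methionine (M) and tryptophan (W) have a single codon, while leucine, serine and arginine each have six.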

We here discuss a few exceptions. Mitochondrial ribosomes synthesize mitochondrial

proteins based on different codon tables.[10] Mitochondria carry their own genome, which

includes mitochondrial tRNAs. The mitochondrial genetic code has drifted from the

universal code. Furthermore, organisms including bacteria, yeast and other eukaryotes can

harbor suppressor tRNAs that can recognize and decode nonsense codons (UAA, UAG or

UGA).[11] These tRNAs were most likely derived from normal tRNAs through anticodon

mutations. New codon-anticodon interactions are established to read through stop codons.

In most natural cases, one of the 20 common natural amino acids is inserted in response to

stop codons.

A notable exception is the insertion of an unusual amino acid, pyrrolysine, in response to UAG

codons.[12, 13] Certain methanogenic archaea including Methanosarcina barkeri and M.

mazei, and the Gram-positive bacterium Desulfitobacterium hafniense, express amber

suppressor tRNAs (tRNAPyl) and synthetases that catalyze the charging of these tRNAs with

pyrrolysine (Figure 1.1A). They also harbor gene clusters to biochemically synthesize the

amino acid pyrrolysine.[14, 15] The process to insert pyrrolysine is similar to the process for

ribosomal insertion of other amino acids: pyrrolysine-charged tRNAs are brought into

ribosomes by typical elongation factors to extend the nascent peptides.

Figure 1.1 Chemical structures of (A) pyrrolysine and (B) selenocysteine.

Another unusual amino acid, selenocysteine (Figure 1.1B), is also genetically encoded in

many natural organisms.[16, 17] Compared to cysteine, selenocysteine has a lower pKa and

a lower reduction potential, so diselenide bonds are more easily formed.[18]

Selenocysteine has been found to play a critical role for the function of a few anti-oxidant

proteins. Unlike the 20 common natural amino acids and pyrrolysine, selenocysteine is not directly

charged to its tRNA (Figure 1.2A), because there is no free selenocysteine in cells.[19]

Instead, seryl-tRNA synthetase first links serine to a special selenocysteine tRNA

(tRNASec). The resulting Ser-tRNASec is not recognized by translation factors, and so is not

used for ribosomal translation. Next, the tRNA-bound seryl residue is converted to a

selenocysteine in the presence of appropriate enzymes and selenium donor molecules.[19]

Alternative translational elongation factors are needed to bring selenocysteine-charged

tRNASec (Sec-tRNASec) into the ribosome for protein synthesis (Figure 1.2B).[17] The

anticodon of tRNASec is UCA, so it can pair with the UGA opal codon. Not all UGA

codons are suppressed, however. The mRNAs of selenocysteine-containing proteins

(selenoproteins) often contain sequences called SECIS (selenocysteine insertion sequence)

elements. The SECIS elements are defined by characteristic nucleotide sequences,

secondary structures and base-pairing patterns. In bacteria, SECIS elements are typically

located immediately after UGA codons in reading frames. In archaea and eukaryotes,

SECIS elements are in the 3’-UTRs (untranslated regions) of mRNAs, and can direct

multiple selenocysteines into a single peptide in response to multiple UGA codons (Figure

1.2B).[20] Sec-tRNASec specific elongation factors can bind SECIS elements, and promote

the delivery of Sec-tRNASec into ribosomes associated with the same mRNA. When cells

are grown in the presence of selenium, corresponding UGA codons are suppressed to

synthesize full-length functional selenoproteins.

Figure 1.2. (A) Biological pathways to synthesize selenocysteyl-tRNASec (Sec-

tRNASec). (B) Schematic representation of the mechanism of encoding

selenocysteine in mammalian cells.

Another related unusual case is ribosomal frameshifting during protein synthesis.[21]

Typically, proteins are synthesized based on a template mRNA with every three

consecutive nucleotides being read as an amino acid. However, frameshifting occurs at low

frequency: the ribosome slips by one base in either the 5’ (-1) or 3’ (+1) direction during

translation. Frameshifting is related to nucleotide sequence, secondary structure and

tertiary structure of an mRNA.
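
A small Python sketch can make the effect of a -1 or +1 slip on the reading frame concrete. It is illustrative only and not part of the original work. The function name read_codons and its parameters slip_after and slip are hypothetical, and the mRNA string is an arbitrary example rather than a known frameshift site.

```python
def read_codons(mrna, slip_after=None, slip=0):
    """Read codons 5'->3' from position 0.

    After `slip_after` codons have been read, the reading position is
    shifted once by `slip` bases (-1 = toward the 5' end, +1 = toward
    the 3' end). Both parameters are illustrative assumptions.
    """
    out = []
    pos = 0
    while pos + 3 <= len(mrna):
        out.append(mrna[pos:pos + 3])
        pos += 3
        if slip_after is not None and len(out) == slip_after:
            pos += slip  # the ribosome slips once, changing the frame
    return out


mrna = "AUGAAAUUUGGG"
print(read_codons(mrna))                        # normal frame
print(read_codons(mrna, slip_after=2, slip=-1)) # -1 frameshift
print(read_codons(mrna, slip_after=2, slip=+1)) # +1 frameshift
```

Every codon downstream of the slip is read in a new frame, which is why a single frameshift changes the entire remaining protein sequence.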

In the past decade, tremendous effort has been put into investigating the molecular

mechanisms related to ribosomal protein synthesis. Atomic structures of individual

components involved in ribosomal protein synthesis have been elucidated. The 2009 Nobel

Prize in Chemistry was awarded to Venkatraman Ramakrishnan, Thomas A. Steitz

and Ada E. Yonath for solving the structure of the ribosome using X-ray crystallography.[6-8]

1.1.2 Incorporation of Unnatural Amino Acids

The work to understand how proteins are synthesized has been very fruitful. Meanwhile,

researchers have developed methods to dramatically expand the repertoire of

amino acids used in protein synthesis.[22, 23] Orthogonal tRNAs and aminoacyl-tRNA synthetases

have been engineered to encode unusual amino acids in response to nonsense codons and

4-base codons. Additional translational machinery, including ribosomes and translation

factors, has been mutated to increase the synthesis of unnatural proteins.[24, 25] Structurally

and functionally manipulated proteins have been utilized to study biology and develop new

therapeutics. Recent reviews by us and others have summarized many details of this

technology.[22, 23] Interested readers should refer to those references. Here we

only briefly describe the technology, link it with similar natural systems, focus on the

re-engineering of components other than tRNAs and synthetases, and finally highlight its

applications in therapeutics and vaccines.

Suppressor tRNAs for termination codons had been widely found in nature, so it was quite

straightforward to propose a similar method to incorporate unnatural amino acids.[11]

Initially, this was done in vitro using suppressor tRNAs pre-charged with unnatural amino

acids, and in vivo by directly injecting charged tRNAs.[26, 27] Those charged tRNA

molecules were made through either in vitro enzymatic reactions or methods that include

organic synthesis. Research by Schultz and others established a procedure to genetically

encode most components needed for incorporation of unusual amino acids (Figure 1.3).[28,

29] The technology is often referred to as “genetic code expansion”, and has been widely

adopted by the research community.

Figure 1.3. Schematic diagram of genetic encoding of unnatural amino acids in

living cells.

In a typical experiment, a pre-engineered orthogonal tRNA with its anticodon

complementary to a stop codon or a 4-base codon, and a pre-engineered

aminoacyl-tRNA synthetase with preference toward the unnatural amino acid, are recombinantly

expressed in cells. The unnatural amino acid is supplemented in the culture media. The

resulting cells are capable of linking the amino acid to the suppressor tRNA and synthesizing

modified proteins containing site-specifically inserted unnatural amino acids. This method

is compatible with living cells, so it has become an indispensable tool for life science

research. It is also an efficient and economical way to produce large amounts of nonnative

proteins. Currently, the technology is available for genetic encoding of more than 90

unnatural amino acids harboring various reactive conjugation handles, photoactive

functional groups, pre-installed post-translational modifications (PTMs), fluorophores,

metal-chelating functional groups and other useful side chains.[22, 23]
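
The effect of an orthogonal amber suppressor on decoding can be sketched in Python as follows. This is a simplified illustration and not part of the original work. It assumes the standard genetic code, complete suppression of every UAG codon when a suppressor is present, and the placeholder letter "X" for the unnatural amino acid. The function name translate and the parameter amber_residue are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
AMBER = "UAG"


def translate(mrna, amber_residue=None):
    """Translate an open reading frame starting at position 0.

    If `amber_residue` is given, every UAG codon is decoded as that
    residue (idealized 100% suppression); otherwise UAG terminates
    translation like the other stop codons.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == AMBER and amber_residue is not None:
            peptide.append(amber_residue)
            continue
        residue = STANDARD_CODE[codon]
        if residue == "*":
            break  # release factor terminates translation
        peptide.append(residue)
    return "".join(peptide)


orf = "AUGGCUUAGAAAUAA"
print(translate(orf))                     # truncated product
print(translate(orf, amber_residue="X"))  # full-length product with unAA
```

Without the suppressor the product is truncated at the amber codon. With it, translation continues to the next stop codon and the unnatural residue appears at the programmed site.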

It is challenging to identify a tRNA/synthetase pair orthogonal to the cell’s endogenous

pathways, and to engineer the pair to gain selective activity toward a novel unnatural amino

acid. In practice, orthogonal tRNA/synthetase pairs used in one organism are often derived

from another organism in a different domain of life. For example, the tyrosyl tRNA and

tyrosyl-tRNA synthetase pair from the archaeal Methanocaldococcus jannaschii

(MjTyrRS/MjtRNATyr) can be used in the bacteria E. coli and Mycobacterium tuberculosis

(MTB), while pairs derived from the E. coli tyrosyl tRNA and synthetase

(EcTyrRS/EctRNATyr) have been used for genetic encoding of unnatural amino acids in

eukaryotic cells.[28, 29] Many other important pairs for eukaryotic uses are derived from the

E. coli leucyl tRNA and synthetase (EcLeuRS/EctRNALeu). In addition, pyrrolysyl tRNAs

and pyrrolysyl-tRNA synthetases (PylRS/tRNAPyl) from Methanosarcina barkeri and

Methanosarcina mazei, are orthogonal in both prokaryotic and eukaryotic organisms, and

have been engineered to encode many useful amino acids.[30]

The anticodons of these suppressors have been switched so that they can pair with nonsense

or 4-base codons. The first three bases of a 4-base codon need to be a less-used codon in

the target organism (the corresponding endogenous tRNA is less abundant). In addition,

wild-type synthetases have to be mutated to switch their substrate specificity from native

amino acids to unnatural amino acids. Usually, rounds of positive and negative selection

are performed. Briefly, synthetase libraries targeting amino acid-binding residues are

created by molecular biology. Both the tRNA and the synthetase mutants are introduced into

the organism, which is cultured in media supplemented with the unnatural amino acid. A

gene necessary for cell survival under the given selection condition is induced for

expression. However, nonsense or 4-base codons have been pre-inserted into its sequence.

Cells survive only if a synthetase mutant can charge the tRNA with the unnatural amino acid to suppress

the nonsense or 4-base codons. Survivors from the positive selection are then

subjected to a negative selection step, in which a toxic gene containing nonsense or

4-base codons is expressed. No unnatural amino acid is provided in the negative

selection step. Cells containing any synthetase mutant that charges the tRNA with

endogenous amino acids are killed. The selection is often performed for multiple

cycles to enrich synthetase mutants selective for the corresponding unnatural amino

acid.[23]
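
The logic of one positive/negative selection cycle can be sketched as a simple filter in Python. This is a conceptual illustration, not the experimental protocol. The Synthetase class, its boolean fields and the four example mutants are hypothetical. The sketch also assumes that a mutant charging the tRNA with an endogenous amino acid survives positive selection as well, since any suppression rescues the essential gene; this is why the negative step is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synthetase:
    """Hypothetical synthetase mutant described by two idealized activities."""
    name: str
    charges_unaa: bool      # charges the tRNA with the unnatural amino acid
    charges_natural: bool   # charges the tRNA with an endogenous amino acid


def positive_selection(library, unaa_present=True):
    """Cells survive if the stop codon in the essential gene is suppressed."""
    return [
        s for s in library
        if s.charges_natural or (unaa_present and s.charges_unaa)
    ]


def negative_selection(library):
    """No unAA is supplied; suppression now expresses the toxic gene."""
    return [s for s in library if not s.charges_natural]


LIBRARY = [
    Synthetase("inactive", charges_unaa=False, charges_natural=False),
    Synthetase("promiscuous", charges_unaa=True, charges_natural=True),
    Synthetase("natural-only", charges_unaa=False, charges_natural=True),
    Synthetase("selective", charges_unaa=True, charges_natural=False),
]

after_positive = positive_selection(LIBRARY)
after_negative = negative_selection(after_positive)
print([s.name for s in after_positive])
print([s.name for s in after_negative])
```

In this idealized library only the mutant that charges the unnatural amino acid, and no endogenous amino acid, passes both steps. Real selections are leaky, which is one reason multiple cycles are used.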

1.1.3 Engineering of Ribosome and Other Related Components

Suppression of nonsense and four-base codons is not very efficient. Recombinantly

expressed and then charged orthogonal tRNAs have to compete with cell endogenous factors

(Figure 1.4), i.e. translation termination factors (peptide release factors) or charged

endogenous tRNAs that decode the first three bases of a four-base codon. Therefore, the

yield of full-length proteins containing unnatural amino acids is often low. This problem

is further amplified when multiple unusual codons are present in a single gene. Recent

work has attempted to solve the problem by targeting individual or multiple steps involved

in protein translation. For example, the interaction interface between the suppressor tRNA

derived from MjtRNATyr and the E. coli elongation factor Tu (EF-Tu) has been re-

engineered.[31] The improved tRNAs have been used to construct a series of pEvol plasmids

showing robust amber suppression efficiency in E. coli cells.[32] We and others are currently

performing similar work in yeast and mammalian cells to improve amber suppression in

eukaryotic systems. Besides tRNAs and synthetases, other machinery involved in protein

translation, such as the ribosome and other translational factors, has also been targeted. The

purpose of those studies is to improve the efficiency of nonnative protein production,

and/or enable the incorporation of unusual amino acids whose encoding is otherwise

impossible.
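
The statement that low yield is amplified by multiple unusual codons can be made concrete with a simple model. The sketch below assumes, purely for illustration, that each unusual codon is suppressed independently with the same efficiency, so that the full-length fraction is the per-codon efficiency raised to the number of codons. The function name and the efficiency value of 0.3 are hypothetical and are not measured data.

```python
def full_length_fraction(per_codon_efficiency: float, n_codons: int) -> float:
    """Fraction of translation events reaching full length, assuming each
    unusual codon is suppressed independently with the same efficiency."""
    if not 0.0 <= per_codon_efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if n_codons < 0:
        raise ValueError("number of codons must be non-negative")
    return per_codon_efficiency ** n_codons


# Hypothetical per-codon suppression efficiency, for illustration only.
for n in (1, 2, 3, 6):
    print(n, round(full_length_fraction(0.3, n), 4))
```

Under this assumption the yield falls geometrically with the number of unusual codons, which is why reducing competition from release factors or endogenous tRNAs has a large effect on multi-site incorporation.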

Figure 1.4. The competition between amber (TAG) codon suppression and RF-1

induced translation termination.

Elongation factors are critical enzymes involved in protein synthesis. Suppressor tRNAs

carrying large nonnative amino acids are bound less tightly by elongation factor Tu (EF-Tu)

than tRNAs carrying natural amino acids. Sisido et al. re-engineered the EF-Tu binding pocket for

aminoacyl moieties of aminoacyl-tRNAs to increase its affinity toward large amino

acids.[33, 34] Several bulky aromatic amino acids, which are hardly or only slightly

incorporated by the wild-type EF-Tu, were successfully incorporated into proteins in the

presence of the EF-Tu mutants.

Bacterial release factors (RFs) 1 and 2 catalyze translation termination at UAG and

UAA, or UAA and UGA, respectively (Figure 1.4). The large ribosomal subunit protein

L11 is a highly conserved protein containing two domains, an N-terminal domain (L11N)

and a C-terminal domain (L11C). L11 interacts with 23S rRNA and plays an important role

in the RF1-mediated peptide release. L11C alone can also bind 23S rRNA. The ribosome,

in which L11C is used to replace the full-length L11, shows translation efficiency

comparable to the wild-type ribosome, but has lower efficiency in the RF1-mediated

termination. Liu and his coworkers, therefore, overexpressed L11C in E. coli cells to reduce

RF1-mediated translation termination and increase amber suppression efficiency.[35] They

demonstrated that three acetyllysine residues could be incorporated into a single peptide in

a reasonable yield.

Sakamoto, Yokoyama and their coworkers engineered an E. coli strain that lacks RF1, which

terminates translation in response to UAG codons.[36] A few genetic modifications were,

however, needed to circumvent the lethality of RF1 deletion. Several genes, which use

UAG as their stop codons, were mutated. In their mutated strain, UAG could be

assigned unambiguously to a natural or non-natural amino acid using different UAG-

decoding tRNAs. They also demonstrated that p-iodophenylalanine could be incorporated

in response to six in-frame amber codons in a model glutathione S-transferase (GST)

protein. Similarly, Wang et al. also reported several RF1-deletion E. coli strains.[37] They

found that RF1 deletion could be tolerated by E. coli, as long as a certain version of RF2 is

expressed in cells.[38] They confirmed that the critical residue in RF2 is Ala246. These

reported E. coli strains are, undoubtedly, valuable tools for expression of proteins

containing multiple unnatural amino acids at different residue sites.

To incorporate multiple chemically distinct unnatural amino acids into a single protein,

mutually orthogonal pairs that are also compatible with cell endogenous tRNAs,

synthetases and amino acids are needed. First, Schultz and others reported the use of an

MjTyrRS/MjtRNATyr derived tRNA/synthetase pair and another pair derived from

Pyrococcus horikoshii lysyl tRNA and synthetase in response to UAG and AGGA codons,

respectively, for insertion of two different unnatural amino acids.[39] In addition, Liu et al.

used MjTyrRS/MjtRNATyr derived tRNA/synthetase pairs and PylRS/tRNAPyl derived

pairs in the same E. coli cells to decode two nonsense codons (UAG and UAA). Chin and

his coworkers, instead, reported the adaption of two orthogonal pairs directly from

MjTyrRS/MjtRNATyr, one pair responding to UAG and the other responding to

AGGA.[40] Direct use of two nonsense codons, or one nonsense and one four-base codon,

often leads to very low yield of protein production.

An exciting development was made by Chin and co-workers (Figure 1.5).[24] Orthogonal

ribosomes were developed specifically for encoding unnatural amino acids. Briefly, a 16S

rRNA library was built with mutations important for interactions at the ribosomal A site.

The library was screened to identify mutants exhibiting a substantial increase in efficiency

of decoding amber codons. Those mutant 16S rRNAs are likely to reduce the affinity

between RF-1 and the ribosome, so peptide release in response to UAG codons is reduced.

Next, they engineered the ribosomal small subunit so that the mutated ribosome only binds

a mutated SD sequence. These derived ribosomes can only translate exogenously

introduced mRNAs, which harbor the mutated SD sequence. Endogenous mRNAs are

excluded from the mutant ribosome due to the disrupted translation initiation.

Meanwhile, the synthesis of cell endogenous proteins is carried out by natural ribosomes.

More recently, Chin et al. further engineered an orthogonal ribosome for improved

efficiency in decoding 4-base codons.[25] They showed that the mutant ribosome

maintained its enhanced efficiency in decoding in-frame amber codons. Next, they used

this orthogonal ribosome to synthesize proteins containing two different unnatural amino

acids in response to both UAG and AGGA. One tRNA/synthetase pair was derived from

MjTyrRS/MjtRNATyr, and another pair was derived from PylRS/tRNAPyl. They were

able to generate a GST-calmodulin protein containing both azide and alkyne functional

groups. The protein was subjected to click chemistry to build an intramolecular bridge

through Cu(I)-catalyzed azide/alkyne Huisgen cycloaddition. The research represents an

interesting proof of concept that orthogonal ribosomes may be re-engineered to

reassign triplet and quadruplet codons. Research toward this direction is likely to establish

biosynthetic pathways for polymers made with artificial building blocks.

Figure 1.5. Protein synthesis in E. coli using (A) a wild-type ribosome and (B) an

engineered orthogonal ribosome.

O-Phosphoserine (Sep) is an abundant posttranslational protein modification. Recently,

Söll and coworkers reported a method to synthesize homogeneous Sep-containing proteins

in genetically modified E. coli.[41] Naturally, in some methanogenic archaea, there is no

cysteinyl-tRNA synthetase. Instead, a Sep specific synthetase (SepRS) catalyzes the

formation of the linkage between the amino acid O-phosphoserine and the corresponding

cysteinyl-tRNA (tRNACys). The O-phosphoserine-charged tRNACys has low affinity for

EF-Tu. It is subsequently converted to Cys-tRNACys by the enzyme SepCysS in the presence of

a sulfide donor. Next, Cys-tRNACys is used by the ribosome for protein synthesis. Söll et al.

engineered a new amber suppressor from tRNACys by converting its anticodon to CUA

(which pairs with UAG). An additional C20U mutation was made to improve the aminoacylation

efficiency. It is worth noting that SepRS is not cross-reactive with any E. coli endogenous

tRNA and can be overexpressed in E. coli cells. E. coli has a Sep-compatible transporter,

so Sep was directly added to the growth medium. The E. coli endogenous phosphoserine

phosphatase gene, serB, was deleted to maintain adequate intracellular Sep concentration.

Furthermore, a new EF-Tu was engineered and recombinantly expressed to increase its

affinity for the Sep-charged tRNA. The engineered strain, which harbors a Sep-accepting transfer RNA, a cognate

Sep-tRNA synthetase (SepRS), and an engineered EF-Tu (EF-Sep), was successfully

utilized to synthesize the phosphorylated active form of human mitogen-activated ERK

activating kinase 1 (MEK1). This research has built a new avenue to biosynthesize

phosphoproteins for detailed studies of their biological properties.

To date, excluding tRNAs and synthetases, efforts to re-engineer protein synthesis-related

components have been limited to E. coli. It remains to be determined whether similar

strategies can be extended to eukaryotic (yeast and mammalian) cells and other industrial

microbial strains for applications in biotechnology and pharmaceuticals.

1.1.4 Future Directions

Biomolecular engineering of protein translation-related machineries has now provided the

ability to genetically encode more than 90 unnatural amino acids. The early research was

inspired directly by natural nonsense suppressors. Identification of orthogonal

tRNA/synthetase pairs, including tyrosyl-pairs and pyrrolysyl pairs, spurred the research

field. Further engineering of the ribosome and translational factors improved

the technology for better yields and broader applications. However, most engineering is still

confined to E. coli cells. Further research is needed for yeast and mammalian cells, in which

the incorporation efficiency of unnatural amino acids is much lower. In addition,

applications of those unnatural amino acids have not been explored extensively.

Therefore, this thesis presents three projects: the use of photocaged unnatural

amino acids to manipulate living cells, an unnatural amino acid-based strategy for drug

development, and a computational method for unnatural amino acid incorporation.

I hope that these three demonstrations will further broaden the capability

of this technology, which is expected to eventually help elucidate new biology and develop

new therapeutics and vaccines.

References:

[1] Crick F. Central Dogma of Molecular Biology. Nature. 1970;227(5258):561-3.

[2] Ramakrishnan V. Ribosome Structure and the Mechanism of Translation. Cell.

2002;108(4):557-72.

[3] Chen H, Bjerknes M, Kumar R, Jay E. Determination of the optimal aligned spacing

between the Shine-Dalgarno sequence and the translation initiation codon of Escherichia

coli mRNAs. Nucleic Acids Res. 1994;22(23):4953-7.

[4] Preiss T, Hentze MW. Dual function of the messenger RNA cap structure in

poly(A)-tail-promoted translation in yeast. Nature. 1998;392(6675):516-20.

[5] Frolova LY, Merkulova TI, Kisselev LL. Translation termination in eukaryotes:

polypeptide release factor eRF1 is composed of functionally and structurally distinct

domains. RNA. 2000;6(3):381-90.

[6] Korostelev A, Noller HF. The ribosome in focus: new structures bring new insights.

Trends Biochem. Sci. 2007;32(9):434-41.

[7] Berk V, Cate JH. Insights into protein biosynthesis from structures of bacterial

ribosomes. Curr. Opin. Struct. Biol. 2007;17(3):302-9.

[8] Schmeing TM, Ramakrishnan V. What recent ribosome structures have revealed

about the mechanism of translation. Nature. 2009;461(7268):1234-42.

[9] Jukes TH, Osawa S. Evolutionary changes in the genetic code. Comp. Biochem.

Physiol. B. 1993;106(3):489-94.

[10] Knight RD, Landweber LF, Yarus M. How mitochondria redefine the code. J. Mol.

Evol. 2001;53(4-5):299-313.

[11] Murgola EJ. tRNA, suppression, and the code. Annu. Rev. Genet. 1985;19:57-80.

[12] Srinivasan G, James CM, Krzycki JA. Pyrrolysine encoded by UAG in Archaea:

charging of a UAG-decoding specialized tRNA. Science. 2002;296(5572):1459-62.

[13] Hao B, Gong W, Ferguson TK, James CM, Krzycki JA, Chan MK. A new UAG-

encoded residue in the structure of a methanogen methyltransferase. Science.

2002;296(5572):1462-6.

[14] Gaston MA, Zhang L, Green-Church KB, Krzycki JA. The complete biosynthesis

of the genetically encoded amino acid pyrrolysine from lysine. Nature.

2011;471(7340):647-50.

[15] Cellitti SE, Ou W, Chiu H-P, Grunewald J, Jones DH, Hao X, et al. D-Ornithine

coopts pyrrolysine biosynthesis to make and insert pyrroline-carboxy-lysine. Nat. Chem.

Biol. 2011;7(8):528-30.

[16] Chambers I, Frampton J, Goldfarb P, Affara N, McBain W, Harrison PR. The

structure of the mouse glutathione peroxidase gene: the selenocysteine in the active site is

encoded by the 'termination' codon, TGA. EMBO J. 1986;5(6):1221-7.

[17] Bock A, Forchhammer K, Heider J, Leinfelder W, Sawers G, Veprek B, et al.

Selenocysteine: the 21st amino acid. Mol. Microbiol. 1991;5(3):515-20.

[18] Copeland PR. Making sense of nonsense: the evolution of selenocysteine usage in

proteins. Genome Biol. 2005;6(6):221.

[19] Yuan J, Palioura S, Salazar JC, Su D, O'Donoghue P, Hohn MJ, et al. RNA-

dependent conversion of phosphoserine forms selenocysteine in eukaryotes and archaea.

Proc. Natl. Acad. Sci. USA. 2006;103(50):18923-7.

[20] Berry MJ, Banu L, Harney JW, Larsen PR. Functional characterization of the

eukaryotic SECIS elements which direct selenocysteine insertion at UGA codons. EMBO

J. 1993;12(8):3315-22.

[21] Farabaugh PJ. Translational frameshifting: implications for the mechanism of

translational frame maintenance. Prog. Nucleic Acid Res. Mol. Biol. 2000;64:131-70.

[22] Ai HW. Biochemical analysis with the expanded genetic lexicon. Anal. Bioanal.

Chem. 2012;403(8):2089-102.

[23] Liu CC, Schultz PG. Adding new chemistries to the genetic code. Annu. Rev.

Biochem. 2010;79:413-44.

[24] Wang K, Neumann H, Peak-Chew SY, Chin JW. Evolved orthogonal ribosomes

enhance the efficiency of synthetic genetic code expansion. Nat. Biotechnol.

2007;25(7):770-7.

[25] Neumann H, Wang K, Davis L, Garcia-Alai M, Chin JW. Encoding multiple

unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature.

2010;464(7287):441-4.

[26] Shimizu Y, Inoue A, Tomari Y, Suzuki T, Yokogawa T, Nishikawa K, et al. Cell-

free translation reconstituted with purified components. Nat. Biotechnol. 2001;19(8):751-5.

[27] Saks ME, Sampson JR, Nowak MW, Kearney PC, Du F, Abelson JN, et al. An

engineered Tetrahymena tRNAGln for in vivo incorporation of unnatural amino acids into

proteins by nonsense suppression. J. Biol. Chem. 1996;271(38):23169-75.

[28] Wang L, Brock A, Herberich B, Schultz PG. Expanding the genetic code of

Escherichia coli. Science. 2001;292(5516):498-500.

[29] Chin JW, Cropp TA, Anderson JC, Mukherji M, Zhang Z, Schultz PG. An

Expanded Eukaryotic Genetic Code. Science. 2003;301(5635):964-7.

[30] Chen PR, Groff D, Guo J, Ou W, Cellitti S, Geierstanger BH, et al. A facile system

for encoding unnatural amino acids in mammalian cells. Angew. Chem. Int. Ed.

2009;48(22):4052-5.

[31] Guo J, Melancon CE, 3rd, Lee HS, Groff D, Schultz PG. Evolution of amber

suppressor tRNAs for efficient bacterial production of proteins containing nonnatural

amino acids. Angew. Chem. Int. Ed. 2009;48(48):9148-51.

[32] Young TS, Ahmad I, Yin JA, Schultz PG. An enhanced system for unnatural amino

acid mutagenesis in E. coli. J. Mol. Biol. 2010;395(2):361-74.

[33] Nakata H, Ohtsuki T, Abe R, Hohsaka T, Sisido M. Binding efficiency of

elongation factor Tu to tRNAs charged with nonnatural fluorescent amino acids. Anal.

Biochem. 2006;348(2):321-3.

[34] Doi Y, Ohtsuki T, Shimizu Y, Ueda T, Sisido M. Elongation factor Tu mutants

expand amino acid tolerance of protein biosynthesis system. J. Am. Chem. Soc.

2007;129(46):14458-62.

[35] Huang Y, Russell WK, Wan W, Pai PJ, Russell DH, Liu W. A convenient method

for genetic incorporation of multiple noncanonical amino acids into one protein in

Escherichia coli. Mol. Biosyst. 2010;6(4):683-6.

[36] Mukai T, Hayashi A, Iraha F, Sato A, Ohtake K, Yokoyama S, et al. Codon

reassignment in the Escherichia coli genetic code. Nucleic Acids Res. 2010;38(22):8188-95.

[37] Johnson DB, Xu J, Shen Z, Takimoto JK, Schultz MD, Schmitz RJ, et al. RF1

knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nat.

Chem. Biol. 2011;7(11):779-86.

[38] Johnson DB, Wang C, Xu J, Schultz MD, Schmitz RJ, Ecker JR, et al. Release

Factor One Is Nonessential in Escherichia coli. ACS Chem. Biol. 2012;7(8):1337-44.

[39] Anderson JC, Wu N, Santoro SW, Lakshman V, King DS, Schultz PG. An

expanded genetic code with a functional quadruplet codon. Proc. Natl. Acad. Sci. USA.

2004;101(20):7566-71.

[40] Neumann H, Slusarczyk AL, Chin JW. De novo generation of mutually orthogonal

aminoacyl-tRNA synthetase/tRNA pairs. J. Am. Chem. Soc. 2010;132(7):2142-4.

[41] Park HS, Hohn MJ, Umehara T, Guo LT, Osborne EM, Benner J, et al. Expanding

the genetic code of Escherichia coli with phosphoserine. Science. 2011;333(6046):1151-4.

Chapter 2: Light Activation of Protein

Splicing with a Photocaged Intein

2.1 Introduction

Inteins are protein elements that are capable of excising themselves and subsequently

splicing adjacent N- and C-terminal extein flanks to form a new truncated peptide.[1] These

naturally occurring, self-catalyzing protein-splicing elements have been adapted to achieve

efficient protein purification, ligation, labeling, cyclization, cleavage, and patterning.[2, 3]

In particular, conditional inteins, whose activities are inducible by additional factors, such

as small molecules, light, or changes in temperature, pH, or redox states, have previously

been utilized to regulate protein activities in vitro and in vivo.[4, 5] Photoactivatable inteins

are of particular interest because light-based approaches often have sufficient spatial and

temporal resolution to meet the need of understanding biology at the cellular and

subcellular levels.[6] In a previous work, Noren et al. reported the in vitro preparation of a

photoactivatable Thermococcus litoralis (Tli) Pol-2 intein, using a chemically

aminoacylated suppressor tRNA.[7] Furthermore, chemical synthetic methods have also been

employed to integrate photo-cleavable functional groups into the O-acyl isomer,[8] the

peptide backbone,[9] or the N-terminus[10] of split inteins to achieve photo-controlled

protein splicing. Due to the difficulty of directly delivering proteins or peptides into living

cells, these studies focused on in vitro applications. In another work, two photo-responsive

dimerization domains were each fused to an artificially split intein fragment as a genetically

encoded system to control protein splicing in living Saccharomyces cerevisiae cells, but

the system was not adaptable to mammalian cells.[11] Herein, we report the genetic

encoding of a photoactivatable intein and its applications in directly controlling primary

structures of proteins and therefore their functions, in living mammalian cells.

The Nostoc punctiforme (Npu) DnaE intein is among the most well-characterized and

efficient inteins, with a splicing reaction half-life of ∼60 s at 37 °C.[12, 13] The Npu DnaE

intein is also compatible with a myriad of flanking extein sequences.[14] All these features

make the Npu DnaE intein an ideal research tool, especially for mammalian studies.

Mutagenesis of the first catalytic cysteine residue within the Npu DnaE intein to alanine

(Cys/Ala) abrogates protein splicing and auto-cleavage at both intein domain ends.[12, 15]

This property is different from that of some other recently reported fast inteins, whose

Cys/Ala mutants are efficient in undergoing the C-terminal cleavage reaction.[16]

The genetic code expansion technology is capable of introducing site-specific photocaged

lysine, tyrosine, serine, and cysteine residues into proteins of interest in living systems,

including bacterial, yeast, and mammalian cells.[17-21] Previously, optical control of

enzymatic activities[22-24], ion channels[25], gene expression and silencing[26], and protein

translocation[27, 28] has been demonstrated by replacing critical protein residues with

photocaged unnatural amino acids (UAAs). In this study, we show that a genetically

encoded photoactivatable intein can be readily derived by replacing the Cys1 residue of

Npu DnaE intein with a photocaged cysteine, and it is highly effective in directly

modulating primary protein structures, thereby providing a general approach for

controlling protein activities in living cells.

2.2 Materials and Methods

2.2.1 Materials

All chemicals were purchased from Sigma-Aldrich (St. Louis, MO) or Alfa Aesar (Ward

Hill, MA). Synthetic DNA oligonucleotides were purchased from Integrated DNA

Technologies (IDT; San Diego, CA). Restriction endonucleases were purchased from New

England Biolabs (Ipswich, MA) or Thermo Fisher Scientific Fermentas (Vilnius,

Lithuania). PCR and restriction digest products were purified by gel electrophoresis and

extracted using the Syd Labs Gel Extraction kit (Malden, MA). Syd Labs Mini-prep kit

was used for plasmid purification. DNA sequence analysis was performed by the Genomics

Core at the University of California, Riverside (UCR; Riverside, California). Protein mass

spectrometry was performed at the UCR High Resolution Mass Spectrometry Facility.

Plasmids encoding the Npu DnaE intein (Addgene # 41684) and Src (Addgene # 23934)

were purchased from Addgene (Cambridge, MA). The Src kinase sensor was a gift from

Prof. Yingxiao Wang at the University of California, San Diego (San Diego, California).

2.2.2 Chemical Preparation of Photocaged Cysteines

Scheme 2.1. Synthetic route to prepare photocaged cysteine (2).

2.2.2.1 Chemical Preparation of (R,S) 1-(1-Bromoethyl)-4,5-

dimethoxy-2-nitrobenzene (6)

Compound 4 (900 mg, 4 mmol; Scheme 2.1), prepared from compound 3 according to the

literature, was dissolved in THF/EtOH (1:1, 15 mL) at room temperature, followed by

intermittent addition of NaBH4 (152 mg, 4 mmol) over 20 min. After the reaction mixture

was stirred for another 3 hours, dilute HCl (1 mol/L, 4 mL) was added to quench excess

NaBH4. The solvent was then removed in vacuo, and H2O (10 mL) was subsequently

added to the residue. The mixture was extracted three times with CH2Cl2 (10 mL). The

combined organic layer was dried over anhydrous Na2SO4 and further concentrated to

afford crude 5 as a yellow solid, which was then used directly without further purification.

Compound 5, dissolved in CH2Cl2 (20 mL), was cooled in an ice bath. PBr3 (475 µL, 5 mmol)

was introduced dropwise. The reaction mixture was stirred for another 3 hours before

saturated NaHCO3 aqueous solution (15 mL) was added. The organic layer was separated,

washed twice with H2O (10 mL), and further dried over anhydrous Na2SO4. The solvent

was removed in vacuo to afford crude compound 6 as a yellow oil. The crude product was

purified by silica chromatography (EtOAc/Hexane 1:4) to obtain pure compound 6 as a

yellow oil (810 mg, 2.79 mmol). The yield was 69% over two steps.

2.2.2.2 Chemical Preparation of N-(tert-butoxycarbonyl)-S-[(R,S)-

1-{4',5'-dimethoxy-2'-nitrophenyl}ethyl]- L-cysteine (7)

L-Cysteine (0.36 g, 3 mmol) was dissolved in 5 mL of deionized water and then

neutralized with triethylamine (405 µL, 2.8 mmol). The solution was cooled in an ice/water

bath. Next, compound 6 (2.79 mmol in 5 mL of methanol) was added dropwise over 15

min. The reaction mixture was stirred overnight. The yellow precipitate was collected.

The filtrate was washed twice with CH2Cl2 (10 mL). The aqueous layer and the yellow

precipitate were combined, followed by addition of saturated NaHCO3 aqueous solution

(2 mL) and (Boc)2O (654 mg, 3 mmol). The reaction mixture was allowed to stir for

another 3 hours. Next, it was acidified with HCl (1 mol/L, 5 mL) and extracted with CH2Cl2

(10 mL) three times. The organic layers were combined and dried over anhydrous Na2SO4.

The solvent was removed in vacuo to yield crude compound 7 as a yellow oil. The crude

product was purified by silica chromatography (EtOAc/Hexane 2:1) to obtain pure

compound 7 as a yellow oil (620 mg, 1.44 mmol). The yield was 52%.
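
The reported amount and yield of compound 7 can be checked with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, and the molar mass of compound 7 (430.47 g/mol) is an assumption computed here from the molecular formula C18H26N2O8S implied by its name, not a value stated in the text.

```python
def millimoles(mass_mg: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass in mg to mmol (mg divided by g/mol gives mmol)."""
    return mass_mg / molar_mass_g_per_mol


def percent_yield(product_mmol: float, limiting_mmol: float) -> float:
    """Isolated yield relative to the limiting reagent, in percent."""
    return 100.0 * product_mmol / limiting_mmol


# Assumed molar mass of compound 7, from C18H26N2O8S (not given in the text).
MOLAR_MASS_COMPOUND_7 = 430.47

product_mmol = millimoles(620, MOLAR_MASS_COMPOUND_7)  # 620 mg isolated
print(round(product_mmol, 2))

# Compound 6 (2.79 mmol) is the limiting reagent in this step.
print(round(percent_yield(1.44, 2.79)))
```

Compound 6 is treated as the limiting reagent because L-cysteine (3 mmol) and (Boc)2O (3 mmol) are both used in slight excess.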

2.2.2.3 Chemical Preparation of S-[(R,S)-1-{4',5'-Dimethoxy-2'-

nitrophenyl}ethyl]-L-cysteine (2)

Compound 7 (142 mg, 0.33 mmol) was dissolved in dioxane (3 mL), and

concentrated HCl (1 mL) was then introduced. The solution was stirred for 2 hours at room

temperature. The solvent was removed in vacuo to afford compound 2 quantitatively as a

yellow solid.

2.2.3 Plasmid Constructions

In order to achieve the genetic encoding of photocaged cysteines, a plasmid

pMAH2-CageCys was constructed for the mammalian expression of the corresponding

tRNA and aminoacyl-tRNA synthetase. The gene encoding the aminoacyl-tRNA

synthetase (E. coli leucyl-tRNA synthetase with M40G, L41Q, Y499L, Y527G, H537F

mutations) was codon-optimized for mammalian expression and chemically synthesized

by IDT. The gene fragment encoding an H1 promoter and the tRNA was also chemically

synthesized. One copy of the synthetase gene was amplified with oligonucleotides

CAGCYS-F and CAGCYS-R, digested with Hind III and Apa I, and inserted into a

previously reported pMAH plasmid. A successful clone identified by DNA sequencing

served as the PCR template in a reaction using oligonucleotides pMAH-tRNA1-F and

pMAH-tRNA2-R. The PCR reaction amplified the whole plasmid and appended Spe I and

Xho I restriction sites to the ends of the DNA product. Next, the gene fragment encoding

the H1 promoter and the tRNA was amplified by oligonucleotides tRNA-F and tRNA-R.

tRNA-F and tRNA-R installed Spe I and Sal I restriction sites to the ends of the DNA

product. The above two DNA fragments were digested with Spe I and Xho I, and Spe I and

Sal I, respectively. Since Xho I and Sal I generate compatible ends, the above two

fragments were ligated to afford a complete plasmid. An additional Xho I site was designed

upstream of the H1 promoter. Thus, the resulting plasmid could be re-digested with

Spe I and Xho I to insert a second H1-tRNA fragment. This procedure was repeated to

generate a pMAH2-CageCys plasmid containing 3 copies of H1-tRNA and 1 copy of the

synthetase.
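
The iterative cloning above relies on Xho I and Sal I producing the same cohesive end. The following Python sketch illustrates this with the standard recognition sequences and cut positions of the three enzymes (Xho I C^TCGAG, Sal I G^TCGAC, Spe I A^CTAGT). The dictionary layout and function names are hypothetical choices made for this illustration.

```python
# Recognition sequence (5'->3', top strand) and top-strand cut position.
ENZYMES = {
    "XhoI": ("CTCGAG", 1),
    "SalI": ("GTCGAC", 1),
    "SpeI": ("ACTAGT", 1),
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic cutter."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """Two enzymes give ligatable ends if their overhangs are identical."""
    return overhang(a) == overhang(b)


def hybrid_junction(left: str, right: str) -> str:
    """Sequence formed when a `left`-cut end is ligated to a `right`-cut end."""
    left_site, left_cut = ENZYMES[left]
    right_site, right_cut = ENZYMES[right]
    return left_site[:left_cut] + overhang(left) + right_site[len(right_site) - right_cut:]


print(overhang("XhoI"), overhang("SalI"), compatible("XhoI", "SalI"))
junction = hybrid_junction("XhoI", "SalI")
print(junction, junction in {site for site, _ in ENZYMES.values()})
```

The hybrid junction formed by an Xho I end and a Sal I end is recognized by neither enzyme, so only the additional Xho I site upstream of the H1 promoter remains available for the next round of insertion.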

Figure 2.1. Plasmid map of pMAH2-CageCys

To construct the intein/mCherry fusion, oligonucleotides IC1 and IC2 were used to amplify

the N-terminal portion of mCherry. IC3 and IC4 were used to amplify the Npu DnaE intein

from the plasmid pSKDuet16 (Addgene # 41684) and mutate the codon of Cys1 to TAG.

IC5 and IC6 were used to amplify the C-terminal portion of mCherry. The three pieces

were fused together by overlap extension PCR using IC1 and IC6. The product was

digested with Hind III and Xho I and inserted into a pre-digested compatible pcDNA3

plasmid.
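
Overlap extension PCR requires that the reverse primer of one fragment and the forward primer of the next share a complementary region. As a minimal sketch, the code below takes the IC2 and IC3 sequences from the oligonucleotide list in this section and finds the region they share. The function names are hypothetical and the search is a plain string comparison, not a primer design tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_overlap(reverse_primer: str, forward_primer: str) -> str:
    """Longest prefix of the forward primer that matches the end of the
    reverse primer's reverse complement (the shared top-strand region)."""
    top_strand = reverse_complement(reverse_primer)
    for k in range(min(len(top_strand), len(forward_primer)), 0, -1):
        if top_strand.endswith(forward_primer[:k]):
            return forward_primer[:k]
    return ""


IC2 = "ATAGCTTAACTACTGCATTACGGGGCCGTCGGA"  # reverse primer, mCherry N-terminal piece
IC3 = "GTAATGCAGTAGTTAAGCTATGAAACGGAAATA"  # forward primer, intein piece

overlap = primer_overlap(IC2, IC3)
print(len(overlap), overlap)
print("TAG" in overlap)  # the amber codon introduced at Cys1 lies in the overlap
```

The shared region carries the TAG codon, so both fragments meeting at this junction encode the amber mutation after fusion.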

Figure 2.2. (a) X-ray crystal structure of mCherry (redrawn from PDB 2H5Q). The

chromophore (magenta) and residues 138 and 139 are shown as ball

representations. (b) The primary sequence of the photocaged intein/mCherry

chimeric protein. The asterisk (*) represents the UAA 2 incorporation site. The

photo-activated protein splicing product is expected to be mCherry, containing two

mutations at residues 138 and 139.

To construct the intein/Src fusions, a similar overlap extension PCR strategy was utilized.

The three fused DNA fragments were digested with Hind III and EcoR I and inserted into

a pre-digested compatible pcDNA3 plasmid. In addition, the full-length mCherry was

amplified with oligonucleotides ECORI-RFP-F and IC6, treated with appropriate

restriction enzymes, and inserted between EcoR I and Xho I restriction sites of the

pcDNA3-derived plasmids. Constructed plasmids were confirmed by DNA sequencing.

Figure 2.3. (a) X-ray crystal structure of the human Src kinase catalytic domain

(redrawn from PDB 1FMK). Residues 277, 342 and 400 are shown as ball

representations. (b) The primary sequence of the Src kinase catalytic domain fused

to mCherry. Residues 277, 342 and 400 are colored in magenta. The photocaged

intein was inserted upstream of these residues. Ser342 was mutated to cysteine,

since the Npu DnaE intein requires a +1 site cysteine for efficient protein splicing.

Oligonucleotides used for plasmid construction are listed below:

CAGCYS-F: CACATGAAGCTTGCCACCATGCAAG

CAGCYS-R: TAATATGGGCCCTTAGCCCACGAC

pMAH-tRNA1-F: TTATTGACTAGTTATTAATAGTAATCAATTACGGGGTC

pMAH-tRNA2-R: ATAACTCGAGTCGGGGAAATGTGC

tRNA-F: GCCATCACTAGTCAATAATCAATGC

tRNA-R: ACTCGTGTCGACCTCGACTCAAAAAAAGGACTACCCGGAGCGGGA

IC1: TACTAAGCTTGCCACCATGGTGAGCAAGGGCGAG

IC2: ATAGCTTAACTACTGCATTACGGGGCCGTCGGA

IC3: GTAATGCAGTAGTTAAGCTATGAAACGGAAATA

IC4: GGTCATACAATTAGAAGCTATGAAGCCATT

IC5: ATAGCTTCTAATTGTATGACCATGGGCTGGGAGGCC

IC6: ATTCCTCGAGTTAATGGTGGTGATGGTGGTGCTTGTACAGCTCGTCCAT

SRC-F: CTGTAAGCTTGCCACCATGTCCAAACACGCCGATGGCCTG

IS-1-1-F: GTCAAGCTGGGCCAGGGCTAGTTAAGCTATGAAACGGAA

IS-1-1-R: TTCCGTTTCATAGCTTAACTAGCCCTGGCCCAGCTTGAC

IS-1-2-F: TTCATAGCTTCTAATTGCTTTGGCGAGGTGTGG

IS-1-2-R: CCACACCTCGCCAAAGCAATTAGAAGCTATGAA

IS-2-1-F: ATCGTCACGGAGTACATGTAGTTAAGCTATGAAACGGAA

IS-2-1-R: TTCCGTTTCATAGCTTAACTACATGTACTCCGTGACGAT

IS-2-2-F: TTCATAGCTTCTAATTGCAAGGGGAGTTTGCTGGAC

IS-2-2-R: GTCCAGCAAACTCCCCTTGCAATTAGAAGCTATGAA

IS-3-1-F: GTGGGAGAGAACCTGGTGTAGTTAAGCTATGAAACGGAA

IS-3-1-R: TTCCGTTTCATAGCTTAACTACACCAGGTTCTCTCCCAC

IS-3-2-F: TTCATAGCTTCTAATTGCAAAGTGGCCGACTTT

IS-3-2-R: AAAGTCGGCCACTTTGCAATTAGAAGCTATGAA

SRC-R: TTTTGAATTCGAGGTTCTCCCCGGGCTGGTACTG

ECORI-RFP-F: ATAAGAATTCGTGAGCAAGGGCGAGGAGGAT
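
As a quick consistency check, the cloning primers listed above can be scanned for the restriction sites named in Section 2.2.3. The sketch below uses the standard recognition sequences of the six enzymes and a subset of the listed primers. The dictionary and function names are hypothetical, and only the strand as written is searched, which suffices here because all six sites are palindromic.

```python
SITES = {
    "HindIII": "AAGCTT",
    "ApaI": "GGGCCC",
    "SpeI": "ACTAGT",
    "XhoI": "CTCGAG",
    "SalI": "GTCGAC",
    "EcoRI": "GAATTC",
}

PRIMERS = {
    "CAGCYS-F": "CACATGAAGCTTGCCACCATGCAAG",
    "CAGCYS-R": "TAATATGGGCCCTTAGCCCACGAC",
    "pMAH-tRNA1-F": "TTATTGACTAGTTATTAATAGTAATCAATTACGGGGTC",
    "pMAH-tRNA2-R": "ATAACTCGAGTCGGGGAAATGTGC",
    "tRNA-F": "GCCATCACTAGTCAATAATCAATGC",
    "tRNA-R": "ACTCGTGTCGACCTCGACTCAAAAAAAGGACTACCCGGAGCGGGA",
}


def sites_in(seq: str) -> list:
    """Names of the enzymes whose recognition sequence occurs in `seq`."""
    return [name for name, site in SITES.items() if site in seq]


for name, seq in PRIMERS.items():
    print(name, sites_in(seq))
```

Each primer in this subset carries exactly the site that the construction procedure assigns to it, for example Hind III and Apa I on the CAGCYS pair and Spe I and Sal I on the tRNA pair.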

2.2.4 Mammalian Cell Culture and Transfection

HEK 293T cells were maintained in T25 flasks with 5 mL Dulbecco’s Modified Eagle’s

Medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and incubated at

37°C with 5% CO2 in humidified air. Cells at 80% confluence were passaged into 35-mm

or 100-mm culture dishes at a ratio of 1:10 or 1:20 for subsequent transfection. On the next

day, transfection complexes were prepared by mixing DNA and PEI (polyethylenimine,

linear, 25 kDa) (DNA:PEI (w/w) = 1:2.5) in Opti-MEM. For 35-mm culture dishes, 10 µL PEI

(1 µg/µL) was used to prepare 500 µL transfection media. For 100-mm culture dishes, 60

µL PEI (1 µg/µL) was added to 2 mL Opti-MEM. To express the intein/mCherry fusion,

pcDNA3 and pMAH2-CageCys were used in a 1:1 ratio. To express intein/Src fusions,

pcDNA3, pMAH2-CageCys and the KRas Src sensor were used in a 1:1:0.25 ratio. After

preparing transfection complexes, cells were incubated with transfection media for 2 hours.

Next, pre-warmed fresh culture media were added to replace the transfection media. For

positive samples, all transfected cells were cultured in media containing 1 mM of the

photocaged cysteine 2, while no UAA was used for negative control samples.
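
The DNA amounts implied by the stated PEI volumes and the 1:2.5 (w/w) DNA:PEI ratio can be worked out as follows. This is a sketch under one explicit assumption: the plasmid ratios (1:1 and 1:1:0.25) are interpreted as mass ratios, which the text does not specify. The function names are hypothetical.

```python
def dna_mass_ug(pei_volume_ul: float,
                pei_conc_ug_per_ul: float = 1.0,
                pei_per_dna: float = 2.5) -> float:
    """Total DNA mass implied by the PEI amount and the DNA:PEI (w/w) ratio."""
    return pei_volume_ul * pei_conc_ug_per_ul / pei_per_dna


def split_by_ratio(total: float, ratio) -> list:
    """Divide a total amount among components according to a ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


dna_35mm = dna_mass_ug(10)    # 10 µL of 1 µg/µL PEI per 35-mm dish
dna_100mm = dna_mass_ug(60)   # 60 µL of 1 µg/µL PEI per 100-mm dish
print(dna_35mm, dna_100mm)

# Assumed mass ratios for the two co-transfection schemes.
print(split_by_ratio(dna_35mm, (1, 1)))
print([round(x, 2) for x in split_by_ratio(dna_35mm, (1, 1, 0.25))])
```

If the ratios were instead molar, the split would additionally depend on the plasmid sizes, which are not given here.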

2.2.5 Analysis of Intein-Mediated Splicing of mCherry

After transfection, cells were cultured for another 4 days. Fresh media were added every 2

days. After removing the culture media, cells in culture dishes sitting on ice were directly

illuminated with UVA light (365 nm radiation of 600 µW/cm2, Black Ray Lamp, Model

XX-20BLB, VWR, cat. no. 21474-676) for 10 min. Cells were left in the dark in DMEM

containing 10% FBS at 37°C for 1 hour for protein splicing and mCherry chromophore

maturation. Cells were imaged under a Leica SP5 confocal fluorescence microscope. The

excitation laser was set at 488 nm, and emission was collected from 500 nm to 550 nm.

To analyze proteins with SDS-PAGE, cells were collected and lysed in RIPA (radio-

immunoprecipitation assay) buffer directly after the 10-min irradiation. The mixtures were

sonicated for 5 seconds. Cell lysates were centrifuged at 13,000xg for 5 min at 4°C. The

supernatants were collected for 6xHis-tagged protein purification. Ni-NTA agarose

(Qiagen) was used, according to the protocol provided by the manufacturer for native

conditions. The components of Wash Buffer are 30 mM imidazole, 150 mM NaCl and 50

mM NaH2PO4 with pH adjusted to 8. The components of Elution Buffer are 300 mM

imidazole, 150 mM NaCl and 50 mM NaH2PO4 with pH adjusted to 8. Purified proteins

were analyzed on a 15% SDS-PAGE gel. The control protein sample was prepared in

parallel from the same amount of cells that were equally treated except for no UV

irradiation.

2.2.6 Analysis of Intein-Mediated Splicing of Src

After transfection, cells were cultured for 4 days. Fresh media were added every 2 days.

After removing the culture media, cells in culture dishes sitting on ice were directly

illuminated with UVA light (365 nm radiation of 600 µW/cm2, Black Ray Lamp, Model

XX20BLB, VWR, cat. no. 21474-676) for 10 min. Cells were collected and lysed

immediately in RIPA buffer. The mixtures were sonicated for 5 seconds. Cell lysates were

centrifuged at 13,000xg for 5 min at 4oC. The supernatants were directly used for

fluorescence measurements. A monochromator-based Synergy Mx Microplate Reader

(BioTek, Winooski, VT) was used to record all spectra. To record the fluorescence

emission spectra, the excitation wavelength was set at 430 nm, and the emission was scanned

from 450 nm to 600 nm. The Förster resonance energy transfer (FRET) ratio was calculated

by dividing the emission at 530 nm by the emission at 480 nm.
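
The ratio calculation described above can be sketched as follows. The spectrum values are hypothetical placeholders rather than measured data, and the helper name and dictionary layout are illustrative; the only elements taken from the text are the two emission wavelengths (530 nm and 480 nm).

```python
def fret_ratio(spectrum, acceptor_nm=530, donor_nm=480):
    """Divide the emission at the acceptor wavelength by the emission at the donor wavelength."""
    donor = spectrum[donor_nm]
    if donor <= 0:
        raise ValueError("donor emission must be positive")
    return spectrum[acceptor_nm] / donor


# Hypothetical emission intensities (arbitrary units) keyed by wavelength in nm.
example_spectrum = {470: 810.0, 480: 1000.0, 520: 2100.0, 530: 2350.0}
print(round(fret_ratio(example_spectrum), 2))
```

Because the ratio is taken within a single spectrum, it is insensitive to differences in total sensor concentration between lysates.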

To inhibit protein synthesis during and after UV illumination in our control experiments,

cycloheximide (100 µg/ml) was added into cell culture media 1 h before the light treatment,

and also into the RIPA buffer. Cells were otherwise treated identically, and the same

experimental procedure was used to quantitatively measure fluorescence ratios.

2.2.7 Photoactivation of Src and Fluorescence Microscopic Imaging

After transfection, cells were cultured for 3 days. Before imaging, the cells were switched

into Dulbecco’s Phosphate Buffered Saline (DPBS) containing 1 mM Ca2+ and 1 mM

Mg2+. The experiments were done with a Motic AE31 inverted epi-fluorescence

microscope with a home-built FRET imaging capability. Photoactivation was carried out with a

DAPI excitation filter (377 nm/50 nm, Iridian Part # FEX000003). Regions of interest were

illuminated for 2 min (~ 4 mW/cm2). Next, time-lapse imaging was performed for 30 min.

The excitation filter was 436 nm/20 nm. The emission filters were 480 nm/40 nm and 535

nm/50 nm. The imaging results were analyzed using ImageJ according to a protocol

published previously.
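
To put the two illumination conditions used in this chapter on a common scale, the delivered energy per unit area (fluence) can be estimated as irradiance multiplied by exposure time. The sketch below is only this arithmetic; it assumes the stated irradiances are constant over the exposure and ignores differences in spectral bandwidth between the lamp and the filtered microscope source. The function name is illustrative.

```python
def fluence_j_per_cm2(irradiance_mw_per_cm2, minutes):
    """Energy per unit area (J/cm^2) = irradiance (W/cm^2) x time (s)."""
    return irradiance_mw_per_cm2 / 1000.0 * minutes * 60.0


# Whole-dish lamp illumination: 600 uW/cm^2 (0.6 mW/cm^2) for 10 min.
lamp = fluence_j_per_cm2(0.6, 10)
# On-microscope photoactivation: about 4 mW/cm^2 for 2 min.
scope = fluence_j_per_cm2(4.0, 2)
print(f"lamp: {lamp:.2f} J/cm^2, microscope: {scope:.2f} J/cm^2")
```

Under these assumptions the two protocols deliver doses of the same order of magnitude, despite the roughly sevenfold difference in irradiance.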

2.2.8 Mass Spectrometry Analysis of Proteins

Proteins (40 µg) were precipitated in methanol/chloroform. The pellet was dissolved in

acetonitrile and ddH2O (1:1) mixture (30 µL) containing 1% formic acid. A direct infusion

mode was used to record mass spectra on an Agilent ESI-TOF instrument at the Analytical

Chemistry Instrumentation Facility of UCR.

2.3 Results

Previous efforts have utilized mutant pairs of pyrrolysyl tRNA synthetase

(PylRS)/tRNA[29, 30] and Escherichia coli leucyl tRNA synthetase (EcLeuRS)/tRNA[25] in

mammalian cells for the genetic encoding of unnatural cysteine derivatives that can be

decaged with long-wavelength UVA radiation. In particular, an orthogonal

EcLeuRS/tRNA pair originally engineered for the encoding of a photocaged serine in

yeast[19] was found to be capable of encoding a photocaged cysteine (1 in Figure 2.4a) in

mammalian cells.[25] Based on these results, we modified our pMAH mammalian

expression plasmid[31] to express the mutant EcLeuRS and tRNA genes. Expression of the

full-length GFP protein in Human Embryonic Kidney (HEK) 293T cells bearing EGFP-

Tyr39TAG (a gene for enhanced green fluorescent protein with an amber codon at residue

39) was observed to be dependent on 1 (Figure 2.4b). Photolysis of 1 is expected to generate

an aldehyde byproduct, which may further react with free cellular amines to inadvertently

promote cell toxicity (Figure 2.5a).[32] Therefore, we also prepared a new UAA, 2 (Figure

2.4a), photolysis of which yields a cysteine and a less reactive ketone byproduct (Figure

2.5bc). Since 2 is structurally similar to 1, we also tested 2 for amber suppression in the

presence of the mutant EcLeuRS/tRNA pair. We achieved an appreciable yield of full-

length GFP from HEK 293T cells, as observed by SDS-PAGE analysis and fluorescence

microscopic imaging (Figure 2.4b and c). Electrospray ionization mass spectrometry (ESI-

MS) further confirmed the genetic incorporation of 2 in the recombinantly expressed

EGFP (Figure 2.6).

Figure 2.4. Genetic encoding of photocaged cysteines in HEK 293T cells. (a)

Chemical structures of two photocaged cysteines, 1 and 2. (b) SDS-PAGE analysis

of Ni-NTA-purified EGFP, containing 1 or 2, expressed in HEK 293T cells. (c)

Microscopic imaging of EGFP expressing HEK 293T cells in the absence (left

column) or presence (right column) of 2 (scale bar: 50 μm).

Figure 2.5. Photolysis of photocaged cysteines, 1 and 2, yields a cysteine and either

(a) an aldehyde, or (b) a ketone by-product. (c) Electrospray ionization (ESI) mass

spectrum of 2 briefly exposed to long-wavelength UVA light, showing the formation

of a ketone byproduct.

Figure 2.6. ESI mass spectrometry analysis of intact proteins. (a) Mass spectrum of

EGFP, containing 1 at residue 39 (calculated mass: 28817, observed mass: 29818).

(b) Mass spectrum of EGFP containing 2 at residue 39 (calculated mass: 28831,

observed mass: 29832). The differences between the observed and calculated masses

are within the expected error range of the instrument.

To determine whether 2 can be utilized to photocontrol the protein splicing activity of the

Npu DnaE intein, we inserted a full-length Npu DnaE intein sequence into mCherry (Figure

2.7a). Residue 138, on a long loop between β-strands 6 and 7 of mCherry, was chosen

as the insertion site (Figure 2.2).[33] Moreover, the codon of the Cys1 residue of Npu DnaE

intein was mutated to an amber codon (TAG) for UAA incorporation. The chimeric

construct was subsequently expressed in HEK 293T cells, with cell culture media

containing 2. Almost no fluorescence was observed prior to UVA treatment (Figure 2.7b),

suggesting that the intein insertion disrupted the fluorescence of mCherry. Next, we used

a UVA lamp to directly illuminate cells in cell culture dishes, and strong red fluorescence

was observed within 1 h after irradiation (Figure 2.7b). This rate of developing red fluorescence

in cells was comparable to the rate of chromophore maturation of mCherry.[34] These

results indicate that the caged intein was photoactivated to undergo protein splicing and

form a highly fluorescent reconstituted mCherry. Since the construct was 6xHis-tagged at

the C-terminal end, Ni-NTA agarose beads were utilized to purify proteins from untreated

or UVA-treated cells. SDS-PAGE analysis of the proteins confirmed the highly efficient,

light-induced protein splicing: upon UVA-treatment, nearly all of the chimeric protein was

converted to the spliced product (Figure 2.7c).

Figure 2.7. Photoactivation of mCherry. (a) Primary structures of the

intein/mCherry chimeric protein and its photo-converted product after UV-induced

protein splicing. The red portion of the bar represents the mCherry sequence. The

asterisk (*) represents the Cys1 residue for UAA incorporation. The “CM” region

consists of two extein residues (+1 and +2). (b) Microscopic imaging of HEK 293T cells

expressing the construct treated with or without UV irradiation (scale bar: 50 μm).

(c) SDS-PAGE analysis of the Ni-NTA-purified proteins from HEK 293T cells, with

or without UV irradiation.

We next explored the use of the photocaged intein in controlling enzymatic activities. We

inserted the photocaged intein into the catalytic domain of Src, a human tyrosine kinase.

The kinase catalytic domain has eight cysteine residues and 12 serine residues. We

designed chimeric proteins by randomly and individually inserting the intein into three sites

in Src (Figure 2.8a and Figure 2.3). First, we inserted the intein between Gly276 and

Cys277, or Val399 and Cys400 of Src (F1 and F2 in Figure 2.8a). For these two constructs,

protein splicing is expected to generate a product identical to the wildtype Src kinase

catalytic domain. We also built the third construct, F3, in which the intein was placed

downstream of Met341 (Figure 2.8a). Because the Npu DnaE intein requires a cysteine

residue at the +1 site for efficient protein splicing,[12] we also mutated Ser342 to cysteine,

to which the native Src sequence from residue 343 to residue 533 was appended. The

splicing product of F3 is expected to be different from the wild-type protein by a single

Ser342Cys mutation. It is worth noting that a serine-to-cysteine mutation is tolerated in many

cases without dramatically affecting protein activities.[36] We also fused mCherry at the C-

terminal end as an expression indicator of the UAA-containing full-length proteins. Next,

we used a KRas-Src sensor,[37] based on Förster resonance energy transfer (FRET) between

ECFP and YPet, to evaluate the activities of F1, F2, and F3 in the presence or absence of

UVA irradiation. This sensor was well-validated in previous studies, and Src kinase

activity is known to decrease the intensity ratio (YPet/ECFP) of the sensitized YPet

fluorescence emission to the direct ECFP donor emission.[37] HEK 293T cells containing

each of the three constructs and the KRas-Src sensor were treated with UVA light and then

lysed for fluorescence quantification with a plate reader (Figure 2.8b). All three of our

constructs were inactive prior to UVA irradiation, while UVA light was able to activate

them, leading to the decrease of the FRET ratios of the sensor. A reduced FRET ratio was

also observed for cells co-expressing a wild-type Src kinase and the Src sensor.

Furthermore, negative control experiments were performed with HEK 293T cells

containing each of the three constructs but cultured in the absence of 2. Cells in the negative

control groups were also subjected to identical UVA treatment, so that the partial

photobleaching of the Src sensor did not mask the FRET changes caused by the

photoactivation of the Src kinase activity. Moreover, we utilized fluorescence microscopy

to closely monitor the process (Figure 2.8c). HEK 293T cells coexpressing the Src sensor

and the chimeric F1 construct were irradiated on an epi-fluorescence microscope equipped

with a DAPI excitation filter. Next, we carried out time-lapse, two-channel FRET imaging

of ECFP and YPet. The FRET ratios of the Src sensor gradually decreased in the monitored

30 min period. In contrast, the UVA-treated control cells cultured in the absence of 2

showed no obvious change in FRET ratios during the imaging period (Figure 2.8d and

Figure 2.9). It was noted that considerable Src-induced FRET changes occurred during the

2 min of UVA illumination. Analysis of single cells showed that the average FRET ratio

(YPet/ECFP) at 0 min, when time-lapse FRET imaging started, was 2.11 ± 0.08 for cells

containing the photoactivated Src. In comparison, negative control cells identically treated with

UVA radiation had an average FRET ratio of 2.35 ± 0.03. This is not surprising,

considering the fast kinetics of the Npu DnaE intein. The UVA illumination condition did

not affect cell viability[38] but effectively activated the photocaged intein to promote the

formation of Src via protein splicing. These data indicate that the photocaged Npu DnaE

intein is an effective tool for the control of enzyme activities.

UV radiation may also decage the charged unnatural aminoacyl tRNA, which may be

further utilized by cellular ribosomes to synthesize proteins. When we added cycloheximide (100

μg/mL) to block ribosomal protein synthesis during and after irradiation, the

photoactivation of Src kinase was not affected (Figure 2.8b). In addition, the activation of

Src was observed right after UV irradiation (Figure 2.8d), when ribosomal protein

synthesis from the decaged aminoacyl tRNA was unlikely to be achieved in this short time

frame. These results suggest that the direct decaging of the accumulated chimeric proteins

in cells was the major pathway in our experiments.

Figure 2.8. Photoactivation of Src kinase. (a) Primary structures of the chimeric

proteins tested in this study. The gray portion of the bars represents the sequence of

the human Src kinase between the indicated residues. The asterisk (*) indicates the

Cys1 residue for UAA incorporation; “M” is methionine, as the translational start

site; and “C” is cysteine, used to replace residue 342 of Src. (b) Activity of the

chimeric proteins before and after UVA irradiation, as measured from FRET ratios

of a KRas-Src sensor. In the absence of 2, the full-length proteins were not

synthesized, and these samples were thus used as negative controls. A wild-type Src was also

prepared as a positive control. To block ribosomal protein synthesis during and

after UVA irradiation, cycloheximide (CHX) was also added to a control group. (c)

Pseudo-colored ratio images of representative UVA-treated HEK 293T cells

expressing the F1 construct in the presence of 2 at the indicated post-treatment time

(in minutes). The color bar represents fluorescence ratio (YPet/ECFP) (scale bar:

25 μm). (d) FRET ratios plotted versus time for HEK 293T cells. Color symbols are

for individual cells in panel c, marked at 0 min by arrows in the same colors. The

FRET ratios of an identically treated control cell cultured in the absence of 2 (see

Figure 2.9) are shown as open black circles.

Figure 2.9. Pseudo-colored ratio FRET images of representative UVA-treated HEK

293T cells harboring the F1 construct, but cultured in the absence of 2 at the

indicated post-treatment time (in minutes). The color scale indicates the fluorescence

ratio (YPet/ECFP), and the scale bar is 20 µm.

2.4 Conclusions

In summary, we have engineered the first genetically encoded photoactivatable intein

compatible with living mammalian cells, in which a photocaged cysteine is used to

genetically replace the Cys1 residue of a highly efficient Npu DnaE intein. By

incorporating the photo-caging group, the protein splicing activity of the intein was

effectively and efficiently inhibited, and the activity was only observed after a brief

exposure to long wavelength UVA light. The resulting photocaged intein was inserted into

other proteins to directly control their primary structures. Because the Npu DnaE intein is

compatible with a myriad of extein sequences, such manipulation should be quite versatile.

A downstream C-extein Cys+1 residue is required for protein splicing, but cysteine can be

found in many proteins. In addition, a single cysteine mutation may be tolerated by many

proteins. Thus, the approach described here may be applied to a large percentage of

proteins. We acknowledge that additional N- and C-terminal extein sequences might affect

the kinetics of protein splicing. This issue can be addressed by using evolved inteins that

splice with higher efficiency at various splice junctions.[39] One might also prepare several

chimeric constructs at different splice sites to screen for variants retaining excellent

expression, stability, and post-photoactivation splicing kinetics. The use of the

photoactivatable inteins to control protein activity is highly attractive, because it requires

little information on the biochemistry or 3D structures of the proteins of interest. The

photoactivatable intein reported here is a new and powerful addition to the mammalian

opto-chemical genetic toolbox, permitting the modulation of proteins directly at the amino

acid sequence level.

References:

[1] Hirata R, Ohsumi Y, Nakano A, Kawasaki H, Suzuki K, Anraku Y. Molecular

structure of a gene, VMA1, encoding the catalytic subunit of H(+)-translocating adenosine

triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. Journal of

Biological Chemistry. 1990; 265(12):6726-33.

[2] Shah N, Muir T. Inteins: nature's gift to protein chemists. Chemical Science. 2014;

5(1):446-461.

[3] Topilina N, Mills K. Recent advances in in vivo applications of intein-mediated

protein splicing. Mobile DNA. 2014; 5(1):5.

[4] Mootz H. Split inteins as versatile tools for protein semisynthesis. Chembiochem.

2009; 10(16):2579-89.

[5] Peck S, Chen I, Liu D. Directed evolution of a small-molecule-triggered intein with

improved splicing properties in mammalian cells. Chem. Biol. 2011; 18(5):619-30.

[6] Toettcher J, Voigt C, Weiner O, Lim W. The promise of optogenetics in cell

biology: interrogating molecular circuits in space and time. Nat. Methods. 2011; 8(1):35-8.

[7] Cook S, Jack W, Xiong X, Danley L, Ellman J, Schultz P, Noren C. Photochemically

initiated protein splicing. Angew. Chem., Int. Ed. 1995; 34:1629-1630.

[8] Vila-Perello M, Hori Y, Ribo M, Muir T. Activation of protein splicing by protease-

or light-triggered O to N acyl migration. Angew. Chem., Int. Ed. 2008; 47(40):7764-7.

[9] Berrade L, Kwon Y, Camarero J. Photomodulation of protein trans-splicing through

backbone photocaging of the DnaE split intein. Chembiochem. 2010; 11(10):1368-72.

[10] Binschik J, Zettler J, Mootz H. Photocontrol of protein activity mediated by the

cleavage reaction of a split intein. Angew. Chem., Int. Ed. 2011; 50(14):3249-52.

[11] Tyszkiewicz A, Muir T. Activation of protein splicing with light in yeast. Nat.

Methods 2008; 5(4):303-5.

[12] Zettler J, Schutz V, Mootz H. The naturally split Npu DnaE intein exhibits an

extraordinarily high rate in the protein trans-splicing reaction. FEBS Lett. 2009; 583(5):

909-14.

[13] Ellila S, Jurvansuu J, Iwai H. Evaluation and comparison of protein splicing by

exogenous inteins with foreign exteins in Escherichia coli. FEBS Lett. 2011;

585(21):3471-7.

[14] Cheriyan M, Pedamallu CS, Tori K, Perler F. Faster protein splicing with the Nostoc

punctiforme DnaE intein using non-native extein residues. J. Biol. Chem. 2013;

288(9):6202-11.

[15] Ramirez M, Valdes N, Guan D, Chen Z. Engineering split intein DnaE from Nostoc

punctiforme for rapid protein purification. Protein Eng. Des. Sel. 2013; 26(3):215-23.

[16] Carvajal-Vallejos P, Pallisse R, Mootz HD, Schmidt S. Unprecedented rates and

efficiencies revealed for new natural split inteins from metagenomic sources. J. Biol. Chem.

2012; 287(34):28686-96.

[17] Wu N, Deiters A, Cropp TA, King D, Schultz P. A genetically encoded photocaged

amino acid. J. Am. Chem. Soc. 2004; 126(44):14306-7.

[18] Chen P, Groff D, Guo J, Ou W, Cellitti S, Geierstanger BH, Schultz P. A facile

system for encoding unnatural amino acids in mammalian cells. Angew. Chem., Int. Ed.

2009; 48(22):4052-5.

[19] Lemke E, Summerer D, Geierstanger B, Brittain S, Schultz P. Control of protein

phosphorylation with a genetically encoded photocaged amino acid. Nat. Chem. Biol. 2007;

3(12):769-72.

[20] Liu CC, Schultz P. Adding new chemistries to the genetic code. Annu. Rev.

Biochem. 2010; 79:413-44.

[21] Deiters A, Groff D, Ryu Y, Xie J, Schultz P. A genetically encoded photocaged

tyrosine. Angew. Chem., Int. Ed. 2006; 45(17):2728-31.

[22] Zhao J, Lin S, Huang Y, Zhao J, Chen PR. Mechanism-based design of a

photoactivatable firefly luciferase. J. Am. Chem. Soc. 2013; 135(20):7410-3.

[23] Gautier A, Deiters A, Chin JW. Light-activated kinases enable temporal dissection

of signaling networks in living cells. J. Am. Chem. Soc. 2011; 133(7):2124-7.

[24] Groff D, Wang F, Jockusch S, Turro NJ, Schultz P. A new strategy to photoactivate

green fluorescent protein. Angew. Chem., Int. Ed. 2010; 49(42):7677-9.

[25] Kang JY, Kawaguchi D, Coin I, Xiang Z, O’Leary DD, Slesinger PA, Wang L. In

vivo expression of a light-activatable potassium channel using unnatural amino acids.

Neuron. 2013; 80(2):358-70.

[26] Hemphill J, Chou C, Chin JW, Deiters A. Genetically encoded light-activated

transcription for spatiotemporal control of gene expression and gene silencing in

mammalian cells. J. Am. Chem. Soc. 2013; 135(36):13433-9.

[27] Gautier A, Nguyen DP, Lusic H, An W, Deiters A, Chin JW. Genetically encoded

photocontrol of protein localization in mammalian cells. J. Am. Chem. Soc. 2010;

132(12):4086-8.

[28] Baker AS, Deiters A. Optical control of protein function through unnatural amino

acid mutagenesis and other optogenetic approaches. ACS Chem. Biol. 2014; 9(7):1398-407.

[29] Nguyen DP, Mahesh M, Elsasser SJ, Hancock SM, Uttamapinant C, Chin JW.

Genetic encoding of photocaged cysteine allows photoactivation of TEV protease in live

mammalian cells. J. Am. Chem. Soc. 2014; 136(6):2240-3.

[30] Uprety R, Luo J, Liu J, Naro Y, Samanta S, Deiters A. Genetic encoding of caged

cysteine and caged homocysteine in bacterial and mammalian cells. Chembiochem. 2014;

15(12):1793-9.

[31] Chen S, Chen ZJ, Ren W, Ai HW. Reaction-based genetically encoded fluorescent

hydrogen sulfide sensors. J. Am. Chem. Soc. 2012; 134(23):9589-92.

[32] Bochet CG. Photolabile protecting groups and linkers. J. Chem. Soc., Perkin Trans.

1 2002; 125-142.

[33] Li Y, Sierra AM, Ai HW, Campbell RE. Identification of sites within a monomeric

red fluorescent protein that tolerate peptide insertion and testing of corresponding circular

permutations. Photochem. Photobiol. 2008; 84(1):111-9.

[34] Macdonald PJ, Chen Y, Mueller JD. Chromophore maturation and fluorescence

fluctuation spectroscopy of fluorescent proteins in a cell-free expression system. Anal.

Biochem. 2012; 421(1):291-8.

[35] Johannessen CM, Boehm JS, Kim SY, Thomas SR, Wardwell L, Johnson LA,

Emery CM, Stransky N, Cogdill AP, Barretina J, Caponigro G, Hieronymus H, Murray

RR, Salehi-Ashtiani K, Hill DE, Vidal M, Zhao JJ, Yang X, Alkan O, Kim S, Harris JL,

Wilson CJ, Myer VE, Finan PM, Root DE, Roberts TM, Golub T, Flaherty KT, Dummer

R, Weber BL, Sellers WR, Schlegel R, Wargo JA, Hahn WC, Garraway LA. COT drives

resistance to RAF inhibition through MAP kinase pathway reactivation. Nature. 2010;

468(7326):968-72.

[36] Wang X, Pineau C, Gu S, Guschinskaya N, Pickersgill RW, Shevchik VE. Cysteine

scanning mutagenesis and disulfide mapping analysis of arrangement of GspC and GspD

protomers within the type 2 secretion system. J. Biol. Chem. 2012; 287(23):19082-93.

[37] Seong J, Lu S, Ouyang M, Huang H, Zhang J, Frame MC, Wang Y. Visualization

of Src activity at different compartments of the plasma membrane by FRET imaging. Chem.

Biol. 2009; 16(1):48-57.

[38] Hemphill J, Govan J, Uprety R, Tsang M, Deiters A. Site-specific promoter caging

enables optochemical gene activation in cells and animals. J. Am. Chem. Soc. 2014;

136(19):7152-8.

[39] Lockless SW, Muir TW. Traceless protein splicing utilizing evolved split inteins.

Proc. Natl. Acad. Sci. U.S.A. 2009; 106(27):10999-1004.

Chapter 3: Expanding the Genetic Code for a

Dinitrophenyl Hapten

3.1 Introduction

Haptens are small molecules that induce strong immune responses when attached to

proteins or peptides.[1] Although they cannot trigger immune responses alone, these small

moieties contain antigenic determinants that can bind to pre-existing antibodies.[1] Due to

their high affinity and specificity, antibody-hapten interactions have been exploited for

diverse applications, such as affinity chromatography, immunohistochemistry, in situ

hybridization, and enzyme-linked immunosorbent assay (ELISA).[2-4] 2,4-Dinitrophenyl (DNP) is one of the most

common haptens.[4-5] Polyclonal and monoclonal anti-DNP antibodies, as well as single

chain variable fragments (scFv) against DNP, are readily accessible reagents.[6] Therefore,

the ability to introduce DNP into proteins is important for the applications of DNP and

anti-DNP antibodies in separation and detection (Fig. 3.1).[4, 7-8] Moreover, DNP-

containing proteins and peptides can induce immunological hypersensitivity, and they have

been commonly used to probe the biology of immune systems.[9-12] In addition, because

about one percent of the circulating human antibodies can naturally bind to DNP[13-14], DNP

has been utilized to label disease-causing cancer cells and bacterial cells to initiate

antibody-mediated immune responses and trigger cytotoxicity and phagocytosis.[15-16]

Furthermore, self-antigens or weakly immunogenic antigens may be modified with DNP

to break the immune tolerance of the hosts and generate antibodies that are cross-reactive

to the self or weak antigens.[17-18] This immunotherapy strategy seems to be quite promising

for a variety of human diseases.[19]

Figure 3.1. Applications of DNP-labeled proteins.

Despite the potential for broad applications, current methods for preparing DNP-labeled

proteins and peptides have significant limitations. For example, standard solid phase

peptide synthesis can only produce short DNP-containing peptides, whereas protein

labeling via reactive amino acid residues (e.g. cysteine and lysine) often lacks site-

specificity.[21] Expanding the genetic code of living cells and organisms is a popular

method for preparing proteins containing unnatural functional groups.[22-23] This method

has now enabled the site-specific incorporation of > 100 UAAs containing diverse side-

chain functional groups into biosynthesized proteins, but the genetic encoding of DNP-

containing UAAs has not yet been achieved. Herein, we describe our recent effort in

genetically encoding N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK, Scheme 3.1

compound 3) for the biological preparation of proteins containing site-specific DNP.

3.2 Materials and Methods

3.2.1 Chemical Synthesis of N6-(2-(2,4-dinitrophenyl)acetyl)lysine

(DnpK, 3)

Scheme 3.1. Synthetic route to prepare DnpK.

All chemicals were purchased from Sigma-Aldrich (St. Louis, MO) or Fisher Scientific

(Waltham, MA). N,N'-Dicyclohexylcarbodiimide (DCC, 1.13 g, 5.5 mmol) and N-

hydroxysuccinimide (NHS, 575 mg, 5 mmol) were added into 2,4-dinitrophenylacetic

acid (1, 1.13 g, 5 mmol) dissolved in CH2Cl2 (30 mL). The mixture was stirred at

room temperature for 18 h, followed by gravity filtration. Next, the filtrate was

concentrated in vacuo, and the residue was re-dissolved in THF (5 mL) and introduced

into an aqueous solution (30 mL) of Nα-(tert-butoxycarbonyl)-L-lysine (Boc-Lys-OH)

(1.23 g, 5 mmol) and NaHCO3 (840 mg, 10 mmol). The resulting mixture was stirred

at room temperature overnight, acidified with dilute HCl (1 M, 10 mL), and extracted

with ethyl acetate (20 mL) three times. Organic layers were combined and

concentrated in vacuo to yield a crude product, which was further purified using silica

gel column chromatography (EtOAc/Hexane = 3:1) to afford 2 as a light yellow oil (1.18

g, 2.6 mmol). The overall yield was 52%.
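
The reported yield follows directly from the molar amounts given above. A minimal sketch of the arithmetic is shown below, with an illustrative function name; it takes the 5 mmol scale of 2,4-dinitrophenylacetic acid and Boc-Lys-OH as the limiting amount, which is consistent with the stated quantities.

```python
def percent_yield(product_mmol, limiting_reagent_mmol):
    """Isolated yield as a percentage of the limiting reagent."""
    if limiting_reagent_mmol <= 0:
        raise ValueError("limiting reagent amount must be positive")
    return 100.0 * product_mmol / limiting_reagent_mmol


# 2.6 mmol of compound 2 isolated from a 5 mmol scale reaction.
print(f"{percent_yield(2.6, 5.0):.0f}%")
```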

3.2.2 Chemical Synthesis of N6-(2-(2-nitrophenyl)acetyl)lysine

(2-NPK) and N6-(2-(4-nitrophenyl)acetyl)lysine (4-NPK)

Scheme 3.2. Synthetic route to prepare 2-NPK and 4-NPK.

2-(2-Nitrophenyl)-acetic acid or 2-(4-nitrophenyl)-acetic acid (1 mmol, 181 mg) was

dissolved in CH2Cl2 (10 mL) in an ice-water bath. Next, NHS (1 mmol, 115 mg) and

DCC (1.1 mmol, 226 mg) were added. The mixture was stirred at room temperature

for 8 h, followed by gravity filtration. Next, the filtrate was concentrated in vacuo, and

the residue was re-dissolved in THF (5 mL) and introduced into an aqueous solution

(30 mL) of Boc-Lys-OH (1 mmol, 246 mg) and Na2CO3 (1 mmol, 106 mg). The

resulting solution was stirred at room temperature overnight, acidified with dilute HCl

(0.5 N, 4 mL), and extracted with EtOAc (10 mL) three times. Organic layers were

combined, dried over Na2SO4, and concentrated in vacuo to yield a crude product,

which was further purified using silica gel column chromatography (EtOAc/Hexane =

9:1) to yield a light yellow solid (0.58 mmol, 240 mg). Next, TFA/CH2Cl2 (1:2) was

added to remove the protecting group to afford the final product. The overall yields

were 58% and 70% for 2-NPK and 4-NPK, respectively.
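
As a cross-check on the reagent quantities listed for this 1 mmol scale procedure, the sketch below converts millimoles to milligrams. The molecular weights are standard values supplied here rather than taken from the text, the names are illustrative, and small differences from the reported masses reflect rounding.

```python
# Molecular weights in g/mol (numerically equal to mg/mmol); standard values, not from the text.
MW = {
    "nitrophenylacetic acid": 181.15,
    "NHS": 115.09,
    "DCC": 206.33,
    "Boc-Lys-OH": 246.30,
    "Na2CO3": 105.99,
}


def mass_mg(compound, mmol):
    """Mass in mg for the given amount in mmol."""
    return MW[compound] * mmol


# Amounts in mmol as stated in the procedure.
amounts_mmol = {
    "nitrophenylacetic acid": 1.0,
    "NHS": 1.0,
    "DCC": 1.1,
    "Boc-Lys-OH": 1.0,
    "Na2CO3": 1.0,
}
for name, mmol in amounts_mmol.items():
    print(f"{name}: {mmol} mmol = {mass_mg(name, mmol):.0f} mg")
```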

3.2.3 Evolution of a Mutant Aminoacyl-tRNA Synthetase

We followed a previous procedure[28] to construct an MbPylRS active site library, based on

overlap extension PCR with synthetic degenerate oligonucleotides (Integrated DNA

Technologies). The library was inserted into a pBK plasmid. pRep-tRNAPyl and pNeg-

tRNAPyl plasmids were used for positive and negative selection, respectively.[28] During

positive selection, the pBK-PylRS plasmids encoding the MbPylRS library were used to

transform E. coli DH10B competent cells harboring pRep-tRNAPyl. Cells were plated on

LB agar plates containing tetracycline (Tet; 25 µg/mL), kanamycin (Kan; 50 µg/mL),

chloramphenicol (Cm; 70 µg/mL), and DnpK (1 mM) and were incubated at 37°C for 48

h. Colonies on the plates were pooled, and total plasmids were mini-prepped. pBK-PylRS

plasmids were separated from pRep-tRNAPyl by agarose gel electrophoresis. Extracted

pBK-PylRS plasmids from the positive selection were introduced into DH10B containing

pNeg-tRNAPyl. Cells were next plated on LB agar containing 50 µg/mL Kan, 100 µg/mL

ampicillin (Amp), and 0.2% L-arabinose. Plates were incubated at 37°C for 16 h. Cells

were pooled, and the pBK-PylRS plasmids were again separated and extracted. After two

alternating rounds of positive and negative selection, the MbPylRS mutants were subjected

to a third round of positive selection. To further validate surviving clones from the third

positive selection, individual pBK-MbPylRS mutants were prepared and used to co-

transform DH10B electro-competent cells containing another plasmid, pBAD-sfGFP

Y39TAG. Fluorescence intensities of bacterial cells, in the presence or absence of 1 mM

DnpK, were quantified. The mutant leading to the largest fluorescence intensity difference

under the two conditions was named DnpKRS.
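
The final screening criterion, choosing the clone with the largest DnpK-dependent fluorescence difference, can be sketched as below. The clone names and intensity values are hypothetical and serve only to illustrate the comparison; no measured data are reproduced, and the function name is illustrative.

```python
def best_clone(readings):
    """Return (clone, difference) with the largest fluorescence gain on adding DnpK.

    readings maps clone name -> (fluorescence with DnpK, fluorescence without DnpK).
    """
    differences = {name: plus - minus for name, (plus, minus) in readings.items()}
    winner = max(differences, key=differences.get)
    return winner, differences[winner]


# Hypothetical plate-reader values in arbitrary units.
example = {
    "clone_A": (5200.0, 900.0),
    "clone_B": (7800.0, 650.0),
    "clone_C": (8100.0, 4300.0),
}
print(best_clone(example))
```

In this made-up data set, clone_C gives the brightest signal with DnpK but also high background without it, so the difference criterion favors clone_B. Ranking by the ratio of the two signals would be an alternative that penalizes background incorporation more strongly; the text specifies the difference.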

3.2.4 Computational Modeling of the DnpK/DnpKRS Complex

Structure

The mutant protein structure was modeled with SWISS-MODEL[33], based on the Protein

Data Bank (PDB) structure 2Q7H.[34] The ligand was edited in PyMOL.[35] The complex

structure was energy-minimized by using the YASARA energy-minimization server.

3.2.5 Protein Expression and Purification from E. coli

The gene in pBK-DnpKRS was amplified by PCR and inserted into a new pEAH plasmid

(KanR), which contains a tRNAPyl expression gene cassette driven by a proK promoter and

a synthetase expression gene cassette driven by a pBAD promoter. A pBAD plasmid

(AmpR) encoding sfGFP-Y39TAG, T4L-K65TAG, or Z-domain-K7TAG was used to

co-transform DH10B or an nfsA/nfsB double-deletion K12 strain[29], along with the

pEAH-DnpK plasmid. A single colony was used to inoculate 2YT medium [100 mL, containing

L-arabinose (0.2%), ampicillin (100 µg/mL), and kanamycin (50 µg/mL)] in the presence

or absence of DnpK (1 mM), and the culture was grown at 30°C for 24 h. Cells were harvested by centrifugation

and lysed with B-PER II protein extraction reagent (Pierce). His6-tagged protein was

purified with Ni-NTA agarose beads (Qiagen) under native conditions according to the

manufacturer’s instructions.

3.2.6 Protein Expression and Purification from HEK293T Cells

The mammalian expression vector pCMV-DnpK was created by replacing the synthetase

in a previous pCMV-AbK plasmid.[37] This plasmid also contains a copy of the tRNAPyl

gene under the control of a human U6 promoter. HEK293T cells were grown in DMEM

supplemented with 10% fetal bovine serum (FBS). Cells at 70% confluency were

transfected with mixtures of the corresponding plasmids by using linear polyethylenimine

(PEI, MW = 25,000). The culture medium was further supplemented with DnpK (1 mM) as

appropriate. When expressing EGFP in HEK293T cells, pCMV-DnpK (12 µg) and

pEGFP-Y39TAG (12 µg) were mixed with PEI (60 µg) to transfect cells in 100 mm

diameter cell culture dishes. Cells were harvested 72 h after transfection, washed with

PBS (3 × 8 mL), and then collected and lysed with radio-immunoprecipitation assay

(RIPA) buffer on ice for 10 min. Lysates were cleared with a benchtop centrifuge at 5,000 × g

for 2 min and were used directly for western blotting or purified by Ni-NTA agarose beads

(Qiagen).

3.2.7 Protein Electrospray Mass Spectrometry

Proteins were precipitated with methanol/chloroform and dissolved in formic acid/water

(1:100) solution for mass spectrometry characterization. Mass spectra were recorded on an

Agilent ESI-TOF instrument by direct infusion of proteins. Observed spectra were

deconvoluted to derive protein masses by using the Agilent LC/MSD Deconvolution package

provided with the instrument. The instrument detects protein masses within an expected

mass error of ±0.01%.
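
The stated tolerance can be turned into a simple acceptance check. In the sketch below the function name and the example masses are hypothetical; only the ±0.01% relative error comes from the text.

```python
def within_mass_tolerance(observed_da, calculated_da, relative_tolerance=1e-4):
    """True if |observed - calculated| is within the relative tolerance (0.01% = 1e-4)."""
    return abs(observed_da - calculated_da) <= relative_tolerance * calculated_da


# Hypothetical example: a 28,000 Da protein allows a deviation of about 2.8 Da.
print(within_mass_tolerance(28002.0, 28000.0))
print(within_mass_tolerance(28010.0, 28000.0))
```

For proteins in the 28 to 30 kDa range, this window corresponds to roughly ±3 Da.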

3.2.8 Western Blotting

PVDF membranes with blotted proteins were first blocked with 1% BSA for 1 h and then

incubated with HRP-conjugated anti-DNP antibody (cat. no. FP1129, PerkinElmer) in

1/500 dilution at 4 °C for 14 h. A colorimetric One-Component TMB Membrane

Peroxidase Substrate (cat. no. 50–77–18, Kirkegaard & Perry Laboratories, Gaithersburg,

MD) was used to directly visualize the immobilized antibody.

3.3 Results

The amino acid DnpK (Fig. 3.2A) was prepared from Nα-(tert-butoxycarbonyl)-L-lysine

(Boc-Lys-OH) and 2,4-dinitrophenylacetic acid in 52% overall yield over three steps.

Previous studies have genetically encoded a large number of lysine-derived UAAs using

mutants of pyrrolysyl-tRNA synthetase/pyrrolysyl tRNA (PylRS/tRNAPyl) pairs. Along

this line, we screened a M. barkeri PylRS (mbPylRS) library with complete randomization

at residues L270, Y271, L274, and C313 (and an additional Y349F mutation to enhance

tRNA aminoacylation[24]) for the capability of suppressing amber (TAG) codons in the

presence of DnpK. We performed multiple cycles of positive and negative selections in E.

coli strain DH10B, as previously described.[28] We identified an mbPylRS mutant with

Y271M, L274T, C313A, and Y349F mutations (DnpKRS) that survived in the third round

of positive selection. These mutated residues form an enlarged cavity to accommodate the

nonnative DNP functional group, as shown in a modeled structure of the DnpK/DnpKRS

complex (Fig. 3.2B).
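To give a sense of the scale of a four-residue saturation library such as the one screened here, the sketch below counts the possible variants and estimates how many clones would be needed to sample most of them. The NNK codon scheme (32 codons per position) and the equal-frequency sampling model are assumptions; the text only states that the four positions were completely randomized.

```python
import math

def library_sizes(n_positions, codons_per_position=32, amino_acids=20):
    """Return (DNA-level, protein-level) diversity for full randomization."""
    return codons_per_position ** n_positions, amino_acids ** n_positions

def clones_for_coverage(n_variants, coverage=0.95):
    """Clones needed so a given variant is sampled at least once with
    probability `coverage`, assuming all variants are equally likely:
    P = 1 - exp(-N / V)  =>  N = -V * ln(1 - P).
    """
    return math.ceil(-n_variants * math.log(1.0 - coverage))

# Four randomized positions (L270, Y271, L274, C313); NNK is assumed here.
dna_variants, protein_variants = library_sizes(4)
clones_95 = clones_for_coverage(dna_variants)
print(dna_variants, protein_variants, clones_95)
```

Under these assumptions the library contains about 1.0 × 10^6 codon combinations encoding 1.6 × 10^5 protein sequences, and roughly three-fold oversampling of the codon library is needed for 95% coverage.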

Figure 3.2. (A) Chemical structure of N6-(2-(2,4-dinitrophenyl)acetyl)lysine (DnpK).

(B) Computationally modeled structure of DnpKRS bound with DnpK. (C) SDS-PAGE of

Ni-NTA-purified sfGFP. Proteins were expressed in the presence or absence of 1 mM

DnpK in E. coli cells containing tRNA. (D) ESI-MS analysis of the intact sfGFP protein

expressed in E. coli in the presence of DnpK.

We next introduced the genes for DnpKRS, the corresponding suppressor tRNA, and

sfGFP-Y39TAG (His6-tagged superfolder GFP containing a TAG codon for residue 39)

into DH10B E. coli cells. The full-length protein was produced in good yield in the

presence of 1 mM DnpK (4.4±1.5 mg per liter of culture), while full-length sfGFP was not

observed in the absence of DnpK (Fig. 3.2C). The resulting protein was characterized by

direct-infusion electrospray ionization mass spectrometry (ESI-MS). To our surprise, the

observed molecular mass did not match the molecular mass of sfGFP containing a DnpK

residue (Fig. 3.2D). Our spectrometer has a mass accuracy of ±0.01%. The difference between the

expected and observed molecular masses (31 Da) indicates that the nitro group(s) of the DnpK

residue was likely reduced in E. coli, although the exact chemical form of the reduced

species could not be determined from this MS experiment. To investigate whether the

problem was protein-specific, we also expressed T4 lysozyme and the Staphylococcal

protein A (SpA) Z-domain, each containing a TAG codon. The mismatch between the

expected and observed molecular masses still existed (Figure 3.3). The observation of

multiple reduction states for the small Z-domain protein further supports our assumption

that bacterial nitroreductases were problematic for expressing DnpK-containing proteins.

We next utilized a special E. coli strain[29], in which the nfsA and nfsB nitroreductase genes

were both deleted, to express sfGFP and the Z-domain. Unfortunately, this new strain

did not solve our problem (Figure 3.3), likely due to the presence of other nitroreductases

in E. coli. To explore which of the two nitro groups in DnpK is more amenable to reduction

and which state they were reduced to, we further synthesized two compounds containing a

single nitro group, N6-(2-(2-nitrophenyl)acetyl)lysine (2-NPK) and N6-(2-(4-

nitrophenyl)acetyl)lysine (4-NPK; Scheme 3.2). When either 2-NPK or 4-NPK was added

to the medium to culture DH10B cells containing DnpKRS, the suppressor tRNA, and

sfGFP-Y39TAG, full-length sfGFP was produced. ESI-MS analysis showed that the nitro

group at the para position of 4-NPK, but not the one at the ortho position of 2-NPK, was

reduced to amine (Figure 3.4). These results strongly suggest that the MS peaks showing

~30 Da shifts for the abovementioned sfGFP, T4 lysozyme, and Z-domain proteins (Figure

3.3) were likely due to reduction of the para nitro of DnpK to an amine. It is worthwhile to

note that the Z-domain protein, prepared from either DH10B or the nfsA/nfsB double-

deletion strain, also showed minor MS peaks corresponding to the unreduced protein or the

protein with both nitro groups reduced to amines (Figure 3.3D and 3.3E). This indicates

that, although the para nitro is preferentially reduced, the ortho nitro of DnpK might also

be reduced; the extent of reduction appears to be dependent on protein context. As DnpKRS

is a promiscuous enzyme that uses different amino acid substrates, we also attempted to

investigate whether the nitro groups were reduced before or after being incorporated into

proteins. We incubated the free amino acid, DnpK (1 mM), with DH10B cells for 20 hour

before analyzing the cell lysate with ESI-MS. We did not observe any peak corresponding

to the reduced forms of DnpK (Figure 3.5). Despite the possibility that our analytical

method has different sensitivity toward DnpK and its reduced forms, this result suggests

that a large portion of the DnpK amino acid was intact in E. coli. Moreover, the distribution

of reduced forms of proteins seemed to be protein-dependent (Figure 3.4). We later

confirmed the activity of DnpKRS toward DnpK in mammalian cells, so we deduce that

the reduction proceeds after DnpK is incorporated into proteins. However, we cannot rule

out the possibility that reduced DnpK exists in E. coli but that we could not detect it owing

to its low abundance; reduced DnpK could be utilized by the promiscuous DnpKRS to also

form proteins containing reduced DnpK. Therefore, we attempted to mutate the residues

inside the β-barrels of fluorescent proteins. We used two constructs, pBAD-hsGFP

(66TAG) and pBAD-mApple (72TAG)[31], along with a TAG suppression plasmid

expressing DnpKRS and the suppressor tRNA. Unfortunately, we were unable to prepare

any full-length, folded, UAA-containing proteins with either construct, possibly due to

the steric hindrance of DnpK (or its potentially reduced forms), which destabilizes protein

folds. Further research is needed to clarify how bacterial nitroreductases interact with

nitro-containing small molecules and proteins.
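The mass-shift reasoning above can be sketched as a small calculation: each conceivable reduction product of a nitro group changes the residue mass by a fixed amount, and the observed shift is matched to the closest candidate within the instrument tolerance. The candidate list, the sign convention (observed mass lower than expected), and the 27,800 Da protein mass are illustrative assumptions, not values taken from the spectra.

```python
O, H = 15.999, 1.008  # average atomic masses in Da

# Mass change per DnpK residue for some conceivable nitro reduction products.
CANDIDATE_SHIFTS = {
    "one NO2 -> NO": -O,
    "one NO2 -> NHOH": -O + 2 * H,
    "one NO2 -> NH2": -2 * O + 2 * H,
    "two NO2 -> NH2": 2 * (-2 * O + 2 * H),
}

def assign_shift(observed_shift, protein_mass, rel_accuracy=1e-4):
    """Return the closest candidate, its shift, and whether it lies within
    the mass tolerance (rel_accuracy * protein_mass)."""
    tol = rel_accuracy * protein_mass
    name, shift = min(CANDIDATE_SHIFTS.items(),
                      key=lambda kv: abs(kv[1] - observed_shift))
    return name, shift, abs(shift - observed_shift) <= tol

# A 31 Da deficit on a hypothetical ~27.8 kDa protein.
name, shift, within = assign_shift(-31.0, 27_800.0)
print(name, round(shift, 3), within)
```

With these inputs the closest candidate is reduction of a single nitro group to an amine (about −30 Da), which is consistent with the interpretation given in the text; as noted there, intact-protein MS alone cannot establish the exact chemical form.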

Figure 3.3. Mass spectrometry analysis of the indicated proteins purified from

DH10B or the nfsA/nfsB double deletion strain, suggesting a reduced DNP

group in these proteins.

Figure 3.4. Mass spectrometry analysis of the indicated proteins purified from

DH10B in the presence of (A) 2-NPK or (B) 4-NPK. The data suggest that the

para nitro of 4-NPK was reduced to an amine.

Figure 3.5. Direct ESI-MS analysis (positive mode) of the lysate of DH10B cells

incubated with 1 mM DnpK for 20 h, showing the [M+H]+ peak (355.12) for

DnpK but no peak for reduced forms of DnpK (expected: 325 and 295). Other

peaks in this figure were likely caused by additional molecules in the cell lysate.
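The m/z values quoted in the Figure 3.5 caption can be reproduced from monoisotopic masses. The sketch below assumes the molecular formula C14H18N4O7 for DnpK (derived here from its chemical name, not stated in the text) and treats each nitro-to-amine reduction as the loss of two oxygens and the gain of two hydrogens.

```python
# Monoisotopic masses in Da
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646

def mh_plus(formula):
    """[M+H]+ m/z for an elemental composition given as {element: count}."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
    return neutral + PROTON

def reduce_nitro(formula, n_groups):
    """Composition after reducing n_groups nitro groups to amines
    (each NO2 -> NH2 removes two O and adds two H)."""
    out = dict(formula)
    out["O"] -= 2 * n_groups
    out["H"] += 2 * n_groups
    return out

dnpk = {"C": 14, "H": 18, "N": 4, "O": 7}  # assumed formula of DnpK
mz = [mh_plus(reduce_nitro(dnpk, n)) for n in range(3)]
print([round(v, 2) for v in mz])
```

The three values correspond to intact DnpK and its singly and doubly reduced forms, matching the 355.12, 325, and 295 quoted in the caption.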

Figure 3.6. SDS-PAGE and Western blot of DnpK-containing EGFP and the

wild-type EGFP, purified from HEK 293T cells.

Figure 3.7. (A) Fluorescence imaging of HEK 293T cells containing genes for

pEGFP-Tyr39TAG, DnpKRS, and the corresponding suppressor tRNA, in the

presence or absence of DnpK (1 mM). (B) ESI-MS analysis of the intact EGFP

protein expressed in HEK 293T in the presence of DnpK. (C) SDS-PAGE of HEK

293T cell lysates and the purified EGFP protein. (D) Anti-DNP Western blot of

HEK 293T cell lysates and the purified EGFP protein. A colorimetric detection

method was used to locate anti-DNP antibodies. Bands from a pre-stained protein

marker are also shown.

Mammalian cells typically have lower nitroreductase activity than E. coli.[32] We used

human embryonic kidney (HEK) 293T cells to express DnpK-containing proteins. The

genes encoding DnpKRS, the corresponding suppressor tRNA, and EGFP-Y39TAG (His6-tagged

enhanced GFP containing a TAG codon for residue 39) were co-expressed in

HEK293T cells in culture medium supplemented with 1 mM DnpK. Suppression of the

amber codon was verified by observing strong green fluorescence for cells cultured with

DnpK but not for cells cultured in the absence of DnpK (Figure 3.7A). We purified the

protein from cells cultured with DnpK and analyzed it with direct infusion ESI-MS. The

observed and expected molecular masses were well-matched (Figure 3.7B), indicating that

DnpK was site-specifically incorporated into EGFP and not further reduced in HEK293T

cells. To test the use of anti-DNP antibodies to recognize DnpK-containing proteins, we

performed SDS polyacrylamide gel electrophoresis (SDS-PAGE) and western blot

analysis of HEK293T cell lysates, in addition to the EGFP protein affinity-purified with

Ni-NTA agarose beads (Figure 3.7C and D). We observed distinct bands on the western

blot, resulting from the selective interaction between DnpK-containing EGFP and

anti-DNP antibodies. Bands were clear for both the lysate derived from the cells

cultured with DnpK and the pure DnpK-containing EGFP protein. We also performed a

control experiment to ensure no interactions occurred between anti-DNP antibodies and

the wild-type EGFP (Figure 3.6). These results further support that the DNP hapten was

site-specifically introduced into proteins in living HEK293T cells.

3.4 Conclusions

In summary, we have engineered mbPylRS to genetically encode a small-molecule hapten

moiety. Although the DNP moiety was unstable in E. coli, we found that its stability was

enhanced in mammalian HEK293T cells. This small hapten moiety was able to induce

selective interactions with anti-DNP antibodies, as shown in our western blot experiments.

The capability of genetically introducing DNP into proteins is expected to find broad

applications in biosensing and bioseparation, immunology, and therapeutics.

References:

[1] Chipinda I, Hettick JM, Siegel PD. Haptenation: chemical reactivity and protein

binding. J. Allergy 2011, 839682.

[2] Chan CP, Cheung YC, Renneberg R, Seydack M. New trends in immunoassays.

Adv. Biochem. Eng. Biotechnol. 2008; 109:123–154.

[3] Mondal K, Gupta MN, Roy I. Affinity-based strategies for protein purification.

Anal. Chem. 2006; 78(11):3499–3504.

[4] Jasani B, Thomas ND, Navabi H, Millar DM, Newman GR, Gee J, Williams ED.

Dinitrophenyl (DNP) hapten sandwich staining (DHSS) procedure. A 10 year review of its

principle reagents and applications. J. Immunol. Methods. 1992; 150(1-2):193–198.

[5] Shreder K. Synthetic haptens as probes of antibody response and

immunorecognition. Methods. 2000; 20(3):372–379.

[6] Varga JM, Klein GF, Fritsch P. Binding of a mouse monoclonal IgE (anti-DNP)

antibody to radio-derivatized polystyrene-DNP complexes. FASEB J. 1990; 4(9):2678–

2683.

[7] Perrone JC. Separation of amino-acids as dinitrophenyl derivatives. Nature

1951; 167(4248):513–515.

[8] Hawthorne SJ, Pagano M, Harriott P, Halton DW, Walker B. The synthesis and

utilization of 2,4-dinitrophenyl-labeled irreversible peptidyl diazomethyl ketone inhibitors.

Anal. Biochem. 1998; 261(2):131–138.

[9] Mallone R, Nepom GT. Targeting T lymphocytes for immune monitoring and

intervention in autoimmune diabetes. Am. J. Ther. 2005; 12(6):534–550.

[10] Nakamura K, Mimura Y, Tanaka T, Fujikura Y, Takeo K. Affinity maturation of

anti-hapten antibodies in a single mouse analyzed by two-dimensional affinity

electrophoresis. Electrophoresis. 1993; 14(12):1338–1340.

[11] Eisen HN, Chakraborty AK. Immunopaleontology reveals how affinity

enhancement is achieved during affinity maturation of antibodies to influenza virus. Proc.

Natl. Acad. Sci. USA 2013; 110(1):7–8.

[12] Manne J, Mastrangelo MJ, Sato T, Berd D. TCR rearrangement in lymphocytes

infiltrating melanoma metastases after administration of autologous dinitrophenyl-

modified vaccine. J. Immunol. 2002; 169(6):3407–3412.

[13] Farah FS. Natural antibodies specific to the 2,4-dinitrophenyl group. Immunology.

1973; 25(2):217–226.

[14] Karjalainen K, Mäkelä O. Concentrations of three hapten-binding

immunoglobulins in pooled normal human serum. Eur. J. Immunol. 1976; 6(2):88–93.

[15] McEnaney PJ, Parker CG, Zhang AX, Spiegel DA. Antibody-recruiting molecules:

an emerging paradigm for engaging immune function in treating human disease. ACS Chem.

Biol. 2012; 7(7), 1139–1151.

[16] Fura JM, Sabulski MJ, Pires MM. D-amino acid mediated recruitment of

endogenous antibodies to bacterial surfaces. ACS Chem. Biol. 2014; 9(7):1480–1489.

[17] Grünewald J, Tsao ML, Perera R, Dong L, Niessen F, Wen BG, Kubitz DM, Smider

VV, Ruf W, Nasoff M, Lerner RA, Schultz PG. Immunochemical termination of self-

tolerance. Proc. Natl. Acad. Sci. USA. 2008; 105(32):11276–11280.

[18] Grünewald J, Hunt GS, Dong L, Niessen F, Wen BG, Tsao ML, Perera R, Kang M,

Laffitte BA, Azarian S, Ruf W, Nasoff M, Lerner RA, Schultz PG, Smider VV.

Mechanistic studies of the immunochemical termination of self-tolerance with unnatural

amino acids. Proc. Natl. Acad. Sci. USA. 2009; 106(11):4337–4342.

[19] Erkes DA, Selvan SR. Hapten-induced contact hypersensitivity, autoimmune

reactions, and tumor regression: plausibility of mediating antitumor immunity. J. Immunol.

Res. 2014, 175265.

[20] Amblard M, Fehrentz JA, Martinez J, Subra G. Methods and protocols of modern

solid phase Peptide synthesis. Mol. Biotechnol. 2006; 33(3):239–254.

[21] Sletten EM, Bertozzi CR. Bioorthogonal chemistry: fishing for selectivity in a sea

of functionality. Angew. Chem. Int. Ed. 2009; 48(38):6974–6998; Angew. Chem. 2009, 121,

7108–7133.

[22] Ai HW. Biochemical analysis with the expanded genetic lexicon. Anal. Bioanal.

Chem. 2012; 403(8):2089–2102.

[23] Liu CC, Schultz PG. Adding new chemistries to the genetic code. Annu. Rev.

Biochem. 2010; 79:413–444.

[24] Yanagisawa T, Ishii R, Fukunaga R, Kobayashi T, Sakamoto K, Yokoyama S.

Multistep engineering of pyrrolysyl-tRNA synthetase to genetically encode N(epsilon)-(o-

azidobenzyloxycarbonyl) lysine for site-specific protein modification. Chem. Biol. 2008;

15(11), 1187–1197.

[25] Ai HW, Lee JW, Schultz PG. A method to site-specifically introduce methyllysine

into proteins in E. coli. Chem. Commun. 2010; 46(30):5506–5508.

[26] Wang YS, Russell WK, Wang Z, Wan W, Dodd LE, Pai PJ, Russell DH, Liu WR.

The de novo engineering of pyrrolysyl-tRNA synthetase for genetic incorporation of L-

phenylalanine and its derivatives. Mol. BioSyst. 2011; 7(3):714–717.

[27] Arbely E, Torres-Kolbus J, Deiters A, Chin JW. Photocontrol of tyrosine

phosphorylation in mammalian cells via genetic encoding of photocaged tyrosine. J. Am.

Chem. Soc. 2012; 134(29):11912–11915.

[28] Chen PR, Groff D, Guo J, Ou W, Cellitti S, Geierstanger BH, Schultz PG. A facile

system for encoding unnatural amino acids in mammalian cells. Angew. Chem. Int. Ed.

2009; 48(22):4052–4055; Angew. Chem. 2009, 121, 4112–4115.

[29] Valle A, Le Borgne S, Bolívar J, Cabrera G, Cantero D. Study of the role played

by NfsA, NfsB nitroreductase and NemA flavin reductase from Escherichia coli in the

conversion of ethyl 2-(2'-nitrophenoxy)acetate to 4-hydroxy-(2H)-1,4-benzoxazin-3(4H)-

one (D-DIBOA), a benzohydroxamic acid with interesting biological properties. Appl.

Microbiol. Biotechnol. 2012; 94(1):163–171.

[30] Chen ZJ, Ai HW. A highly responsive and selective fluorescent probe for imaging

physiological hydrogen sulfide. Biochemistry. 2014; 53(37):5966–5974.

[31] Shaner NC, Campbell RE, Steinbach PA, Giepmans BN, Palmer AE, Tsien RY.

Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma

sp. red fluorescent protein. Nat. Biotechnol. 2004; 22(12):1567–1572.

[32] Vass SO, Jarrom D, Wilson WR, Hyde EI, Searle PF. E. coli NfsA: an alternative

nitroreductase for prodrug activation gene therapy in combination with CB1954. Br. J.

Cancer. 2009; 100(12):1903–1911.

[33] Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F,

Gallo Cassarino T, Bertoni M, Bordoli L, Schwede T. SWISS-MODEL: modelling protein

tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014,

42, W252–W258.

[34] Kavran JM, Gundllapalli S, O'Donoghue P, Englert M, Söll D, Steitz TA.

Structure of pyrrolysyl-tRNA synthetase, an archaeal enzyme for genetic code innovation.

Proc. Natl. Acad. Sci. USA 2007; 104(27):11268–11273.

[35] DeLano WL. The PyMOL User’s Manual, DeLano Scientific, San Carlos,

USA, 2002.

[36] Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, Tyka M, Baker D, Karplus

K. Improving physical realism, stereochemistry, and side-chain accuracy in homology

modeling: Four approaches that performed well in CASP8. Proteins Struct. Funct. Bioinf.

2009, 77, Suppl. 9, 114–122.

[37] Ai HW, Shen W, Sagi A, Chen PR, Schultz PG. Probing protein-protein

interactions with a genetically encoded photo-crosslinking amino acid. ChemBioChem

2011; 12(12):1854–1857.

Chapter 4: Study of the Binding Energies

between Unnatural Amino Acids and

Engineered Orthogonal Tyrosyl-tRNA

Synthetases

4.1 Introduction

In most organisms, 61 trinucleotide codons encode the 20 canonical amino acids.[1] An

additional three codons (UAA, UAG, and UGA) are nonsense “stop” codons that trigger

the termination of ribosomal protein synthesis.[2] Proteins can undergo posttranslational

modifications to further derivatize the canonical 20 amino acids, thereby increasing

complexity and biological control to achieve sophisticated cellular functions.[3] In the

1960s, nonsense suppression bacterial strains were discovered to express additional tRNAs

that recognize the stop codons UAG (amber suppressor), UAA (ochre suppressor), or UGA

(opal suppressor).[1,4] There are also exceptional examples in which uncommon amino

acids, such as pyrrolysine or selenocysteine, are incorporated during ribosomal peptide

synthesis in response to UAG or UGA codons, respectively.[5-7] Furthermore, in the past

few decades, research has expanded the possibilities of protein structures and functions

through an expansion of the genetic code.[8, 9] Additional pairs of aminoacyl-tRNA

synthetases (aaRSes) and the corresponding tRNAs that do not cross-react with

endogenous tRNAs, aaRSes, and amino acids, have been engineered and expressed in

living bacteria, yeast, mammalian cells and several model multicellular organisms.[8-11]

These aaRSes and tRNAs have been pre-engineered to use unnatural amino acids (unAAs)

as their substrates, thereby affording the genetic encoding of unAAs in living cells and

organisms. This genetic code expansion technology has since produced modified proteins

with site-specific incorporation of a large array of unAAs[8], such as fluorescent amino

acids[12, 13], biophysical probes[14-16], photocrosslinkers[17-19], reactive chemical handles for

bioorthogonal reactions[20-22], photocaged amino acids[23-28], and amino acids identical to

or mimicking post-translational modifications[29-32]. This method has now been broadly

utilized not only to create proteins with enhanced and novel properties, but also to develop

novel therapeutics and investigate protein structure and function.[8]

Engineered orthogonal tRNAs are typically adapted into an organism of interest from

evolutionarily distant species to lower the likelihood of cross-recognition by endogenous

aaRSes.[8] Even so, the corresponding aaRSes, with some exceptions involving pyrrolysyl-

tRNA synthetases[33], would still have to undergo extensive protein engineering to switch

their substrate preference from a native amino acid to a unAA. This bioengineering

procedure, which typically involves several rounds of positive and negative selection, is

laborious and time-consuming, and requires considerable expertise[8]. The number of

residues that can be simultaneously mutated is often limited to 5 or 6, due to technical

limitations with the molecular biology[34]. Hence, it is often not trivial to derive orthogonal

aaRSes for unAA substrates that are very different from the enzymes’ native substrates.

New strategies are currently being explored to circumvent this challenging positive and

negative selection process. For example, some existing orthogonal aaRSes can use several

different unAAs as their substrates, so their polyspecificity has been exploited for the

genetic encoding of new unAAs.[34-36] This substrate promiscuity is not problematic,

however, because the experimenter controls the unAA(s) supplemented into the culture

medium for any given experiment.

It may also be prudent to computationally design aaRSes for unAAs because computational

methods have been routinely utilized to study enzyme-substrate interactions.[37] Previously,

Wang et al. and Datta et al. computationally estimated the binding energies of natural E.

coli phenylalanyl- and methionyl-tRNA synthetases to several unnatural phenylalanine or

methionine analogues, respectively, and compared these values to the experimental

activities of these enzyme-substrate pairs.[38-39] Using this method, they were able to

genetically incorporate several unAAs into proteins using auxotrophic E. coli, but could

not achieve site-specificity due to cross-reactivity issues. Computational studies on

engineered orthogonal aaRSes and unAAs are scarce. In a previous work, Zhang et al.

reported a clash opportunity progressive (COP) method to identify a possible mutant of M.

jannaschii tyrosyl-tRNA synthetase (MjTyrRS) for preferential binding to O-methyl-L-

tyrosine over Tyr.[40] In another work, Sun et al. docked p-acetyl-L-phenylalanine (AcF)

into 60 different MjTyrRS mutants to identify possible mutations benefiting enzyme-

substrate binding.[41] The full capability of the methods by Zhang et al. and Sun et al. in

designing orthogonal aaRSes for unAAs is still unclear. Each study focused on only one

unAA, and the mutant aaRSes derived from their computational studies were not

experimentally tested, although their computationally derived sequences showed some

similarities to orthogonal aaRSes previously derived from experimental studies by Schultz

et al.[10, 42]

Aminoacylation of tRNAs by aaRSes is a complex, multi-step process.[43] In general, aaRS

is bound by a specific amino acid substrate, which is subsequently activated through an

adenylation reaction with ATP. The activated amino acid is then transferred to the 3′ end

of an aaRS-bound tRNA by releasing the attached AMP molecule, consequently producing

a charged tRNA. A computational model depicting this entire process is very difficult to

produce. Like many other enzyme-substrate studies[37], we presume that the ability for an

amino acid to bind a particular aaRS is very important for establishing their enzyme-

substrate relationship. To achieve the goal of computationally designing orthogonal

aaRSes for unAAs, it might not be necessary to accurately estimate the absolute binding

energy and binding affinity of an aaRS/unAA pair; however, it is critical to identify

computational parameters that can group favorable and unfavorable aaRS-amino acid

complexes. Herein, we report our evaluation of several computational methods for scoring

binding energies of a number of aaRS-amino acid complexes. These benchmark

experiments were performed with complexes of orthogonal MjTyrRS or EcTyrRS mutants

bound to their experimentally verified unAA substrates, and compared to Tyr—the natural

substrate for wild-type MjTyrRS and EcTyrRS. We compared the results of several popular

computational methods, including AutoDock Vina[44], ROSETTA[45], and Molecular

Mechanics/Poisson–Boltzmann Surface Area (MM/PBSA)[46]. We performed MM/PBSA

binding energy scoring based on a 10-ns Molecular Dynamics (MD) simulation, and direct

MM/PBSA scoring based on single energy-minimized structures. These tested methods,

which required varying amounts of computational resources, yielded different capabilities

for grouping favorable and unfavorable aaRS-amino acid interactions. Moreover, we

analyzed the factors contributing to amino acid recognition. In particular, a polyspecific

EcTyrRS mutant was studied for its capacity to utilize several different unAAs as its

enzymatic substrates.[36, 47]

4.2 Methods

4.2.1 Preparation of aaRS-Amino Acid Complexes

We computationally studied seven aaRS-amino acid complexes. The following two X-ray

crystal structures were downloaded from the Protein Data Bank (PDB): MjTyrRS-derived

p-acetylphenylalanyl-tRNA synthetase (MjAcFRS) bound with AcF (PDB 1ZH6)[48], and

MjTyrRS-derived 3-iodotyrosyl-tRNA synthetase (MjIoYRS) bound with 3-iodo-L-

tyrosine (IoY) (PDB 2ZP1)[49]. These two complexes were cleaned by removing water

molecules, co-crystallized ions, and non-amino acid ligands. Hydrogen atoms of the amino

acid ligands were added in VEGA ZZ.[50] To derive the complex structures of proteins

bound with Tyr, the side chains of the unAA ligands in the above two complexes were

manually edited in VEGA ZZ and combined with the corresponding protein coordinates

by matching the coordinates of the unchanged ligand atoms. This process generated two

additional aaRS-amino acid complexes: MjAcFRS bound with Tyr, and MjIoYRS bound

with Tyr. No X-ray crystal structure is available for the polyspecific synthetase (EcPolyRS)

derived from EcTyrRS. Based on the X-ray crystal structure of the wild-type EcTyrRS

(PDB 1X8X), we used SWISS-MODEL[52] to perform homology modeling of the

EcPolyRS structure. The coordinates of Tyr in 1X8X were combined with the modeled

protein coordinates to derive an EcPolyRS-Tyr complex. We also manually edited Tyr in

VEGA ZZ to derive coordinates for the unAAs, p-iodophenylalanine (IoF) and AcF. They

were combined with the modeled EcPolyRS coordinates to derive two additional

complexes: EcPolyRS bound with IoF and EcPolyRS bound with AcF.

Figure 4.1. Chemical structures of natural and unnatural amino acids used in this

study (1: p-acetyl-L-phenylalanine, AcF; 2: 3-iodo-L-tyrosine, IoY; 3: p-iodo-L-

phenylalanine, IoF; and 4: L-tyrosine, Tyr).

Further relaxation of these complexes was achieved with GROMACS-4.6.5.[53, 54] The

force field for proteins was set to AMBER99SB.[55] ACPYPE[56] was used to treat ligands

based on Generalized Amber Force Field (GAFF).[57, 58] The complexes were immersed in

a dodecahedron box of SPC/E water molecules. The water box was extended 1 nm from

solute atoms in all directions. Counter ions, such as Na+ and Cl–, were added to neutralize

the systems. Particle mesh Ewald (PME) was used to treat the long-range electrostatic

interactions in molecular mechanics (MM) energy minimization. The systems were

minimized by using the steepest descent algorithm. The minimization was stopped either

after 50,000 steps or when the maximum force fell below 10.0 kJ•mol−1•nm−1.
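The two stopping criteria of the minimization (a step limit and a maximum-force threshold) can be illustrated with a toy steepest-descent loop. This is a minimal sketch on a harmonic test potential, not the GROMACS implementation; the step-size adaptation (grow by 1.2 on acceptance, halve on rejection), the function names, and the test potential are assumptions chosen for illustration.

```python
import numpy as np

def steepest_descent(energy_and_force, x0, emtol=10.0, nsteps=50_000, h0=0.01):
    """Minimize until the largest force component is below `emtol`
    or `nsteps` trial steps have been made, whichever comes first."""
    x = np.asarray(x0, dtype=float)
    h = h0
    e, f = energy_and_force(x)
    for step in range(nsteps):
        fmax = np.abs(f).max()
        if fmax < emtol:                      # force criterion met
            return x, e, fmax, step
        trial = x + h * f / fmax              # move along the force direction
        e_trial, f_trial = energy_and_force(trial)
        if e_trial < e:                       # accept and lengthen the step
            x, e, f = trial, e_trial, f_trial
            h *= 1.2
        else:                                 # reject and shorten the step
            h *= 0.5
    return x, e, np.abs(f).max(), nsteps      # step limit reached

def harmonic(x, k=1000.0):
    """Toy potential E = k/2 |x|^2 with force F = -k x."""
    return 0.5 * k * float(np.dot(x, x)), -k * x

x_min, e_min, f_max, steps_used = steepest_descent(harmonic, [0.5, -0.3])
print(steps_used, round(f_max, 3))
```

On this toy potential the force criterion is met long before the step limit; for a solvated protein-ligand system either criterion may be the one that terminates the run.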

4.2.2 Binding Energy Scoring with AutoDock Vina and ROSETTA

The energy score function embedded in AutoDock Vina 1.1.2[44] was used to assess the

binding free energies of all complexes. The pdbqt files of proteins and ligands were

prepared in AutoDockTools[59] from the above-mentioned complexes. Polar hydrogens

were added and the binding free energies were calculated using the embedded “score only”

option in AutoDock Vina. Coordinates of proteins and ligands were separated in PyMOL.

We followed a previously reported procedure to score aaRS-amino acid complexes using

ROSETTA 3.5[45]. The interface energy term was used in this study to evaluate the binding.
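A scoring-only run of the kind described above might be scripted as follows. This is a sketch rather than the workflow actually used: the file names are hypothetical, and the layout of the "Affinity" line that the parser expects is an assumption about the program's text output.

```python
import re

def vina_score_only_command(receptor_pdbqt, ligand_pdbqt, vina_exe="vina"):
    """Build the argument list for scoring an existing pose without docking."""
    return [vina_exe,
            "--receptor", receptor_pdbqt,
            "--ligand", ligand_pdbqt,
            "--score_only"]

def parse_affinity(vina_stdout):
    """Extract the score in kcal/mol from text of the assumed form
    'Affinity: -7.09000 (kcal/mol)'. Returns None if no such line is found."""
    match = re.search(r"Affinity:\s*(-?\d+(?:\.\d+)?)\s*\(kcal/mol\)", vina_stdout)
    return float(match.group(1)) if match else None

# Hypothetical file names for one complex.
cmd = vina_score_only_command("MjAcFRS.pdbqt", "AcF.pdbqt")
example_output = "Affinity: -7.09000 (kcal/mol)"  # assumed output line
score = parse_affinity(example_output)
print(" ".join(cmd), score)
```

The command list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where the program is installed, with the captured standard output handed to `parse_affinity`.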

Table 4.1. Estimated binding free energies using AutoDock Vina and ROSETTA for the seven tested aaRS-amino acid complexes.

Protein     Amino acid   AutoDock Vina (kcal/mol)   ROSETTA (REU)
MjAcFRS     AcF          −7.09 [1.15]a              −10.72 [1.20]a
MjAcFRS     Tyr          −6.15                      −8.94
MjIoYRS     IoY          −6.85 [1.05]a              −16.12 [1.44]a
MjIoYRS     Tyr          −6.54                      −11.21
EcPolyRS    IoF          −5.95 [1.05]a              −13.22 [1.19]a
EcPolyRS    AcF          −6.56 [1.15]a              −13.27 [1.20]a
EcPolyRS    Tyr          −5.68                      −11.08

a Ratios of the estimated binding free energies for the indicated aaRS-unAA complexes to the binding free energies of the corresponding aaRS-Tyr complexes.
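The bracketed ratios in Table 4.1 follow directly from the scores: each unAA score is divided by the Tyr score for the same synthetase. The short sketch below recomputes them from the tabulated values; the dictionary layout and function name are illustrative choices.

```python
# Scores from Table 4.1: {synthetase: {amino acid: (Vina kcal/mol, ROSETTA REU)}}
SCORES = {
    "MjAcFRS":  {"AcF": (-7.09, -10.72), "Tyr": (-6.15, -8.94)},
    "MjIoYRS":  {"IoY": (-6.85, -16.12), "Tyr": (-6.54, -11.21)},
    "EcPolyRS": {"IoF": (-5.95, -13.22), "AcF": (-6.56, -13.27),
                 "Tyr": (-5.68, -11.08)},
}

def ratios_to_tyr(scores):
    """Ratio of each unAA score to the Tyr score of the same synthetase,
    rounded to two decimals, as (Vina ratio, ROSETTA ratio)."""
    out = {}
    for enzyme, ligands in scores.items():
        tyr_vina, tyr_rosetta = ligands["Tyr"]
        for ligand, (vina, rosetta) in ligands.items():
            if ligand != "Tyr":
                out[(enzyme, ligand)] = (round(vina / tyr_vina, 2),
                                         round(rosetta / tyr_rosetta, 2))
    return out

ratios = ratios_to_tyr(SCORES)
for key, value in ratios.items():
    print(key, value)
```

A ratio above 1 means the unAA complex scores more favorably than the Tyr complex of the same synthetase; the larger spread of the ROSETTA ratios is visible directly in this output.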

Figure 4.2. The RMSD values in the MD trajectories of the seven studied aaRS-

amino acid complexes.

4.2.3 Molecular Dynamics Simulations

Molecular Dynamics (MD) simulations were performed in GROMACS-4.6.5.[53, 54] The

solvated and MM-energy-minimized ligand-protein complexes were heated to 300 K

during a 100 ps constant volume simulation with 2 fs time step. The pressure was then

equilibrated to 1 atm during a 100 ps isothermal-isobaric NPT simulation with 2 fs time

step. All heavy atoms were position-restrained with a force constant of 1000

kJ•mol−1•nm−2. Simulations were performed for 10 ns with a time step of 2 fs. The

temperature and pressure were maintained at 300 K and 1 atm using the V-rescale

temperature and Parrinello-Rahman pressure coupling methods, respectively. The time

constants for the temperature and pressure coupling were set at 0.1 ps and 2 ps, respectively.

Short-range, non-bonded interactions were computed for the atom pairs within a 9 Å

cutoff. Long-range electrostatic interactions were calculated using a PME summation

method with fourth-order cubic interpolation and 1.6 Å grid spacing. All bonds were

constrained using the parallel LINCS method. Xmgrace was used to plot the data and

graphs generated from GROMACS.
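The simulation settings above translate into step counts and nanometer-based parameters as sketched below. The dictionary is an illustrative fragment written with GROMACS .mdp option names, not the parameter file actually used; it omits required entries such as coupling groups, and the conversion of 1 atm to 1.01325 bar is added here because GROMACS expresses pressure in bar.

```python
def n_steps(duration_ps, dt_ps=0.002):
    """Number of integration steps for a run of the given length."""
    return round(duration_ps / dt_ps)

def angstrom_to_nm(value):
    return value / 10.0

# Illustrative production-run fragment (assumed layout, not the original file).
production = {
    "integrator": "md",
    "dt": 0.002,                            # 2 fs time step, in ps
    "nsteps": n_steps(10_000),              # 10 ns
    "tcoupl": "V-rescale",
    "tau_t": 0.1,                           # ps
    "ref_t": 300,                           # K
    "pcoupl": "Parrinello-Rahman",
    "tau_p": 2.0,                           # ps
    "ref_p": 1.01325,                       # 1 atm expressed in bar
    "coulombtype": "PME",
    "pme_order": 4,                         # fourth-order (cubic) interpolation
    "fourierspacing": angstrom_to_nm(1.6),  # nm
    "rcoulomb": angstrom_to_nm(9.0),        # nm
    "rvdw": angstrom_to_nm(9.0),            # nm
    "constraints": "all-bonds",
    "constraint_algorithm": "lincs",
}

mdp_text = "\n".join(f"{key} = {value}" for key, value in production.items())
equilibration_steps = n_steps(100)          # each 100 ps equilibration stage
print(mdp_text)
```

The 10 ns production run therefore corresponds to 5,000,000 steps, and each 100 ps equilibration stage to 50,000 steps.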

4.2.4 MM/PBSA Binding Energy Calculation

We used g_mmpbsa to estimate MM/PBSA binding energies.[60] The average binding

energy was calculated from 100 snapshots extracted every 50 ps from the MD trajectories

between 5 and 10 ns. The non-polar solvation energy was calculated based on the SASA

model. The vacuum and solvent dielectric constants were set at 1 and 80, respectively. The

solute dielectric constant was set at 2. The entropy term was not included in our binding

energy calculation. A bootstrap analysis was performed to obtain standard errors. To

calculate the binding energy based on single snapshots, we followed the

abovementioned procedure, except that MM-energy-minimized aaRS-unAA complexes

were directly utilized for energy calculations without any MD treatment.
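The averaging and bootstrap steps can be sketched as follows. The per-snapshot energies here are synthetic random numbers standing in for real MM/PBSA output, the exact frame times are an assumption (the text specifies only 100 snapshots at 50 ps spacing between 5 and 10 ns), and the resampling routine is a generic bootstrap rather than the one bundled with g_mmpbsa.

```python
import numpy as np

def bootstrap_se(values, n_resamples=2000, seed=0):
    """Standard error of the mean estimated by resampling with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    return values[idx].mean(axis=1).std(ddof=1)

# Assumed frame times: every 50 ps over the second half of the 10 ns run.
snapshot_times_ps = list(range(5050, 10001, 50))

# Synthetic per-snapshot binding energies (kJ/mol), for illustration only.
rng = np.random.default_rng(1)
energies = rng.normal(loc=-100.0, scale=10.0, size=len(snapshot_times_ps))

mean_energy = energies.mean()
standard_error = bootstrap_se(energies)
print(len(snapshot_times_ps), round(mean_energy, 1), round(standard_error, 2))
```

For uncorrelated snapshots the bootstrap standard error approaches the sample standard deviation divided by the square root of the number of snapshots; correlated MD frames would make this an underestimate.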

4.3 Results and Discussion

4.3.1 Selection and Preparation of aaRS-Amino Acid Complexes

A very large number of orthogonal aaRSes have been derived from MjTyrRS and

EcTyrRS[8], which are currently widely utilized for the genetic encoding of unAAs in

bacterial and eukaryotic cells, respectively. We examined available co-crystal structures of

MjTyrRS mutants with unAAs and decided to use two complexes in our study: MjAcFRS

bound with AcF, and MjIoYRS bound with IoY. AcF has a side-chain carbonyl group for

H-bond formation with the corresponding aaRS, whereas the IoY side chain can interact

with the aaRS through both H-bonding and non-H-bonding van der Waals interactions

(Figure 4.1). For our study, we also selected an EcTyrRS-derived polyspecific synthetase,

EcPolyRS, which was originally engineered for the genetic encoding of IoF. In addition to

IoF, we later found that EcPolyRS was also capable of using several other unAAs,

including AcF, as its substrate.[36, 47] Because no X-ray crystal structure of EcPolyRS is

available, we used the wild-type EcTyrRS 3D-structure as the template for homology

modeling of EcPolyRS. The manually edited coordinates of unAAs, IoF and AcF (Figure

4.1), were next combined with the modeled protein structure. The side chain of IoF is

expected to interact with EcPolyRS mainly through non-H-bonding van der Waals

interactions, while the carbonyl group of AcF would act as an excellent H-bond acceptor. We

also modeled Tyr into these aforementioned aaRS structures in order to computationally

evaluate and compare the binding energies of these aaRSes to Tyr. All three selected

aaRSes have an excellent capacity for discriminating unAAs from Tyr, as shown in

previous protein expression experiments by the lack of Tyr usage as an enzymatic substrate

at physiological concentrations.[11, 42, 49] Because our ultimate goal is to computationally

design orthogonal aaRSes for unAAs, and currently, it is difficult to model water molecules

at the protein/ligand interfaces to effectively mediate interactions, we removed water

molecules from these complex structures. All abovementioned amino acid-aaRS

complexes were subjected to relaxation through a standard MM energy minimization

process.
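
The water-removal step can be illustrated with a short filter. This is only a sketch under stated assumptions: it assumes the complexes are stored as PDB-format text, that waters carry the residue names HOH or WAT, and the function name strip_waters is hypothetical. It is not the tool actually used in this study.

```python
WATER_NAMES = {"HOH", "WAT"}  # assumed residue names for water molecules


def strip_waters(pdb_lines):
    """Drop ATOM/HETATM records whose residue name marks a water molecule."""
    kept = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        residue_name = line[17:20].strip()  # PDB columns 18-20 hold the residue name
        if is_atom_record and residue_name in WATER_NAMES:
            continue
        kept.append(line)
    return kept
```

All other records, including the ligand and the protein atoms, pass through unchanged, so the output can be handed directly to the energy minimization step.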

4.3.2 Binding Energy Scoring with AutoDock Vina and ROSETTA

In order to achieve the computational design of orthogonal aaRSes for unAAs, it is crucial

to predict their interaction modes and ultimately differentiate the interactions of aaRSes

with different amino acid ligands. In the present study, we do not evaluate strategies

for protein randomization and binding pose searching. Instead, we mainly focus on

approaches to evaluate binding affinities of aaRSes and amino acids. Energy scoring

functions implemented in docking programs are usually designed to minimize computing

costs, and thus, they can be utilized to evaluate large numbers of protein–ligand

complexes.[44, 61] We first utilized AutoDock Vina, a popular molecular docking suite that

includes an empirical scoring function, to evaluate the interactions of our

selected aaRSes with corresponding unAAs and Tyr. The estimated free energies of

binding were all within the range of −5.68 to −7.09 kJ/mol (Table 4.1). Although the

estimated binding free energies between aaRSes and unAAs were typically lower than those

between aaRSes and Tyr, the differences were minimal. The binding free energies scored

with AutoDock Vina for both MjIoYRS-IoY and EcPolyRS-IoF differed from those of

their corresponding aaRS-Tyr complexes by only 5%. Larger differences (~15%) were

observed for MjAcFRS-AcF and EcPolyRS-AcF. The binding free energies for the

four favorable aaRS-unAA complexes examined were −6.61 ± 0.49 kJ/mol, whereas the

binding energies for the three unfavorable aaRS-Tyr complexes were −6.09 ± 0.39 kJ/mol.

This method failed to confidently distinguish favorable interactions from unfavorable

interactions. It is worthwhile to note that the numbers in Table 4.1 were derived by scoring

single poses from X-ray crystal structures or homology models, and the gaps were not

improved by performing protein-ligand docking with flexible aaRS side chains. We next

turned to ROSETTA, another popular suite of programs widely used for protein structure

prediction, protein design, and protein-protein and protein-ligand docking.[62] We scored

the interface energies of the various aaRS-amino acid complexes. The estimated interface

energies in ROSETTA energy units (REU) are shown in Table 4.1. This method was

generally capable of identifying the binding energy differences of aaRSes to their real

unAA substrates and Tyr; and these differences were within the range of 19% to 44%.

However, when all estimated interface energies were analyzed together, there was no

obvious threshold to differentiate between favorable and unfavorable interactions, as

defined by wet lab results. For example, the interface energy score for the unfavorable

MjIoYRS-Tyr complex (−11.21 REU) was even lower than that for the favorable

MjAcFRS-AcF complex (−10.72 REU). Our data indicate that ROSETTA might not be

very reliable for predicting whether an amino acid is a true substrate of a particular aaRS.
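
The "no obvious threshold" criterion can be made explicit with a small check. The sketch below uses hypothetical function names and treats more negative scores as stronger binding. A single cutoff can separate the two groups only if the weakest favorable score is still lower than the strongest unfavorable score. Only the two ROSETTA values quoted in the text are used.

```python
def separation_gap(favorable, unfavorable):
    """Gap between the strongest unfavorable and the weakest favorable score.

    Scores are energies (more negative = stronger binding). A positive gap
    means one threshold separates the groups; zero or negative means overlap.
    """
    weakest_favorable = max(favorable)
    strongest_unfavorable = min(unfavorable)
    return strongest_unfavorable - weakest_favorable


def find_threshold(favorable, unfavorable):
    """Return a midpoint cutoff if the groups are separable, otherwise None."""
    gap = separation_gap(favorable, unfavorable)
    if gap <= 0:
        return None
    return max(favorable) + gap / 2


# MjAcFRS-AcF (favorable) versus MjIoYRS-Tyr (unfavorable), in REU.
rosetta_gap = separation_gap([-10.72], [-11.21])
rosetta_threshold = find_threshold([-10.72], [-11.21])
print(rosetta_gap, rosetta_threshold)
```

For this pair the gap is negative, so no cutoff exists, which is the overlap described in the text.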

4.3.3 Binding Energy Estimation by MD-MM/PBSA or Direct MM/PBSA

Compared to energy scoring functions implemented in docking programs, free-energy

simulation techniques, such as MD-MM/PBSA, are known to have better accuracy for

binding energy ranking. However, this gain is accompanied by a much higher

computational cost. We performed 10-ns MD simulations for each protein-ligand complex.

We monitored the root-mean-square deviation (RMSD) values of the whole complexes and

observed that they typically reached a plateau after the first 3–4 ns (Figure 4.2). We next

selected 100 equal-interval snapshots between 5 ns and 10 ns of each simulation to estimate

binding free energies for each aaRS-amino acid complex. Considering that the aaRS-amino

acid interfaces are moderately charged, we used a solute dielectric constant of 2 to estimate the

energy values.[63] Previous studies also showed that the conformational entropy was only

important for predicting absolute binding free energies but not important for ranking the

binding affinities of similar substrates.[64] Hence, in order to minimize computational costs,

we did not include the entropy term in our calculations. The estimated binding energies for

the aaRSes and their favorable unAA substrates were within the range of −15.35 to −19.16

kcal/mol, whereas the estimated binding energies for the aaRSes and their unfavorable Tyr

substrate were within the range of −9.82 to −10.56 kcal/mol (Table 4.2), illustrating a

distinct gap between these two groups of values. The average binding energy for the former

group was −16.61 ± 1.76 kcal/mol, whereas that for the latter group was −10.19 ± 0.37 kcal/mol.

Subjecting these two groups to a two-tailed t-test yields a p-value of 0.004, indicating a

significant difference. It is well accepted that MD simulation improves energy calculations

by using conformational sampling, but comes at the cost of significant computational

resources, thereby making MD-MM/PBSA evaluations of a large number of aaRS-amino

acid complexes infeasible.[61] We next utilized MM/PBSA to directly score single energy-

minimized structures of the seven aaRS-amino acid complexes.[65] The results (Table 4.2),

obtained at a much-reduced computing cost, were slightly different from the energy values

obtained by sampling MD trajectories, but were still useful for separating favorable aaRS-amino acid

complexes from unfavorable ones. The estimated binding energies for the favorable

complexes were within the range of −16.48 to −21.87 kcal/mol, whereas those for

the unfavorable ones were within the range of −9.33 to −11.13 kcal/mol. The average

value for the former group was −18.83 ± 2.36 kcal/mol, whereas that for the latter group was

−10.14 ± 0.91 kcal/mol. A two-tailed t-test still showed a significant difference (p = 0.002)

between the two groups. Scoring functions of docking software use various

approximations to increase computational efficiency.[61] These methods are designed for

screening a large number of mutants with reasonable speeds, but at the cost of accuracy.

MM/PBSA uses a more-rigorous scoring function, generally leading to better prediction

accuracy.[65] Considering this and based on our results, we suggest using direct MM/PBSA

scoring to re-evaluate top hits of orthogonal aaRS designs from docking programs, such as

AutoDock Vina and ROSETTA. Moreover, for the few top-ranked candidates in single-

structure MM/PBSA scoring experiments, it may be desirable to perform MD and

MM/PBSA rescoring based on snapshots of MD trajectories to increase the accuracy. This

combined approach, which balances computational costs and prediction accuracy, has

the potential to accelerate the engineering of orthogonal aaRSes for the genetic encoding of

unAAs.
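
The group comparison above can be retraced from the total binding energies in Table 4.2. The text does not state which variant of the two-tailed test was used, so the unequal-variance (Welch) form below is an assumption, and the function name is hypothetical. The sketch stops at the t statistic and degrees of freedom; converting them to a p-value requires a Student's t distribution, which is not included here.

```python
import math
import statistics

# Total binding energies from Table 4.2 (kcal/mol).
favorable_md = [-15.35, -16.45, -19.16, -15.49]      # aaRS-unAA, MD-MM/PBSA
unfavorable_md = [-10.17, -9.82, -10.56]             # aaRS-Tyr, MD-MM/PBSA
favorable_direct = [-17.56, -21.87, -19.41, -16.48]  # aaRS-unAA, direct MM/PBSA
unfavorable_direct = [-9.33, -11.13, -9.75]          # aaRS-Tyr, direct MM/PBSA


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, dof


t_md, dof_md = welch_t(favorable_md, unfavorable_md)
t_direct, dof_direct = welch_t(favorable_direct, unfavorable_direct)
print(round(t_md, 2), round(dof_md, 2), round(t_direct, 2), round(dof_direct, 2))
```

The group means and standard deviations computed this way can be compared directly with the averages quoted in the text.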

Complex         Method          ∆Evdw^a         ∆Eele^a         ∆Gps^a          ∆GSASA^a        ∆Gtotal^a,b
MjAcFRS + AcF   MD-MM/PBSA      −30.56 ± 0.27   −26.12 ± 0.30   44.45 ± 0.22    −3.11 ± 0.01    −15.35 ± 0.22
MjAcFRS + AcF   direct MM/PBSA  −30.08          −27.73          43.49           −3.24           −17.56
MjAcFRS + Tyr   MD-MM/PBSA      −25.61 ± 0.25   −23.46 ± 0.30   41.68 ± 0.21    −2.79 ± 0.01    −10.17 ± 0.23
MjAcFRS + Tyr   direct MM/PBSA  −23.72          −25.99          43.31           −2.92           −9.33
MjIoYRS + IoY   MD-MM/PBSA      −30.08 ± 0.29   −26.49 ± 0.33   43.09 ± 0.31    −2.97 ± 0.01    −16.45 ± 0.26
MjIoYRS + IoY   direct MM/PBSA  −30.15          −31.61          42.71           −2.81           −21.87
MjIoYRS + Tyr   MD-MM/PBSA      −24.77 ± 0.25   −25.48 ± 0.31   43.18 ± 0.24    −2.73 ± 0.01    −9.82 ± 0.25
MjIoYRS + Tyr   direct MM/PBSA  −23.54          −31.54          46.61           −2.66           −11.13
EcPolyRS + IoF  MD-MM/PBSA      −26.98 ± 0.28   −30.01 ± 0.31   40.76 ± 0.38    −2.93 ± 0.01    −19.16 ± 0.32
EcPolyRS + IoF  direct MM/PBSA  −30.19          −43.17          56.82           −2.88           −19.41
EcPolyRS + AcF  MD-MM/PBSA      −30.07 ± 0.24   −31.87 ± 0.28   49.55 ± 0.32    −3.10 ± 0.01    −15.49 ± 0.25
EcPolyRS + AcF  direct MM/PBSA  −29.76          −47.76          64.09           −3.05           −16.48
EcPolyRS + Tyr  MD-MM/PBSA      −23.64 ± 0.32   −31.41 ± 0.35   47.17 ± 0.28    −2.68 ± 0.01    −10.56 ± 0.24
EcPolyRS + Tyr  direct MM/PBSA  −25.99          −41.69          60.54           −2.61           −9.75

Table 4.2. Calculated binding energies using MD-MM/PBSA or direct MM/PBSA

for the seven aaRS-amino acid complexes. a All values are given in kcal/mol, and

MD-MM/PBSA values are given as average ± S.D. b The total of van der Waals

interaction energy (∆Evdw), electrostatic energy (∆Eele), and polar (∆Gps)

and nonpolar (∆GSASA) solvation energy.
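
As a consistency check on footnote b, the four components in each row of Table 4.2 can be summed and compared with the reported total. The sketch below copies the table values. The dictionary layout and function name are illustrative choices, and small deviations are expected because the published components are rounded to two decimals.

```python
# (complex, method): (dE_vdw, dE_ele, dG_ps, dG_SASA, reported dG_total), kcal/mol
TABLE_4_2 = {
    ("MjAcFRS+AcF", "MD"): (-30.56, -26.12, 44.45, -3.11, -15.35),
    ("MjAcFRS+AcF", "direct"): (-30.08, -27.73, 43.49, -3.24, -17.56),
    ("MjAcFRS+Tyr", "MD"): (-25.61, -23.46, 41.68, -2.79, -10.17),
    ("MjAcFRS+Tyr", "direct"): (-23.72, -25.99, 43.31, -2.92, -9.33),
    ("MjIoYRS+IoY", "MD"): (-30.08, -26.49, 43.09, -2.97, -16.45),
    ("MjIoYRS+IoY", "direct"): (-30.15, -31.61, 42.71, -2.81, -21.87),
    ("MjIoYRS+Tyr", "MD"): (-24.77, -25.48, 43.18, -2.73, -9.82),
    ("MjIoYRS+Tyr", "direct"): (-23.54, -31.54, 46.61, -2.66, -11.13),
    ("EcPolyRS+IoF", "MD"): (-26.98, -30.01, 40.76, -2.93, -19.16),
    ("EcPolyRS+IoF", "direct"): (-30.19, -43.17, 56.82, -2.88, -19.41),
    ("EcPolyRS+AcF", "MD"): (-30.07, -31.87, 49.55, -3.10, -15.49),
    ("EcPolyRS+AcF", "direct"): (-29.76, -47.76, 64.09, -3.05, -16.48),
    ("EcPolyRS+Tyr", "MD"): (-23.64, -31.41, 47.17, -2.68, -10.56),
    ("EcPolyRS+Tyr", "direct"): (-25.99, -41.69, 60.54, -2.61, -9.75),
}


def total_binding_energy(e_vdw, e_ele, g_polar, g_sasa):
    """Sum of the gas-phase and solvation terms (entropy term omitted)."""
    return e_vdw + e_ele + g_polar + g_sasa


deviations = {
    key: abs(total_binding_energy(*row[:4]) - row[4]) for key, row in TABLE_4_2.items()
}
max_deviation = max(deviations.values())
print(round(max_deviation, 2))
```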

Figure 4.3. The contributions of individual amino acid residues of aaRSes to the

total binding energies, shown as the energy contribution differences between the

indicated aaRS-unAA complexes and aaRS-Tyr complexes. Negative values indicate

a stabilization effect for aaRS-unAA interactions or a destabilization effect for

aaRS-Tyr interactions, whereas positive values indicate a destabilization effect for

aaRS-unAA interactions or a stabilization effect for aaRS-Tyr interactions.

4.3.4 Binding Modes of aaRS-unAA Complexes

The first step of tRNA aminoacylation involves the binding of an amino acid substrate

to the aaRS. This step is often the initial focus when engineering orthogonal aaRSes, because

the interaction of the aaRS with the natural Tyr substrate has to be minimized. Compared to

co-crystal structures or structures derived from molecular modeling, MD-MM/PBSA studies

can provide information on the dynamics and energy contributions for aaRS-amino acid

recognition. We analyzed the contributions of individual amino acid residues of aaRSes to

the total binding energies of all studied aaRS-amino acid complexes (Figure 4.3). We also

averaged MD structures from the MD trajectories to derive aaRS-amino acid complex

structures (Figure 4.4). We identified His70, Gln109, Gln155, Gly158, and Cys159 to be

important for maintaining the interaction of MjAcFRS to AcF versus Tyr (Figures 4.3A

and 4.4A). Gln109 forms an H-bond to the carbonyl group of AcF, but not to Tyr. Gly158

and Cys159 form non-H-bond van der Waals packing interactions with the methyl group

of AcF. His70 and Gln155 interact with residues 109, 158, and 159 to further stabilize the

MjAcFRS-AcF complex. Similarly, we found that Met154, Gln155, and Thr158 are critical

for establishing packing of MjIoYRS to IoY versus Tyr (Figures 4.3B and 4.4B). Three

residues in the amino acid-binding pocket of EcPolyRS are different from the

corresponding residues of wild-type EcTyrRS: Ile37, Ser182, and Met183. Surprisingly,

no strong interaction is conferred by these mutations to differentiate AcF and IoF from Tyr.

Our MD study suggests that the relative interacting position of Tyr in EcPolyRS is slightly

different from that of AcF or IoF. The hydroxyl group of Tyr points toward Gln195 and

forms an H-bond, which contributes to the stabilization of the EcPolyRS-Tyr complex

(Figures 4.3C,D and 4.4C,D). However, this twist significantly destabilizes the interactions

with Asp81 and Asp41, yielding a complex that is energetically disfavored as a whole.

In comparison, AcF and IoF occupy the binding pocket in a way similar to Tyr in wild-

type EcTyrRS. They have more favorable interactions with Asp81 and Asp41. This

phenomenon might explain the polyspecificity of EcPolyRS, which has been shown to use

at least 14 different unAAs as its enzymatic substrates. These unAAs likely interact with

EcPolyRS in an orientation similar to that of AcF and IoF, but not to that of Tyr, and no strong

side-chain recognition is required to stabilize these EcPolyRS-unAA complexes. We also

observed an H-bond between Asn126 and the carbonyl group of AcF, but such stabilization

does not exist in the EcPolyRS-IoF complex, and it likely does not exist in many other

EcPolyRS-unAA complexes, considering the structural diversity of these 14 different

unAAs.[36, 47]
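
The sign convention used for the per-residue analysis in Figure 4.3 can be written as a simple difference. In the sketch below, the residue labels and energy values are placeholders and not results from this study, and the function name is hypothetical. It only illustrates how a negative difference flags a residue that favors the unAA complex over the Tyr complex.

```python
def residue_contribution_differences(unaa_complex, tyr_complex):
    """Per-residue energy difference: (aaRS-unAA) minus (aaRS-Tyr), in kcal/mol."""
    residues = sorted(set(unaa_complex) | set(tyr_complex))
    return {
        residue: unaa_complex.get(residue, 0.0) - tyr_complex.get(residue, 0.0)
        for residue in residues
    }


# Placeholder per-residue contributions (kcal/mol); NOT values from Figure 4.3.
unaa_contributions = {"RES_A": -2.0, "RES_B": -1.0, "RES_C": 0.5}
tyr_contributions = {"RES_A": -0.5, "RES_B": -0.25, "RES_C": 0.25}

differences = residue_contribution_differences(unaa_contributions, tyr_contributions)

# Negative difference: the residue stabilizes the unAA complex relative to Tyr.
favoring_unaa = [
    residue
    for residue, delta in sorted(differences.items(), key=lambda item: item[1])
    if delta < 0
]
print(differences, favoring_unaa)
```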

Figure 4.4. MD-averaged structures showing the active sites of the studied aaRS-unAA

complexes. (A): MjAcFRS bound with AcF; (B): MjIoYRS bound with

IoY; (C): EcPolyRS bound with AcF; and (D): EcPolyRS bound with IoF. Ligands

are shown as cyan sticks. Residues important for substrate specificity are shown as

green sticks. In panels C and D, Tyr ligands are shown as magenta sticks for

comparison. Ile37, Ser182, and Met183 of EcPolyRS are shown as gray balls.

4.4 Conclusions

We performed computational studies to evaluate the binding energies of several aaRS-

amino acid complexes. Using orthogonal aaRS-unAA pairs whose strong interactions have

been previously reported in experimental studies, we compared the accuracy of AutoDock

Vina, ROSETTA, MM/PBSA, and MD-MM/PBSA in terms of grouping favorable and

unfavorable interactions based on estimated binding free energies. We found that the most

accurate grouping was derived from MM/PBSA based on either 10-ns MD trajectories or

single energy-minimized structures. As such, we suggest using MM/PBSA to re-score

top-hit poses produced by faster, but less accurate, programs in future aaRS design

experiments. We also compared the binding modes of the studied aaRSes with unnatural and

natural amino acids. In general, the aaRSes established new H-bonds, or non-H-bond van

der Waals interactions, to stabilize their unAA substrates. Moreover, they may adopt

conformations that largely destabilize their interactions with the native Tyr substrate, as shown

in the twisted interactions between EcPolyRS and Tyr. We hope to use these results to

guide the future design and development of new aaRSes, and to extend the capability of the

genetic code expansion technology to many new unAAs.
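
The suggested re-scoring strategy amounts to a two-stage funnel: rank many candidates with a fast score, then re-rank only the best few with a slower, more accurate one. The sketch below is a generic illustration. The design names, both score tables, and the function name are hypothetical, with lower scores meaning stronger predicted binding.

```python
def rescoring_funnel(candidates, fast_score, slow_score, keep=3):
    """Shortlist by a cheap score, then re-rank the shortlist by a costly score."""
    shortlist = sorted(candidates, key=fast_score)[:keep]
    return sorted(shortlist, key=slow_score)


# Hypothetical scores for four aaRS designs (lower = stronger predicted binding).
docking_scores = {"design_a": -7.0, "design_b": -6.5, "design_c": -6.0, "design_d": -5.0}
mmpbsa_scores = {"design_a": -10.0, "design_b": -18.0, "design_c": -16.0, "design_d": -20.0}

ranked = rescoring_funnel(
    docking_scores,
    fast_score=docking_scores.get,
    slow_score=mmpbsa_scores.get,
    keep=3,
)
print(ranked)
```

In this toy data the design with the best slow score is discarded at the first stage, which shows why the size of the shortlist is itself a trade-off between cost and the risk of missing good candidates.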

References:

[1] Murgola, E. J. tRNA, suppression, and the code. Annu. Rev. Genet. 19, 57–80

(1985).

[2] Jukes, T. H. & Osawa, S. Evolutionary changes in the genetic code. Comp.

Biochem. Physiol. B 106, 489–94 (1993).

[3] Walsh, C. T., Garneau-Tsodikova, S. & Gatto, G. J., Jr. Protein posttranslational

modifications: the chemistry of proteome diversifications. Angew. Chem. Int. Ed. 44,

7342–72 (2005).

[4] Goodman, H. M., Abelson, J., Landy, A., Brenner, S. & Smith, J. D. Amber

suppression: a nucleotide change in the anticodon of a tyrosine transfer RNA. Nature 217,

1019–24 (1968).

[5] Brown, C. M., Dalphin, M. E., Stockwell, P. A. & Tate, W. P. The translational

termination signal database. Nucleic Acids Res. 21, 3119–23 (1993).

[6] Bock, A. et al. Selenocysteine: the 21st amino acid. Mol. Microbiol. 5, 515–20

(1991).

[7] Krzycki, J. A. The direct genetic encoding of pyrrolysine. Curr. Opin. Microbiol.

8, 706–12 (2005).

[8] Liu, C. C. & Schultz, P. G. Adding new chemistries to the genetic code. Annu. Rev.

Biochem. 79, 413–44 (2010).

[9] Ai, H. W. Biochemical analysis with the expanded genetic lexicon. Anal. Bioanal.

Chem. 403, 2089–2102 (2012).

[10] Wang, L., Brock, A., Herberich, B. & Schultz, P. G. Expanding the genetic code of

Escherichia coli. Science 292, 498–500 (2001).

[11] Chin, J. W. et al. An Expanded Eukaryotic Genetic Code. Science 301, 964–967

(2003).

[12] Summerer, D. et al. A genetically encoded fluorescent amino acid. Proc. Natl.

Acad. Sci. USA. 103, 9785–9789 (2006).

[13] Wang, J., Xie, J. & Schultz, P. G. A genetically encoded fluorescent amino acid. J.

Am. Chem. Soc. 128, 8738–9 (2006).

[14] Lee, H. S., Spraggon, G., Schultz, P. G. & Wang, F. Genetic incorporation of a

metal-ion chelating amino acid into proteins as a biophysical probe. J. Am. Chem.

Soc. 131, 2481–3 (2009).

[15] Smith, E. E., Linderman, B. Y., Luskin, A. C. & Brewer, S. H. Probing Local

Environments with the Infrared Probe: L-4-Nitrophenylalanine. J. Phys. Chem. B 115,

2380–2385 (2011).

[16] Cellitti, S. E. et al. In vivo incorporation of unnatural amino acids to probe structure,

dynamics, and ligand binding in a large protein by nuclear magnetic resonance

spectroscopy. J. Am. Chem. Soc. 130, 9268–81 (2008).

[17] Chin, J. W., Martin, A. B., King, D. S., Wang, L. & Schultz, P. G. Addition of a

photocrosslinking amino acid to the genetic code of Escherichia coli. Proc. Natl. Acad. Sci.

USA. 99, 11020–4 (2002).

[18] Ai, H. W., Shen, W., Sagi, A., Chen, P. R. & Schultz, P. G. Probing protein-protein

interactions with a genetically encoded photocrosslinking amino acid. Chembiochem 12,

1854–1857 (2011).

[19] Zhang, M. et al. A genetically incorporated crosslinker reveals chaperone

cooperation in acid resistance. Nat. Chem. Biol. 7, 671–7 (2011).

[20] Chin, J. W. et al. Addition of p-azido-L-phenylalanine to the genetic code of

Escherichia coli. J. Am. Chem. Soc. 124, 9026–7 (2002).

[21] Lang, K. et al. Genetically encoded norbornene directs site-specific cellular protein

labelling via a rapid bioorthogonal reaction. Nat. Chem. 4, 298–304 (2012).

[22] Lang, K. et al. Genetic Encoding of bicyclononynes and trans-cyclooctenes for site-

specific protein labeling in vitro and in live mammalian cells via rapid fluorogenic Diels-

Alder reactions. J. Am. Chem. Soc. 134, 10317–20 (2012).

[23] Deiters, A., Groff, D., Ryu, Y., Xie, J. & Schultz, P. G. A genetically encoded

photocaged tyrosine. Angew. Chem. Int. Ed. 45, 2728–31 (2006).

[24] Chen, P. R. et al. A facile system for encoding unnatural amino acids in mammalian

cells. Angew. Chem. Int. Ed. 48, 4052–5 (2009).

[25] Baker, A. S. & Deiters, A. Optical Control of Protein Function through Unnatural

Amino Acid Mutagenesis and Other Optogenetic Approaches. ACS Chem. Biol. 9, 1398–

407 (2014).

[26] Wu, N., Deiters, A., Cropp, T. A., King, D. & Schultz, P. G. A genetically encoded

photocaged amino acid. J. Am. Chem. Soc. 126, 14306–7 (2004).

[27] Arbely, E., Torres-Kolbus, J., Deiters, A. & Chin, J. W. Photocontrol of tyrosine

phosphorylation in mammalian cells via genetic encoding of photocaged tyrosine. J. Am.

Chem. Soc. 134, 11912–5 (2012).

[28] Ren, W., Ji, A. & Ai, H. W. Light activation of protein splicing with a photocaged

fast intein. J. Am. Chem. Soc. 137, 2155–8 (2015).

[29] Wang, Y. S. et al. A genetically encoded photocaged N-methyl-L-lysine. Molecular

BioSystems 6, 1557–1560 (2010).

[30] Neumann, H., Peak-Chew, S. Y. & Chin, J. W. Genetically encoding N-epsilon-

acetyllysine in recombinant proteins. Nat. Chem. Biol. 4, 232–234 (2008).

[31] Park, H. S. et al. Expanding the genetic code of Escherichia coli with

phosphoserine. Science 333, 1151–4 (2011).

[32] Ai, H. W., Lee, J. W. & Schultz, P. G. A method to site-specifically introduce

methyllysine into proteins in E. coli. Chem. Commun. 46, 5506–8 (2010).

[33] Neumann, H., Wang, K., Davis, L., Garcia-Alai, M. & Chin, J. W. Encoding

multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. Nature

464, 441–4 (2010).

[34] Wang, Y. S., Fang, X., Wallace, A. L., Wu, B. & Liu, W. R. A rationally designed

pyrrolysyl-tRNA synthetase mutant with a broad substrate spectrum. J. Am. Chem. Soc.

134, 2950–3 (2012).

[35] Young, D. D. et al. An evolved aminoacyl-tRNA synthetase with atypical

polysubstrate specificity. Biochemistry 50, 1894–900 (2011).

[36] Chatterjee, A., Xiao, H., Bollong, M., Ai, H. W. & Schultz, P. G. Efficient viral

delivery system for unnatural amino acid mutagenesis in mammalian cells. Proc. Natl.

Acad. Sci. USA. 110, 11803–8 (2013).

[37] Linder, M. Computational Enzyme Design: Advances, hurdles and possible ways

forward. Comput. Struct. Biotechnol. J. 2, e201209009 (2012).

[38] Wang, P., Vaidehi, N., Tirrell, D. A. & Goddard, W. A., 3rd. Virtual screening for

binding of phenylalanine analogues to phenylalanyl-tRNA synthetase. J. Am. Chem. Soc.

124, 14442–9 (2002).

[39] Datta, D., Vaidehi, N., Zhang, D. & Goddard, W. A., 3rd. Selectivity and specificity

of substrate binding in methionyl-tRNA synthetase. Protein Sci. 13, 2693–705 (2004).

[40] Zhang, D., Vaidehi, N., Goddard, W. A., 3rd, Danzer, J. F. & Debe, D. Structure-

based design of mutant Methanococcus jannaschii tyrosyl-tRNA synthetase for

incorporation of O-methyl-L-tyrosine. Proc. Natl. Acad. Sci. USA. 99, 6579–84 (2002).

[41] Sun, R., Zheng, H., Fang, Z. & Yao, W. Rational design of aminoacyl-tRNA

synthetase specific for p-acetyl-L-phenylalanine. Biochem. Biophys. Res. Commun. 391,

709–15 (2010).

[42] Wang, L., Zhang, Z., Brock, A. & Schultz, P. G. Addition of the keto functional

group to the genetic code of Escherichia coli. Proc. Natl. Acad. Sci. USA. 100, 56–61

(2003).

[43] Ibba, M. & Soll, D. Aminoacyl-tRNAs: setting the limits of the genetic code. Genes

Dev. 18, 731–8 (2004).

[44] Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of

docking with a new scoring function, efficient optimization, and multithreading. J.

Comput. Chem. 31, 455–61 (2010).

[45] Meiler, J. & Baker, D. ROSETTALIGAND: protein-small molecule docking with

full side-chain flexibility. Proteins 65, 538–48 (2006).

[46] Kollman, P. A. et al. Calculating structures and free energies of complex molecules:

combining molecular mechanics and continuum models. Acc. Chem. Res. 33, 889–97

(2000).

[47] Chen, Z. J., Ren, W., Wright, Q. E. & Ai, H. W. Genetically encoded fluorescent

probe for the selective detection of peroxynitrite. J. Am. Chem. Soc. 135, 14940–3 (2013).

[48] Turner, J. M., Graziano, J., Spraggon, G. & Schultz, P. G. Structural

characterization of a p-acetylphenylalanyl aminoacyl-tRNA synthetase. J. Am. Chem. Soc.

127, 14976–7 (2005).

[49] Sakamoto, K. et al. Genetic encoding of 3-iodo-L-tyrosine in Escherichia coli for

single-wavelength anomalous dispersion phasing in protein crystallography. Structure 17,

335–44 (2009).

[50] Pedretti, A., Villa, L. & Vistoli, G. VEGA-an open platform to develop chemo-bio-

informatics applications, using plug-in architecture and script programming. J. Comput.

Aided Mol. Des. 18, 167–73 (2004).

[51] Kobayashi, T. et al. Structural snapshots of the KMSKS loop rearrangement for

amino acid activation by bacterial tyrosyl-tRNA synthetase. J. Mol. Biol. 346, 105–17

(2005).

[52] Biasini, M. et al. SWISS-MODEL: modelling protein tertiary and quaternary

structure using evolutionary information. Nucleic Acids Res. 42, W252–8 (2014).

[53] Hess, B., Kutzner, C., van der Spoel, D. & Lindahl, E. GROMACS 4: Algorithms

for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory

Comp. 4, 435–447 (2008).

[54] Pronk, S. et al. GROMACS 4.5: a high-throughput and highly parallel open source

molecular simulation toolkit. Bioinformatics 29, 845–854 (2013).

[55] Hornak, V. et al. Comparison of multiple Amber force fields and development of

improved protein backbone parameters. Proteins 65, 712–25 (2006).

[56] Sousa da Silva, A. & Vranken, W. ACPYPE - AnteChamber PYthon Parser

interfacE. BMC Research Notes 5, 367 (2012).

[57] Wang, J., Wang, W., Kollman, P. A. & Case, D. A. Automatic atom type and bond

type perception in molecular mechanical calculations. J. Mol. Graph Model. 25, 247–60

(2006).

[58] Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development

and testing of a general amber force field. J. Comput. Chem. 25, 1157–74 (2004).

[59] Morris, G. M. et al. AutoDock4 and AutoDockTools4: Automated docking with

selective receptor flexibility. J. Comput. Chem. 30, 2785–91 (2009).

[60] Kumari, R., Kumar, R. & Lynn, A. g_mmpbsa—A GROMACS Tool for High-

Throughput MM-PBSA Calculations. J. Chem. Inf. Model. 54, 1951–1962 (2014).

[61] Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in

virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov. 3,

935–49 (2004).

[62] Liu, Y. & Kuhlman, B. RosettaDesign server for protein design. Nucleic Acids Res.

34, W235–8 (2006).

[63] Sun, H. et al. Assessing the performance of MM/PBSA and MM/GBSA methods.

5. Improved docking performance using high solute dielectric constant MM/GBSA and

MM/PBSA rescoring. Phys. Chem. Chem. Phys. 16, 22035–22045 (2014).

[64] Hou, T., Wang, J., Li, Y. & Wang, W. Assessing the performance of the MM/PBSA

and MM/GBSA methods. 1. The accuracy of binding free energy calculations based on

molecular dynamics simulations. J. Chem. Inf. Model. 51, 69–82 (2011).

[65] Rastelli, G., Del Rio, A., Degliesposti, G. & Sgobba, M. Fast and accurate

predictions of binding free energies using MM-PBSA and MM-GBSA. J. Comput. Chem.

31, 797–810 (2010).

Chapter 5: Summary

In this thesis, a light-activated protein splicing method based on a photocaged fast intein was

developed. A photocaged cysteine was genetically introduced into a highly efficient Nostoc

punctiforme (Npu) DnaE intein. The resulting photocaged intein was inserted into a red

fluorescent protein (RFP) mCherry and a human Src tyrosine kinase to create inactive

chimeric proteins. A light-induced photochemical reaction was able to reactivate the intein

and trigger protein splicing. Active mCherry and Src were formed as observed by direct

fluorescence imaging or imaging of a Src kinase sensor in mammalian cells. The

genetically encoded photocaged intein is a general optogenetic tool, allowing effective

photocontrol of primary structures and functions of proteins. In the future, this method could

be applied to various disease-related cysteine proteases (e.g., cathepsins) for medical

systems biology studies. Ideally, this would allow scientists to uncover the cellular

dynamics induced by the activities of those enzymes.

In the third chapter, a dinitrophenyl hapten unnatural amino acid was genetically

incorporated into proteins in cells. On the one hand, this offers the drug industry an

alternative way to label protein drugs; on the other hand, it provides an approach to

study the immunostimulatory activity of various haptenated proteins.

Additionally, we evaluated the binding affinities between unnatural amino acids and

engineered orthogonal tyrosyl-tRNA synthetases through computational approaches. In this

study, we observed higher binding affinities between unAAs and their cognate

engineered tyrosyl-tRNA synthetases than between tyrosine and those synthetases.

More importantly, the evaluation method could be implemented in our future

computational synthetase design workflow. In the next stage, I am particularly interested in

unnatural amino acids that mimic the functionality of post-translationally

modified residues. Ideally, orthogonal synthetases for unnatural amino

acids could be designed by computer, instead of through tedious experimental screening.

Such unnatural amino acids could also serve as useful tools for studying PTM-related

systems biology.
