CHAPTER 6
ROLE OF THYMINE IN MANIPULATION OF CARBON IN
DIFFERENT PROTEINS
6.1 INTRODUCTION
Viruses cause diseases ranging from the common cold to acquired
immunodeficiency syndrome (AIDS). A virus consists of genetic material, either
RNA or DNA, coated by proteins. The genetic material carries the instructions
for its own multiplication, and the infecting virus directs the host cell to
replicate it. Almost everyone is infected by such viruses at some time. Matrix
proteins are responsible for virus assembly and budding; these structural
proteins link the viral envelope and the core. They have also been reported to
be responsible for releasing the genetic material once the virus has entered
the host cell (Mebatsion et al., 1999).
The matrix proteins of different viruses were analyzed for the distribution
of carbon content and large hydrophobic residues. The amount and distribution
of adenine in the viral genome is a factor in deciding protein stability from
the hydrophobicity point of view. The genomes of different viruses, and the
individual genome segments of segmented viruses, were analyzed in this study.
Viruses are classified according to their nucleic acid constituents.
Influenza viruses have a negative-sense, single-stranded RNA genome that serves
as the template for synthesizing mRNAs. Influenza A viruses cause pandemics
because of sudden mutations/variations in their surface proteins. There is
recorded evidence that an Influenza A virus may mutate into a form that is
easily transmitted to humans. The mutations lead to different forms of the
surface proteins, which adopt different structures. The carbon content and its
distribution lead to the formation of these many structures on mutation
(Rajasekaran et al., 2012).
Paramyxoviruses are enveloped viruses that replicate in the cytoplasm and
contain a single stranded negative-sense RNA genome. The genome contains 6
primary genes: nucleocapsid (N), phosphoprotein (P), matrix (M), the fusion
(F) and attachment (HN, H, or G) proteins, and the polymerase (L), along with
accessory proteins that vary with the viral species (Lamb, 2001). Enveloped
viruses obtain their envelope from cellular membranes once the components of
the virus have assembled at the lipid bilayer. The assembly process brings
together the glycoproteins spanning the lipid bilayer with the inner core of the
virus particle. The inner layer of the membrane consists of a viral protein,
dubbed the matrix or M protein, that bridges the glycoproteins and the inner
core. M is an essential protein, without which the production of virus
particles is highly impaired, if not impossible.
The M protein of Sendai virus (SeV-M), a member of the Paramyxovirinae
subfamily of the Paramyxoviridae family, is produced in the cytoplasm and
self-associates to form a leaflet at the inner face of the plasma membrane
(Takimoto and Portner, 2004). In the virus particle, it carpets the inner part
of the viral envelope, where it interacts with the two surface glycoproteins,
HN and F, on the one hand, and with the viral ribonucleoprotein complex (N
protein plus viral RNA) associated with the L and P proteins on the other
(Lamb, 2001).
Furthermore, paramyxovirus M has been reported to participate in the
regulation of RNA synthesis (Kras'ko et al., 1986; Suryanarayana et al., 1994;
Ghildyal et al., 2003; Reuter et al., 2006).
Human respiratory syncytial virus (RSV) was first isolated in 1956 from a
laboratory chimpanzee with upper respiratory tract disease (Hall, 2001;
McNamara and Smyth, 2002; Taylor, 2007; Collins and Crowe, 2007). RSV was
quickly determined to be of human origin and was found to be the leading
worldwide viral agent of pediatric respiratory tract disease. RSV (family
Paramyxoviridae, order Mononegavirales) is an enveloped virus with a
single-stranded, negative-sense RNA genome of 15.2 kb (Collins and Crowe,
2007).
The animal versions of RSV include bovine RSV (BRSV) and pneumonia virus
of mice (PVM), suggesting that species jumping took place during the
evolution of these viruses. RSV proteins form the viral envelope by associating
with the lipids in the membrane (Collins and Crowe, 2007). The matrix (M)
protein lines the inner envelope surface and is vital in virion morphogenesis
(Teng and Collins, 1998). The heavily glycosylated G, the fusion F, and the
small hydrophobic SH proteins are transmembrane surface glycoproteins.
Since its identification in 1998 to 1999, Nipah virus (NiV), a member of
the family Paramyxoviridae, has been associated with numerous outbreaks of
fatal viral encephalitis in humans in Southeast Asia (Chua et al., 2000).
The vesicular stomatitis virus (VSV) belongs to the Rhabdoviridae family
and causes acute infections in a wide range of mammalian hosts (Wagner and
Rose, 1995). Its matrix protein (VSV M) plays a key role in viral assembly and
budding and also in the inhibition of host cell gene expression during
infection. When expressed in vertebrate cells in the absence of any other viral
protein, VSV M produces dramatic alterations in cellular RNA metabolism and
mRNA expression (Black and Lyles, 1992; Ahmed and Lyles, 1998).
The matrix (M) protein of rhabdoviruses plays a key role in virus assembly
and budding; however, the precise mechanism by which M mediates these
processes is still unclear. A highly conserved, proline-rich motif (PPxY or PY
motif, where P represents proline, Y denotes tyrosine, and x denotes any amino
acid) of rhabdoviral M proteins has been associated with a possible role in
budding mediated by the M protein. Point mutations that disrupt the PY motif
of the M protein of vesicular stomatitis virus (VSV) had no obvious effect on
membrane localization of M, but led to a decrease in the amount of M protein
released from cells in a functional budding assay. Moreover, the PPxY
sequence within rhabdoviral M proteins is identical to that of the ligand which
interacts with WW domains of cellular proteins. Amino acids 17 through 33
and 29 through 44, which contain the PY motifs of VSV and rabies virus M
proteins, respectively, mediate interactions with WW domains of specific
cellular proteins. Point mutations that disrupt the consensus PY motif of VSV
or rabies virus M protein result in a significant decrease in their ability to
interact with the WW domains. These properties of the PY motif of rhabdovirus
M proteins are strikingly similar to those of the late (L) budding domain in
the gag-specific protein p2b of Rous sarcoma virus (Harty et al., 1999).
The Ebola virus is a member of the Filoviridae family of negative-sense
RNA viruses (Colebunders et al., 2000; Takada et al., 2001). Ebola virus
infects both humans and non-human primates and mostly leads to severe
hemorrhagic fever, with mortality rates of up to 90%. Currently, no approved
vaccines or antiviral drugs are available to prevent and/or treat Ebola virus
infections (Bray et al., 2002). The VP24 protein of Ebola virus is the
secondary matrix protein and a minor component of virions, whereas the VP40
protein is the principal matrix protein and the most abundant virion component.
The structure and function of VP40 have been well characterized. The
C-terminal domain of VP40 contains large hydrophobic patches that may be
involved in the interaction with lipid bilayers (Han et al., 2003).
Chikungunya virus (CHIKV) produces a dengue-like illness in humans,
characterized by fever, rashes, and severe arthralgia persisting for a few weeks
to several months. CHIKV is an alphavirus of the family Togaviridae; its
genome is a linear, positive-sense, single-stranded RNA of approximately
11.8 kb (Schuffenecker et al., 2006).
Dengue virus (DENV) is a mosquito-borne, single-stranded, positive (+)-sense
RNA virus with a genome of about 10.7 kb, belonging to the Flaviviridae
family, whose members are responsible for diseases such as yellow fever,
Japanese encephalitis, tick-borne encephalitis, dengue fever and dengue
hemorrhagic fever. The organization of the DENV genome is an example of
intra-genomic long-distance interactions that modulate molecular processes.
The DENV genome encodes a single open reading frame, flanked by highly
structured untranslated regions (UTRs) (Villordo et al., 2009; Filomatori et
al., 2011; Manzano et al., 2011). The secondary structure of the DENV UTRs has
attracted attention, as they have been shown to encompass motifs involved in
the regulation of translation, replication, transcription and viral
pathogenesis (Pijlman et al., 2008; Silva et al., 2010; Manzano et al., 2011).
Rabies is an acute central nervous system (CNS) infection, characterized by
CNS irritation followed by paralysis and death. Rabies is caused by a
neurotropic lyssavirus, a member of the Rhabdoviridae family. It is a
single-stranded, negative-sense RNA virus with a genome of approximately 12 kb
that encodes 5 proteins: a glycoprotein, a nucleoprotein, and three others
(Wunner, 1991).
6.2 METHODOLOGY
The genome and protein sequences of Influenza A virus H1N1 were
retrieved from NCBI (on 01-10-2011). The amino acid compositions of the 11
proteins were calculated using an in-house program (AACOMP). The carbon
distributions in these 11 proteins were computed using our CARBANA
program, available online (www.rajasekaran.net.in/tools/carbana.html), which
is based on the principle that proteins prefer a carbon content of 31.45%. A
length of 700 atoms was selected for the calculation. The genome of Influenza A
H1N1 contains 8 segments; the number of bases in each segment was counted and
tabulated. The role of uracil in the mRNA sequences of matrix proteins was
investigated by counting the number of large hydrophobic residues (F, I, L, M
and V), which are coded by XUX codons (X = A, U, G or C). The matrix protein
sequences of different viruses were obtained from the SWISSPROT database (on
05-09-2011).
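The windowed carbon calculation described above can be sketched as follows.
This is a minimal illustration and not the CARBANA implementation: it assumes
that each residue has the formula of the free amino acid minus one water, it
lists the carbons of a residue first (an arbitrary ordering), and the names
carbon_percent and carbon_profile are hypothetical.

```python
# Atom counts (C, H, N, O, S) per amino acid residue, i.e. the free amino
# acid minus one water molecule lost on peptide-bond formation.
RESIDUE_ATOMS = {
    "A": (3, 5, 1, 1, 0), "R": (6, 12, 4, 1, 0), "N": (4, 6, 2, 2, 0),
    "D": (4, 5, 1, 3, 0), "C": (3, 5, 1, 1, 1), "Q": (5, 8, 2, 2, 0),
    "E": (5, 7, 1, 3, 0), "G": (2, 3, 1, 1, 0), "H": (6, 7, 3, 1, 0),
    "I": (6, 11, 1, 1, 0), "L": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "M": (5, 9, 1, 1, 1), "F": (9, 9, 1, 1, 0), "P": (5, 7, 1, 1, 0),
    "S": (3, 5, 1, 2, 0), "T": (4, 7, 1, 2, 0), "W": (11, 10, 2, 1, 0),
    "Y": (9, 9, 1, 2, 0), "V": (5, 9, 1, 1, 0),
}


def carbon_percent(seq):
    """Overall percentage of carbon atoms in a protein sequence."""
    carbons = sum(RESIDUE_ATOMS[aa][0] for aa in seq)
    total = sum(sum(RESIDUE_ATOMS[aa]) for aa in seq)
    return 100.0 * carbons / total


def carbon_profile(seq, window=700):
    """Carbon percentage in every window of `window` consecutive atoms."""
    flags = []  # 1 for a carbon atom, 0 for any other atom
    for aa in seq:
        atoms = RESIDUE_ATOMS[aa]
        carbons, total = atoms[0], sum(atoms)
        # Assumption: carbons are listed first within each residue.
        flags.extend([1] * carbons + [0] * (total - carbons))
    if len(flags) < window:
        return []
    count = sum(flags[:window])
    profile = [100.0 * count / window]
    for i in range(window, len(flags)):
        count += flags[i] - flags[i - window]  # slide the window by one atom
        profile.append(100.0 * count / window)
    return profile
```

Windows whose value lies above 31.45 would correspond to the carbon-rich
(hydrophobic) stretches discussed in this chapter, and windows below it to
hydrophilic stretches.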
The A, T, G and C contents of the viral genomes of Chikungunya, Dengue and
Rabies were calculated. The mRNA sequences of different proteins were analyzed
for base composition, and the average base composition was also calculated. The
thymine distribution in the different reading frames was computed separately,
as it is important for the production of proteins with adequate large
hydrophobic residues. A mutational study based on carbon distribution was
carried out at site V715 of the PB1 protein. The CARd program (Rajasekaran,
2012) was used for the study, with 255 atoms (~17 amino acids) as the outer
length and 35 atoms as the inner length. The results were plotted for
comparison of the native and mutated forms.
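A frame-wise thymine count of the kind described above might look like the
sketch below. It is only an illustration under stated assumptions: frames 1-3
are taken as the three offsets of the given strand and frames 4-6 as the three
offsets of its reverse complement (the chapter does not define the numbering),
only the second codon position is examined, and the function names are
hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def second_position_t_fraction(seq):
    """Return {frame: fraction of codons with T at the second position}.

    Assumed numbering: frames 1-3 on the given strand, 4-6 on the
    reverse complement.
    """
    seq = seq.upper()
    result = {}
    for strand_index, strand in enumerate((seq, reverse_complement(seq))):
        for offset in range(3):
            codons = [strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3)]
            hits = sum(1 for codon in codons if codon[1] == "T")
            frame = strand_index * 3 + offset + 1
            result[frame] = hits / len(codons) if codons else 0.0
    return result
```

For an mRNA sequence written with U, replacing U by T before the call gives
the same counts for uracil.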
6.3 DATASETS
The matrix protein sequences of Sendai virus (P06446), Vesicular
stomatitis Indiana virus (P03519), Influenza A virus (P05777), Human
respiratory syncytial virus (P03419), Zaire ebolavirus (Q05128) and Nipah
virus (Q9IK90) were collected from the SWISSPROT protein database (on
05-09-2011). The viral genome sequences of Chikungunya, Dengue and Rabies were
downloaded (on 01-10-2011) from NCBI, and their A, T, G and C contents were
counted.
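Counting the bases of a downloaded genome is straightforward; a minimal sketch
is given below. The helper names are hypothetical, the input is assumed to be
FASTA text, and ambiguous symbols such as N are deliberately left out of the
total.

```python
from collections import Counter


def sequence_from_fasta(text):
    """Join the sequence lines of a single-record FASTA string."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def base_counts(seq):
    """Count A, T, G and C; the total ignores any other symbol."""
    counts = Counter(seq.upper())
    row = {base: counts.get(base, 0) for base in "ATGC"}
    row["Total"] = sum(row.values())
    return row
```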
6.4 RESULTS AND DISCUSSION
Carbon is the only element that contributes towards hydrophobic
interaction, the dominant force in proteins. Proteins evolve based on carbon
content, which may influence the coding of genes. It is reported that proteins
prefer a carbon content of 31.45% for their stability (Rajasekaran et al.,
2009). Depending upon the carbon content, the protein and the corresponding
mRNA survive and are passed on to the next generation. The carbon content is
determined by the types of amino acids present, and the arrangement of these
amino acids is specified by the gene.
The adenine in a gene is transcribed as uracil in mRNA. The uracil in mRNA
is responsible for the number of large hydrophobic residues in proteins, and
the number of uracils in mRNAs has decreased during evolution. For this reason,
the number of large hydrophobic residues decreases, which leads to the
production of proteins with lower carbon content and renders them
non-functional. On the other hand, viruses insert themselves into the host
cell, which then produces proteins with high carbon content. This is possible
because the adenine in the viral genetic material is transcribed into uracil in
mRNAs.
The Influenza A virus H1N1 genome consists of eight RNA segments carrying
11 genes, which encode 11 proteins: polymerase basic 2 (PB2), polymerase basic
1 (PB1), PB1-F2, polymerase acidic (PA), haemagglutinin (HA), nucleocapsid
protein (NP), neuraminidase (NA), the matrix proteins (M2, M1) and the
non-structural proteins (NS2, NS1). The proteins taken for the carbon
distribution study are listed with their accession numbers in Table 6.3. The
results on carbon distribution in these proteins are shown in Figures 6.1-6.7.
PB1-F2 and M2 are very small proteins and are not shown here.
Figure 6.1 Carbon distribution pattern of HA protein (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.2 Carbon distribution pattern of NA protein (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.3 Carbon profile of NP protein (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.4 Carbon distribution pattern in PA protein (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.5 Carbon profile of PB1 protein (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.6 Carbon distribution pattern in PB2 protein (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.7 Carbon distribution profile of M1 protein (X-axis: Residue Number; Y-axis: Carbon %)
The carbon distribution in the proteins of Influenza A H1N1 virus was
plotted as atom number (X-axis) versus % of carbon (Y-axis). The two membrane
proteins (HA and NA) clearly showed a rich carbon content all along the
sequence. Normally, a threshold value of 31.45% is expected along the
sequences. The hydrophobic region at 120-240 lies above the 31.45% line in
Figure 6.1 and Figure 6.2. The first half of the sequence is found to be
hydrophilic, while the second half is predicted to be hydrophobic.
NP is a nucleoprotein which wraps around genomic RNA, forming a
ribonucleoprotein complex; this protein contains a mix of hydrophobic and
hydrophilic regions along the sequence. A long hydrophilic stretch is found at
200-350 (Figure 6.3) and is predicted to be an unfolded region of the protein.
PA and PB1, the viral RNA polymerase proteins, showed a normal carbon
distribution, although the PA protein had a slightly higher carbon content
(33-35%). The results are shown in Figures 6.4 and 6.5. The PB2 protein shows
periodic repeats of hydrophobic and hydrophilic stretches (Figure 6.6). The two
small non-structural proteins, NS1 and NS2, showed a normal carbon content,
though the result was not significant.
The matrix protein M1 displayed a lower carbon content (28%) owing to the
presence of hydrophilic regions (Figure 6.7). The matrix and haemagglutinin
proteins play an important role in viral entry into the host cell. Hence, this
carbon distribution study can help in modifying these proteins for efficient
functioning.
Table 6.1 Nucleotide content in different segments of Influenza A H1N1 virus
Segment No.  Sequence ID  No. of proteins (Gene)  A    T    G    C    Total
1            60484        1 (PB2)                 786  531  597  427  2341
2            324897       2 (PB1, PB1-F2)         811  545  526  459  2341
3            60808        1 (PA)                  751  545  521  416  2233
4            62290        1 (HA)                  621  417  409  331  1778
5            324709       1 (NP)                  504  335  412  314  1565
6            324507       1 (NA)                  428  381  351  253  1413
7            60788        2 (M2, M1)              294  248  268  217  1027
8            324833       2 (NS2, NS1)            285  211  215  179  890
Table 6.2 Nucleotide content in different viruses
Name of the virus  Sequence ID  A     T     G     C     Total
Chikungunya        156751972    3492  2385  2968  2950  11795
Dengue type 1      14485523     3439  2298  2765  2219  10721
Rabies             9627197      3421  3130  2736  2645  11932
Table 6.3 List of human Influenza A virus proteins taken for carbon distribution
S. No.  Proteins                 Name of Proteins (ID)  Length
1       RNA Polymerase Basic 2   PB2 (NP_040987.1)      759
2       RNA Polymerase Basic 1   PB1 (NP_040985.1)      757
3       Polymerase Acidic        PA (NP_040986.1)       716
4       Haemagglutinin           HA (NP_040980.1)       566
5       Nucleoprotein            NP (NP_040982.1)       498
6       Neuraminidase            NA (NP_040981.1)       454
7       Matrix Proteins          M1 (NP_040978.1)       252
                                 M2 (NP_040979.2)       97
8       Non-Structural Proteins  NS1 (NP_040984.1)      230
                                 NS2 (NP_040983.1)      121
The genomic particulars of the Influenza A H1N1 virus are given in
Table 6.1. The number of proteins and their gene identifiers are given in the
third column. The number of adenines was the highest in all segments. Because
of this high adenine content, the host cell produces mRNA with a higher amount
of uracil, which is responsible for producing proteins with a higher number of
large hydrophobic residues (F, I, L, M and V) and hence a higher carbon
content.
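The adenine excess can be checked directly from the counts in Table 6.1. The
short sketch below converts the tabulated counts into percentages and reports
the most abundant base per segment; the dictionary layout and the function
name are illustrative choices, while the numbers are those of the table.

```python
# (A, T, G, C) counts per segment, copied from Table 6.1.
SEGMENTS = {
    "PB2": (786, 531, 597, 427),
    "PB1, PB1-F2": (811, 545, 526, 459),
    "PA": (751, 545, 521, 416),
    "HA": (621, 417, 409, 331),
    "NP": (504, 335, 412, 314),
    "NA": (428, 381, 351, 253),
    "M2, M1": (294, 248, 268, 217),
    "NS2, NS1": (285, 211, 215, 179),
}


def base_percentages(counts):
    """Convert (A, T, G, C) counts into percentages of the segment total."""
    total = sum(counts)
    return {base: 100.0 * n / total for base, n in zip("ATGC", counts)}


for gene, counts in SEGMENTS.items():
    pct = base_percentages(counts)
    richest = max(pct, key=pct.get)
    values = " ".join(f"{base}={pct[base]:.1f}%" for base in "ATGC")
    print(f"{gene:12s} {values}  most abundant: {richest}")
```

For segment 1 (PB2), for example, adenine accounts for 786 of 2341 bases, or
about 33.6%.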
Analysis of the mRNA sequences reveals that a higher adenine content might
add an appropriate number of large hydrophobic residues to the proteins. In
particular, adenine as the second nucleotide in frame 1 is important for having
uracil as the second nucleotide in frame 4 of the back chain. Framing these
adenines (uracils in the back chain) is important for producing highly
functional and long-lasting proteins that will be passed on to the next
generation. This is a classical example of more hydrophobic proteins being
added to the host cell naturally. In human mRNAs, the adenine content is lower.
The above principle can be adapted for adding genomic content that gives an
adequate carbon distribution in the proteins.
The carbon distribution studies on viral proteins revealed that the viral
proteins had higher carbon content. The atomic composition plays a role in
evolution of proteins. The carbon distribution along the protein chain was
dominated by the presence of uracil in frame 1 of its mRNA. The adenine in viral
genome is responsible for uracil in the corresponding mRNA.
The amount of adenine was always greater than that of the other bases in
all viruses, as shown in Table 6.2. Probably, nature has been adding more
hydrophobic elements to proteins through viruses. All the large hydrophobic
residues, namely phenylalanine (F), isoleucine (I), leucine (L), methionine (M)
and valine (V), are coded by XUX codons (where X = A, U, G or C). An earlier
investigation of this codon distribution in mRNA sequences of different species
revealed that frame 1 tends to maintain some proportion of these codons in
order to maintain sufficient hydrophobicity in its proteins. The matrix
proteins of different species were analyzed for the presence of XUX in frame 1,
i.e., the number and percentage of large hydrophobic residues were calculated.
An excess of uracil in the coding sequences was observed.
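The XUX count in frame 1 can be sketched as below. In the standard genetic
code, all 16 codons with U in the second position encode F, L, I, M or V,
which is what the lookup table lists. The function name and the returned tuple
are hypothetical, and a trailing stop codon, if present, is counted among the
codons.

```python
# The 16 codons with U at the second position (standard genetic code).
XUX_CODONS = {
    "UUU": "F", "UUC": "F", "UUA": "L", "UUG": "L",
    "CUU": "L", "CUC": "L", "CUA": "L", "CUG": "L",
    "AUU": "I", "AUC": "I", "AUA": "I", "AUG": "M",
    "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",
}


def count_xux(mrna):
    """Count XUX codons in frame 1 of a coding sequence.

    Returns (number of XUX codons, number of codons, percentage,
    string of the large hydrophobic residues they encode).
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    residues = [XUX_CODONS[c] for c in codons if c in XUX_CODONS]
    percent = 100.0 * len(residues) / len(codons) if codons else 0.0
    return len(residues), len(codons), percent, "".join(residues)
```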
The carbon content of the matrix proteins was further analyzed with the
CARBANA tool. The results for the matrix proteins of different viruses are
shown in Figures 6.8-6.12. The matrix protein of Sendai virus is hydrophilic
overall (Figure 6.8). The matrix protein of Vesicular stomatitis Indiana virus
is hydrophobic overall (Figure 6.9) and might therefore misfold. The matrix
protein of Human respiratory syncytial virus shows a long hydrophilic stretch
(residues 50-150) that is predicted to be unfolded (Figure 6.10). The Zaire
ebolavirus matrix protein has hydrophobic stretches at residues 87-136 and
155-188 (Figure 6.11). The Nipah virus matrix protein shows an even
distribution along the 31.45% line (Figure 6.12), and no disordered regions are
found.
The carbon content in the matrix protein was greater than the expected
value of 31.45% along the entire sequence, except in a few places. The first
half of the sequence appeared to have a higher carbon content than the second
half. The number of large hydrophobic residues was 55 in the first half and 75
in the second half. Residues such as tryptophan, tyrosine, proline, histidine,
glutamate and aspartate might also be contributing to the carbon content.
Similarly, residues such as arginine, cysteine, lysine, serine, glycine,
asparagine, threonine, glutamine and alanine might have reduced the carbon
amount.
The mRNA sequences of the different proteins were analyzed for base
composition, as shown in Figure 6.13. The mRNAs are numbered as follows: 1-HA,
2-NA, 3-NP, 4-M1, 5-M2, 6-NS1, 7-NEP, 8-PA, 9-PB1 and 10-PB2. Adenine has the
highest content in all the mRNAs (Figure 6.13). Different combinations of mRNAs
can be selected for normal protein synthesis based on thymine distribution. A
mutational study at any site of interest can be carried out with the CARd
program, and one example is illustrated here: valine at position 715 of the PB1
protein was mutated to serine. The comparison plots are given in Figure 6.14.
When the distribution is normal and centered at 0.3145, the stretch has a
normal carbon distribution.
Figure 6.8 Carbon distribution profile of matrix protein of Sendai virus (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.9 Carbon distribution profile of matrix protein of Vesicular stomatitis Indiana virus (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.10 Carbon distribution pattern of matrix protein of Human respiratory syncytial virus A (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.11 Carbon distribution pattern in matrix protein of Zaire ebola virus (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.12 Carbon distribution profile of matrix protein of Nipah virus (X-axis: Residue Number; Y-axis: Carbon %)
Figure 6.13 (a) Average ACGT composition in different mRNAs of human
Influenza A virus; (b) average base composition in entire viral mRNA.
Figure 6.14 Comparison of carbon distribution profile at position 715 in native
and mutated (V715S) forms of PB1 protein. The X-axis shows the carbon fraction
and the Y-axis shows the frequency.
A shift to the left means the stretch is hydrophilic in nature, and a shift
to the right means it is hydrophobic. Oscillation away from the normal
distribution is also considered an abnormal carbon distribution.
Here, the native protein shows a normal distribution with no wavering. The
maximum is on the left side, meaning the stretch is hydrophilic in nature. In
the mutant protein, the distribution wavers but its maximum is at 0.3145, so it
balances one way or the other. Hence, according to the plot (Figure 6.14), the
mutation does not change the carbon distribution much. This is in agreement
with the experimental report that the mutation was not significant (Sugiyama et
al., 2009). This kind of mutational study can be carried out to bring a protein
to a normal carbon distribution.
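A comparison of this kind can be approximated as sketched below. This is not
the CARd program, whose internals are not described here; it is an assumed
reading of the stated parameters in which an outer stretch of 255 atoms is
centred on the site of interest and the carbon fraction is collected for every
inner window of 35 atoms inside it. Residue formulas (free amino acid minus
one water), the carbons-first atom ordering and all function names are
assumptions.

```python
from collections import Counter

# (carbon atoms, total atoms) per amino acid residue, i.e. the free amino
# acid minus one water molecule.
ATOMS = {
    "A": (3, 10), "R": (6, 23), "N": (4, 14), "D": (4, 13), "C": (3, 11),
    "Q": (5, 17), "E": (5, 16), "G": (2, 7), "H": (6, 17), "I": (6, 19),
    "L": (6, 19), "K": (6, 21), "M": (5, 17), "F": (9, 20), "P": (5, 14),
    "S": (3, 11), "T": (4, 14), "W": (11, 24), "Y": (9, 21), "V": (5, 16),
}


def mutate(seq, position, new_residue):
    """Return seq with the residue at the 1-based position replaced."""
    return seq[:position - 1] + new_residue + seq[position:]


def local_carbon_fractions(seq, position, outer=255, inner=35):
    """Carbon fraction of every inner window within the outer stretch."""
    flags, starts = [], []
    for aa in seq:
        carbons, total = ATOMS[aa]
        starts.append(len(flags))  # index of the residue's first atom
        flags.extend([1] * carbons + [0] * (total - carbons))
    centre = starts[position - 1] + ATOMS[seq[position - 1]][1] // 2
    lo = max(0, centre - outer // 2)
    stretch = flags[lo:lo + outer]
    return [sum(stretch[i:i + inner]) / inner
            for i in range(len(stretch) - inner + 1)]


def frequency_table(fractions):
    """Frequency of carbon fractions rounded to two decimals."""
    return Counter(round(f, 2) for f in fractions)
```

With the PB1 sequence in a string `pb1` (not reproduced here), the native and
mutated profiles would be obtained as
`frequency_table(local_carbon_fractions(pb1, 715))` and
`frequency_table(local_carbon_fractions(mutate(pb1, 715, "S"), 715))`, and the
position of each maximum relative to 0.3145 could then be compared.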
6.5 CONCLUSION
The carbon distribution studies on viral proteins revealed that the carbon
content and its distribution along the sequences are vital for function. A
difference in the carbon distribution pattern was noticed in most of the H1N1
proteins. The carbon content and distribution play a role in the evolution of
proteins, and differences in carbon distribution in proteins cause diseases.
The study of carbon distribution along the protein chain is a significant step
towards understanding biological reactions.
A higher number of adenines in mRNAs was noticed to play an important role
in producing proteins with a higher number of large hydrophobic residues, which
was responsible for the richness in carbon content. Further analysis of
different viral genomes revealed that a higher adenine content might add an
appropriate number of large hydrophobic residues to the proteins. In
particular, adenine as the second nucleotide in frame 1 was important for
having uracil as the second nucleotide in frame 4 of the back chain. Framing
these adenines (uracils in the back chain) is important for producing highly
functional and long-lasting proteins that will be passed on to the next
generation. In human mRNAs, on the other hand, the adenine content is lower,
which needs to be worked out for disease-free living. This is a classical
example of more hydrophobic proteins being added to the host cell naturally.
Apart from the large hydrophobic residues, there are other residues that
contribute to the total carbon content of a protein and need to be taken into
account. The analyses of mRNA sequences of Influenza A virus revealed that the
adenine content was higher in all sequences. Further, the thymine distributions
in different frames were checked. The most important observation was an excess
of thymine in frame 4 of strand 2, which might be responsible for the
production of proteins with different amino acid compositions. Unusual thymine
distributions in frame 3 were also observed. The thymine distributions in viral
mRNAs were different from those in animals, and minimizing this excess thymine
may give normal proteins.
A mutational study at a site of interest for protein stabilization was also
carried out (Figure 6.14). This technique can be exploited further for
improving protein stability and activity, and ultimately for gene therapy. The
viral infection techniques demonstrate that the addition of CpG islands in the
human genome can be altered by introducing mRNAs for the production of proteins
with adequate carbon content and distribution. The CARd program can be utilized
for adding appropriate proteins.