

CHAPTER 6

ROLE OF THYMINE IN MANIPULATION OF CARBON IN DIFFERENT PROTEINS

6.1 INTRODUCTION

Viruses cause diseases ranging from the common cold to acquired immunodeficiency syndrome (AIDS). A virus consists of genetic material, either RNA or DNA, coated by proteins. The genetic material carries the instructions for its own multiplication, and the infecting virus directs the host cell to duplicate it. Almost everyone is infected by such viruses at one time or another. Matrix proteins are responsible for virus assembly and budding, and these structural proteins link the viral envelope and the core. They have also been shown to be responsible for releasing the genetic material once the virus has entered the host cell (Mebatsion et al., 1999).

The matrix proteins of different viruses were analyzed for the distribution of carbon content and of large hydrophobic residues. The amount and distribution of adenine in a viral genome is a factor in determining protein stability from the hydrophobicity point of view. The genomes of different viruses, and the individual genome segments of Influenza A virus, were analyzed in this study.

Viruses are classified according to their nucleic acid constituents. Influenza viruses have a negative-sense, single-stranded RNA genome that serves as the template for mRNA synthesis. Influenza A viruses cause pandemics because of sudden mutation or variation in their surface proteins. There is recorded evidence that Influenza A virus may mutate into a form that is easily transmitted to humans. The mutations give rise to different forms of the surface proteins, which adopt different structures. The carbon content and its distribution lead to the formation of these many structures on mutation (Rajasekaran et al., 2012).

Paramyxoviruses are enveloped viruses that replicate in the cytoplasm and

contain a single stranded negative-sense RNA genome. The genome contains 6

primary genes: nucleocapsid (N), phosphoprotein (P), matrix (M), the fusion

(F) and attachment (HN, H, or G) proteins, and the polymerase (L), along with

accessory proteins that vary with the viral species (Lamb, 2001). Enveloped

viruses obtain their envelope from cellular membranes once the components of

the virus have assembled at the lipid bilayer. The assembly process brings

together the glycoproteins spanning the lipid bilayer with the inner core of the

virus particle. The inner layer of the membrane consists of a viral protein that

bridges the glycoproteins and the inner core, dubbed the matrix or M protein. M

is an essential protein, without which the production of virus particles is highly

impaired if not impossible.

The M protein of Sendai virus (SeV-M), a member of the subfamily

Paramyxovirinae of the family Paramyxoviridae, is produced in the cytoplasm and

self-associates to form a leaflet at the inner face of the plasma membrane (Takimoto

and Portner, 2004). In the virus particle, it carpets the inner part of the viral

envelope, where it interacts with the two surface glycoproteins, HN and F, on

the one hand and with the viral ribonucleoprotein complex (N protein plus viral

RNA) associated with the L and P proteins on the other hand (Lamb, 2001).

Furthermore, paramyxovirus M has been reported to participate in the

regulation of RNA synthesis (Kras'ko et al., 1986; Suryanarayana et al., 1994;

Ghildyal et al., 2003; Reuter et al., 2006).

Human respiratory syncytial virus (RSV) was first isolated in 1956 from a laboratory chimpanzee with upper respiratory tract disease (Hall, 2001; McNamara and Smyth, 2002; Taylor, 2007; Collins and Crowe, 2007). RSV was quickly determined to be of human origin and was found to be the leading viral agent of pediatric respiratory tract disease worldwide. RSV (family Paramyxoviridae, order Mononegavirales) is an enveloped virus with a single-stranded, negative-sense RNA genome of 15.2 kb (Collins and Crowe, 2007).

The animal versions of RSV include bovine RSV (BRSV) and pneumonia

virus of mice (PVM), suggesting that species jumping took place during the

evolution of these viruses. RSV proteins form the viral envelope by associating

with the lipids in the membrane (Collins and Crowe, 2007). The matrix M

protein lines the inner envelope surface and is vital in virion morphogenesis

(Teng and Collins, 1998). The heavily glycosylated G, fusion F, and small

hydrophobic SH proteins are transmembrane surface glycoproteins.

Since its identification in 1998-1999, Nipah virus (NiV), a member of the family Paramyxoviridae, has been associated with numerous outbreaks of fatal viral encephalitis in humans in Southeast Asia (Chua et al., 2000).

Vesicular stomatitis virus (VSV) belongs to the family Rhabdoviridae and

causes acute infections in a wide range of mammalian hosts (Wagner and Rose,

1995). Its matrix protein (VSV M) plays a key role in viral assembly and

budding and also in the inhibition of host cell gene expression during infection.

During its expression in vertebrate cells, in the absence of any other viral

protein, VSV M produces dramatic alterations in cellular RNA metabolism and

mRNA expression (Black and Lyles, 1992; Ahmed and Lyles, 1998).

The matrix (M) protein of rhabdoviruses plays a key role in virus assembly and budding; however, the precise mechanism by which M mediates these processes is still unclear. A highly conserved, proline-rich motif (PPxY or PY motif, where P represents proline, Y denotes tyrosine, and x denotes any amino acid) of rhabdoviral M proteins has been associated with a possible role in budding mediated by the M protein. Point mutations that disrupt the PY motif of the M protein of vesicular stomatitis virus (VSV) had no obvious effect on membrane localization of M but led to a decrease in the amount of M protein released from cells in a functional budding assay. Moreover, the PPxY

sequence within rhabdoviral M proteins is identical to that of the ligand which

interacts with WW domains of cellular proteins. Amino acids 17 through 33

and 29 through 44, which contain the PY motifs of VSV and rabies virus M

proteins, mediate interactions with WW domains of specific cellular proteins.

Point mutations that disrupt the consensus PY motif of VSV or rabies virus M

protein result in a significant decrease in their ability to interact with the WW

domains. These properties of the PY motif of rhabdovirus M proteins are

strikingly similar to those of the late (L) budding domain in the gag-specific

protein p2b of Rous sarcoma virus (Harty et al., 1999).

The Ebola virus is a member of the Filoviridae family of negative-sense

RNA viruses (Colebunders et al., 2000; Takada et al., 2001). Ebola virus infects both humans and non-human primates and mostly leads to severe hemorrhagic fever, with mortality rates of up to 90%. Currently, no approved vaccines or antiviral drugs are available to prevent or treat Ebola virus infections (Bray et al., 2002). The VP24 protein of Ebola virus is the secondary matrix protein and a minor component of virions, whereas the VP40 protein is the principal matrix protein and the most abundant virion component.

The structure and function of VP40 have been well characterized. The C-

terminal domain of VP40 contains large hydrophobic patches that may be

involved in the interaction with the lipid bilayers (Han et al., 2003).

Chikungunya virus (CHIKV) produces a dengue-like illness in humans,

characterized by fever, rashes, and severe arthralgia persisting for a few weeks

to several months. CHIKV is an alphavirus of the family Togaviridae and it

contains a genome of a linear, positive-sense, single stranded RNA of

approximately 11.8 kb (Schuffenecker et al., 2006).


Dengue virus (DENV) is a mosquito-borne, single-stranded, positive (+)-

sense RNA virus of genome size 10.27 kb belonging to

the Flaviviridae family, whose members are responsible for diseases such as

yellow fever, Japanese encephalitis, tick-borne encephalitis, dengue fever and

dengue hemorrhagic fever. The organization of the DENV genome

represents an example of intra-genomic long distance interactions that modulate

molecular processes. The DENV genome encodes a single-open reading frame,

flanked by highly structured untranslated regions (UTRs) (Villordo et al., 2009;

Filomatori et al., 2011; Manzano et al., 2011). The secondary structure of

DENV UTRs has attracted attention, as they have been shown to encompass

motifs involved in regulation of translation, replication, transcription and viral

pathogenesis (Pijlman et al., 2008; Silva et al., 2010; Manzano et al., 2011).

Rabies is an acute infection of the central nervous system (CNS), characterized by CNS irritation followed by paralysis and death. Rabies is caused by a neurotropic lyssavirus, a member of the family Rhabdoviridae. It is a single-stranded, negative-sense RNA virus with a genome size of 11 kb, which encodes five proteins: a glycoprotein, a nucleoprotein, and three others (Wunner, 1991).

6.2 METHODOLOGY

The genome and protein sequences of Influenza A virus H1N1 were retrieved from NCBI (on 01-10-2011). The amino acid compositions of the 11 proteins were calculated using an in-house program (AACOMP). The carbon distributions in these 11 proteins were computed using our CARBANA program, available online (www.rajasekaran.net.in/tools/carbana.html), which is based on the principle that stable proteins contain 31.45% carbon. A window length of 700 atoms was selected for the calculation. The genome of Influenza A H1N1 contains 8 segments. The number of each base in each segment was counted and tabulated. The role of uracil in the mRNA sequences of matrix proteins was investigated by counting the number of large hydrophobic residues (F, I, L, M and V), which are coded by XUX codons (X = A, U, G or C). The matrix protein sequences of different viruses were obtained from the SWISSPROT database (on 05-09-2011).
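The sliding-window carbon calculation described above can be sketched as follows. This is a minimal illustration and not the CARBANA implementation: the per-residue atom counts are the standard values for neutral amino acid residues in a peptide chain (hydrogens included), while the function names and the rule of growing each window residue by residue until it holds at least 700 atoms are assumptions made for the example.

```python
# (carbon atoms, total atoms including hydrogen) for each residue in a chain,
# i.e. the free amino acid minus one water molecule. Terminal atoms are ignored.
RESIDUE_ATOMS = {
    "G": (2, 7),  "A": (3, 10), "S": (3, 11), "P": (5, 14),
    "V": (5, 16), "T": (4, 14), "C": (3, 11), "L": (6, 19),
    "I": (6, 19), "N": (4, 14), "D": (4, 13), "Q": (5, 17),
    "K": (6, 21), "E": (5, 16), "M": (5, 17), "H": (6, 17),
    "F": (9, 20), "R": (6, 23), "Y": (9, 21), "W": (11, 24),
}


def carbon_percent(seq):
    """Percentage of carbon atoms among all atoms of a peptide segment."""
    carbon = sum(RESIDUE_ATOMS[res][0] for res in seq)
    total = sum(RESIDUE_ATOMS[res][1] for res in seq)
    return 100.0 * carbon / total


def carbon_profile(seq, window_atoms=700):
    """Return (first residue, last residue, carbon %) for each window.

    Assumed windowing rule: starting at every residue, extend the window
    until it contains at least `window_atoms` atoms; windows that cannot
    be filled before the end of the chain are dropped.
    """
    profile = []
    for start in range(len(seq)):
        carbon = total = 0
        for end in range(start, len(seq)):
            c, t = RESIDUE_ATOMS[seq[end]]
            carbon += c
            total += t
            if total >= window_atoms:
                profile.append((start + 1, end + 1, 100.0 * carbon / total))
                break
    return profile
```

Windows whose percentage lies above 31.45 would be read as carbon-rich (hydrophobic) stretches and those below as hydrophilic, in the sense used in this chapter.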

The A, T, G and C contents of the viral genomes of Chikungunya, Dengue and Rabies were calculated. The mRNA sequences of different proteins were analyzed for base composition, and the average base composition was also calculated. The thymine distribution in the different reading frames was computed separately, as it is important for the production of proteins with adequate large hydrophobic residues. A mutational study based on carbon distribution was carried out at site V715 of the PB1 protein. The CARd program (Rajasekaran, 2012) was used for this study, with an outer length of 255 atoms (~17 amino acids) and an inner length of 35 atoms. The results were plotted to compare the native and mutant forms.

6.3 DATASETS

The matrix protein sequences of Sendai virus (P06446), Vesicular

stomatitis Indiana virus (P03519), Influenza A virus (P05777), Human

respiratory syncytial virus (P03419), Zaire ebolavirus (Q05128) and Nipah

virus (Q9IK90) were collected from the SWISSPROT protein database (on 05-09-2011).

The viral genome sequences of Chikungunya, Dengue and Rabies were

downloaded (on 01-10-2011) from NCBI and their A, T, G and C contents were counted.

6.4 RESULTS AND DISCUSSION

Carbon is the only element that contributes to hydrophobic interaction, the dominant force in proteins. Proteins evolve on the basis of their carbon content, which may in turn influence the coding of genes. It has been reported that proteins prefer a carbon content of 31.45% for stability (Rajasekaran et al., 2009). Depending on the carbon content, the protein and the corresponding mRNA survive and are passed on to the next generation. The carbon content is determined by the types of amino acids present, and the arrangement of these amino acids is specified by the gene.

The adenine in a gene is transcribed as uracil in mRNA. The uracil in mRNA determines the number of large hydrophobic residues in proteins, and the number of uracils in mRNAs has decreased during evolution. As a result, the number of large hydrophobic residues decreases, which leads to proteins with lower carbon content and renders them non-functional. Viruses, on the other hand, are inserted into the host cell, which then produces proteins with high carbon content. This is again possible because the adenine in the viral genetic material is transcribed into uracil in mRNAs.

The Influenza A virus H1N1 genome consists of eight RNA segments carrying 11 genes, which encode 11 proteins: polymerase basic 2 (PB2), polymerase basic 1 (PB1), PB1-F2, polymerase acidic (PA), haemagglutinin (HA), nucleocapsid protein (NP), neuraminidase (NA), the matrix proteins (M1, M2) and the non-structural proteins (NS1, NS2). The proteins taken for the carbon distribution study are listed with their accession numbers in Table 6.3. The carbon distributions in these proteins are shown in Figures 6.1-6.7. PB1-F2 and M2 are very small proteins and are not shown here.

Figure 6.1 Carbon distribution pattern of HA protein (X-axis: residue number; Y-axis: carbon %)

Figure 6.2 Carbon distribution pattern of NA protein (X-axis: residue number; Y-axis: carbon %)

Figure 6.3 Carbon profile of NP protein (X-axis: residue number; Y-axis: carbon %)

Figure 6.4 Carbon distribution pattern in PA protein (X-axis: residue number; Y-axis: carbon %)

Figure 6.5 Carbon profile of PB1 protein (X-axis: residue number; Y-axis: carbon %)

Figure 6.6 Carbon distribution pattern in PB2 protein (X-axis: residue number; Y-axis: carbon %)

Figure 6.7 Carbon distribution profile of M1 protein (X-axis: residue number; Y-axis: carbon %)

Carbon percentage (Y-axis) was plotted against residue number (X-axis) to show the carbon distribution in the proteins of Influenza A H1N1 virus. The two membrane proteins (HA and NA) clearly showed a rich carbon content all along the sequence. Normally, a value close to the threshold of 31.45% is expected along the sequence. The hydrophobic region at 120-240 lies above the 31.45% line in Figure 6.1 and Figure 6.2. The first half of the sequence was found to be hydrophilic, while the second half is predicted to be hydrophobic.

NP is a nucleoprotein that wraps around the genomic RNA to form a ribonucleoprotein complex; it contains a mix of hydrophobic and hydrophilic regions along the sequence. A long hydrophilic stretch was found at 200-350 (Figure 6.3) and is predicted to be an unfolded region of the protein. PA and PB1, the viral RNA polymerase proteins, showed a normal carbon distribution, although PA had a slightly higher carbon content (33-35%); the results are shown in Figures 6.4 and 6.5. The PB2 protein shows periodic repeats of hydrophobic and hydrophilic stretches (Figure 6.6). The two small non-structural proteins, NS1 and NS2, showed a normal carbon content, though it was not significant.

The matrix protein M1 displayed a lower carbon content (28%) owing to the presence of hydrophilic regions (Figure 6.7). The matrix and haemagglutinin proteins play an important role in viral entry into the host cell. Hence, this carbon distribution study should help in modifying these proteins for efficient functioning.

Table 6.1 Nucleotide content in different segments of Influenza A H1N1 virus

Segment No.  Sequence ID  No. of proteins (Gene)  A    T    G    C    Total
1            60484        1 (PB2)                 786  531  597  427  2341
2            324897       2 (PB1, PB1-F2)         811  545  526  459  2341
3            60808        1 (PA)                  751  545  521  416  2233
4            62290        1 (HA)                  621  417  409  331  1778
5            324709       1 (NP)                  504  335  412  314  1565
6            324507       1 (NA)                  428  381  351  253  1413
7            60788        2 (M2, M1)              294  248  268  217  1027
8            324833       2 (NS2, NS1)            285  211  215  179  890

Table 6.2 Nucleotide content in different viruses

Name of the virus  Sequence ID  A     T     G     C     Total
Chikungunya        156751972    3492  2385  2968  2950  11795
Dengue type 1      14485523     3439  2298  2765  2219  10721
Rabies             9627197      3421  3130  2736  2645  11932

Table 6.3 List of human Influenza A virus proteins taken for carbon distribution

S. No.  Proteins                 Name of Proteins (ID)  Length
1       RNA Polymerase Basic 2   PB2 (NP_040987.1)      759
2       RNA Polymerase Basic 1   PB1 (NP_040985.1)      757
3       Polymerase Acidic        PA (NP_040986.1)       716
4       Haemagglutinin           HA (NP_040980.1)       566
5       Nucleoprotein            NP (NP_040982.1)       498
6       Neuraminidase            NA (NP_040981.1)       454
7       Matrix Proteins          M1 (NP_040978.1)       252
                                 M2 (NP_040979.2)       97
8       Non-Structural Proteins  NS1 (NP_040984.1)      230
                                 NS2 (NP_040983.1)      121

The genomic particulars of the Influenza A H1N1 virus are given in Table 6.1. The number of proteins and the corresponding genes are given in the third column. Adenine was the most abundant base in all segments. Because of this high adenine content, the host cell produces mRNA with a higher amount of uracil, which is responsible for producing proteins with a higher number of large hydrophobic residues (F, I, L, M and V) and hence a higher carbon content.
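The observation that adenine is the most abundant base in every segment can be checked directly from the counts in Table 6.1. The short sketch below does this; the segment labels used as dictionary keys ("M" and "NS" for the two-gene segments) and the function names are shorthand chosen for the example.

```python
# Base counts (A, T, G, C) for each genome segment, copied from Table 6.1.
SEGMENT_COUNTS = {
    "PB2": (786, 531, 597, 427),
    "PB1": (811, 545, 526, 459),
    "PA": (751, 545, 521, 416),
    "HA": (621, 417, 409, 331),
    "NP": (504, 335, 412, 314),
    "NA": (428, 381, 351, 253),
    "M": (294, 248, 268, 217),
    "NS": (285, 211, 215, 179),
}
BASES = "ATGC"


def base_percentages(counts):
    """Convert raw (A, T, G, C) counts into percentages of the segment."""
    total = sum(counts)
    return {base: 100.0 * n / total for base, n in zip(BASES, counts)}


def most_abundant_base(counts):
    """Return the letter of the base with the highest count."""
    return max(zip(BASES, counts), key=lambda pair: pair[1])[0]


for name, counts in SEGMENT_COUNTS.items():
    pct = base_percentages(counts)
    rounded = {base: round(value, 1) for base, value in pct.items()}
    print(name, most_abundant_base(counts), rounded)
```

The same two functions can be applied to the whole-genome counts of Table 6.2.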

Analysis of the mRNA sequences reveals that a higher adenine content might add an appropriate number of large hydrophobic residues to the proteins. In particular, adenine as the second nucleotide in frame 1 is important for having an appropriate uracil as the second nucleotide in frame 4 of the back chain. Framing these adenines (uracils in the back chain) is important for producing highly functional and long-lasting proteins that will be passed on to the next generation. This is a classical example of more hydrophobic proteins being added to the host cell naturally. In human mRNAs, the adenine content is lower. The above principle can be adapted for adding genomic content that gives adequate carbon distribution in the proteins.
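The correspondence between adenine at the second codon position in frame 1 and uracil at the second codon position in frame 4 can be checked with a few lines of code. The sketch assumes that frame 4 is the first reading frame of the reverse-complement strand (the back chain) and that the sequence length is a multiple of three, so that the codons of the two frames pair up exactly. The toy sequence and the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (the back chain)."""
    return rna.translate(COMPLEMENT)[::-1]


def second_position_count(rna, base, offset=0):
    """Count codons whose second nucleotide is `base`, reading from `offset`."""
    return sum(1 for i in range(offset, len(rna) - 2, 3) if rna[i + 1] == base)


toy = "GAUCACAAG"  # hypothetical 9-nt sequence: codons GAU, CAC, AAG
frame1_a = second_position_count(toy, "A")
frame4_u = second_position_count(reverse_complement(toy), "U")
print(frame1_a, frame4_u)
```

Each codon with A in the middle on the forward strand pairs with a codon with U in the middle on the reverse complement, so the two counts agree whenever the length condition holds.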

The carbon distribution studies revealed that the viral proteins had higher carbon content. The atomic composition plays a role in the evolution of proteins. The carbon distribution along the protein chain was dominated by the presence of uracil in frame 1 of its mRNA. The adenine in the viral genome is responsible for the uracil in the corresponding mRNA.

The amount of adenine was greater than that of the other bases in all viruses, as shown in Table 6.2. Probably, nature has been adding more hydrophobic elements to proteins through viruses. All the large hydrophobic residues, namely phenylalanine (F), isoleucine (I), leucine (L), methionine (M) and valine (V), are coded by XUX codons (where X = A, U, G or C). An earlier investigation of this codon distribution in mRNA sequences of different species revealed that frame 1 tends to maintain a certain proportion of these codons in order to maintain sufficient hydrophobicity in its proteins. The matrix proteins of different species were analyzed for the presence of XUX in frame 1, i.e., the number and percentage of large hydrophobic residues were calculated. An excess of uracil in the coding sequences was observed.
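Counting XUX codons in a chosen reading frame is straightforward. The sketch below is a minimal illustration with a hypothetical function name and a toy sequence; it accepts either RNA or DNA letters and relies on the fact that, in the standard genetic code, every codon with U in the second position encodes F, L, I, M or V.

```python
def xux_codons(mrna, frame=1):
    """Return (number of XUX codons, total codons) for reading frame 1, 2 or 3."""
    seq = mrna.upper().replace("T", "U")
    start = frame - 1
    codons = [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
    hits = sum(1 for codon in codons if codon[1] == "U")
    return hits, len(codons)


# Toy sequence with codons AUG, UUU, GGA, CUC in frame 1.
hits, total = xux_codons("AUGUUUGGACUC")
print(f"{hits} of {total} codons are XUX ({100.0 * hits / total:.1f}%)")
```

For a coding sequence read in frame 1, the XUX count equals the number of large hydrophobic residues in the translated protein.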

The carbon content of the matrix proteins was further analyzed with the CARBANA tool. The results for the matrix proteins of different viruses are shown in Figures 6.8-6.12. The matrix protein of Sendai virus is hydrophilic overall (Figure 6.8). The matrix protein of Vesicular stomatitis Indiana virus is hydrophobic overall (Figure 6.9) and might misfold. Figure 6.10 shows a long hydrophilic stretch (residues 50-150) in the matrix protein of Human respiratory syncytial virus that is predicted to be unfolded. In the Zaire ebolavirus matrix protein, residues 87-136 and 155-188 are hydrophobic (Figure 6.11). The Nipah virus matrix protein shows an even distribution along the 31.45% line (Figure 6.12); no disordered regions were found.

The carbon content of the matrix protein was greater than the expected value of 31.45% over the entire sequence, except in a few places. The first half of the sequence appeared to have a higher carbon content than the second half. The number of large hydrophobic residues was 55 in the first half and 75 in the second half. Residues such as tryptophan, tyrosine, proline, histidine, glutamate and aspartate might also contribute to the carbon content. Similarly, residues such as arginine, cysteine, lysine, serine, glycine, asparagine, threonine, glutamine and alanine might have reduced the carbon content.

The mRNA sequences of the different proteins were analyzed for base composition, as shown in Figure 6.13. The mRNAs are numbered as follows: 1-HA, 2-NA, 3-NP, 4-M1, 5-M2, 6-NS1, 7-NEP, 8-PA, 9-PB1 and 10-PB2. Adenine is the most abundant base in all the mRNAs (Figure 6.13). Different combinations of mRNAs can be selected for normal protein synthesis on the basis of thymine distribution. A mutational study at any site of interest can be carried out with the CARd program. One example is illustrated here: valine was replaced by serine at position 715 of the PB1 protein. The comparison plots are given in Figure 6.14. When the distribution is normal and centered at 0.3145, the stretch has a normal carbon distribution.

Figure 6.8 Carbon distribution profile of matrix protein of Sendai virus (X-axis: residue number; Y-axis: carbon %)

Figure 6.9 Carbon distribution profile of matrix protein of Vesicular stomatitis Indiana virus (X-axis: residue number; Y-axis: carbon %)

Figure 6.10 Carbon distribution pattern of matrix protein of Human respiratory syncytial virus A (X-axis: residue number; Y-axis: carbon %)

Figure 6.11 Carbon distribution pattern in matrix protein of Zaire ebola virus (X-axis: residue number; Y-axis: carbon %)

Figure 6.12 Carbon distribution profile of matrix protein of Nipah virus (X-axis: residue number; Y-axis: carbon %)

Figure 6.13 (a) Average ACGT composition in different mRNAs of human Influenza A virus. (b) Average base composition in entire viral mRNA.

Figure 6.14 Comparison of carbon distribution profile at position 715 in native and mutated (V715S) forms of PB1 protein. X-axis shows the carbon fraction and the Y-axis shows frequency.

A shift to the left indicates a hydrophilic stretch, and a shift to the right a hydrophobic one. Oscillation away from the normal distribution is also regarded as abnormal carbon distribution.

Here, the native protein shows a normal distribution with no wavering. Its maximum lies on the left side, indicating a hydrophilic nature. The mutant protein shows some wavering, but its maximum is at 0.3145. It balances out one way or the other. Hence, according to the plot (Figure 6.14), the mutation causes little change in carbon distribution. This is in agreement with the experimental report that the mutation was not significant (Sugiyama et al., 2009). This kind of mutational study can be carried out to bring a protein back to a normal carbon distribution.
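A back-of-the-envelope check makes the small effect of a single V to S substitution plausible. The sketch below is not the CARd algorithm: it only recomputes the carbon fraction of one window after swapping a valine residue (5 carbon atoms out of 16) for a serine residue (3 carbon atoms out of 11). The 255-atom window holding 80 carbon atoms is a hypothetical example, and the function name is an assumption.

```python
# (carbon atoms, total atoms including hydrogen) per residue in a peptide chain.
VAL = (5, 16)
SER = (3, 11)


def fraction_after_substitution(carbon, total, old, new):
    """Carbon fraction of a window after replacing one residue `old` by `new`."""
    return (carbon - old[0] + new[0]) / (total - old[1] + new[1])


# Hypothetical 255-atom window that holds 80 carbon atoms before mutation.
before = 80 / 255
after = fraction_after_substitution(80, 255, VAL, SER)
print(f"{before:.4f} -> {after:.4f}")
```

For this hypothetical window the carbon fraction changes by less than 0.002, a shift that is small compared with the spread of the distributions discussed above.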

6.5 CONCLUSION

The carbon distribution studies on viral proteins revealed that carbon content and its distribution along the sequence are vital for function. A difference in carbon distribution pattern was noticed in most of the H1N1 proteins. Carbon content and distribution play a role in the evolution of proteins, and differences in carbon distribution in proteins cause diseases. Studying the carbon distribution along the protein chain is the most significant step towards understanding biological reactions.

A higher number of adenines in mRNAs was found to play an important role in producing proteins with a higher number of large hydrophobic residues, which was responsible for the richness in carbon content. Further analysis of different viral genomes revealed that a higher adenine content might add an appropriate number of large hydrophobic residues to the proteins. In particular, adenine as the second nucleotide in frame 1 was important for having an appropriate uracil as the second nucleotide in frame 4 of the back chain. Framing these adenines (uracils in the back chain) is important for producing highly functional and long-lasting proteins that will be passed on to the next generation. In human mRNAs, on the other hand, the adenine content is lower, which needs to be worked out for disease-free living. This is a classical example of more hydrophobic proteins being added to the host cell naturally.

Apart from the large hydrophobic residues, other residues can contribute to the total carbon content of a protein, and these need to be taken into account. The analyses of the mRNA sequences of Influenza A virus revealed that the adenine content was higher in all sequences. Further, the thymine distributions in different frames were checked. The most important observation was an excess of thymine in frame 4 of strand 2, which might be responsible for the production of proteins with different amino acid compositions. Unusual thymine distributions in frame 3 were also observed. The thymine distributions in viral mRNAs differed from those in animals, and minimizing this excess thymine may give normal proteins.

A mutational study at a site of interest for protein stabilization was also carried out (Figure 6.14). This technique can be exploited further for improving protein stability and activity and, ultimately, for gene therapy. The viral infection techniques demonstrate that the addition of CpG islands in the human genome can be altered by introducing mRNAs for the production of proteins with adequate carbon content and distribution. The CARd program can be utilized for adding appropriate proteins.
