a comparative computational study of the ‘rbcl’ gene in ... · a comparative computational...

9
Indian Journal of Biotechnology Vol 12, January 2013, pp 58-66 A comparative computational study of the ‘rbcL’ gene in plants and in the three prokaryotic families—Archaea, cyanobacteria and proteobacteria Sunil Kanti Mondal 1,2# , Subhadeep Shit 3# and Sudip Kundu 1 * 1 Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700 009, India 2 Department of Biotechnology, The University of Burdwan, Rajbati, Burdwan 713 104, India 3 School of Biotechnology and Life Sciences, Haldia Institute of Technology, Haldia 721 657, India The rbcL (ribulose-1,5-biphosphate carboxylase oxygenase) gene plays a crucial role in carbon fixation. Previous studies shed light on its evolutionary relationship among different Phyla. Here, authors have done a comparative study of rbcL genes among proteobacteria, archaea, cyanobacteria and plants based on their compositional variations (GC%, amino acid frequency, codon usage, etc.). In addition we have checked the mutational pressure on rbcL genes. The results indicate that the rbcL genes of cyanobacteria have a wide range of GC%. On the other hand, those of the proteobacteria have mainly higher GC%. Preferences of some amino acids usages have observed in rbcL genes among all species with an exception of plant. Analysis of RSCU (relative synonymous codon usage) values depicts GC ending codon biasness in proteobacteria, archaea and with few exceptions in cyanobacterial species. On the other hand, AU ending codon biasness has been observed for plants. The correspondence analysis shows the significant difference in codon usage pattern among the selected four groups of species. The ENc (expected effective number of codons) plot implys the choice of rbcL gene codon is constrained only by mutational biasness. Moreover, the rbcL genes’ expression ability, as predicted by CAI (codon adaptation index), is similar for most of the species from different groups. Keywords: Codon adaptation index (CAI), expected effective number of codons (ENc), ribulose-1,5-biphosphate carboxylase oxygenase, relative synonymous codon usage (RSCU) Introduction Ribulose-1,5-biphosphate carboxylase-oxygenase (E.C. 4.1.1.39, Rubisco), one of the most abundant enzymes on the globe 1 , is responsible for catalyzing CO 2 assimilation to organic carbon via Calvin cycle. In its most prevalent conformation, found in most proteobacteria, cyanobacteria, algae and higher plants, Rubisco occurs as a hexadecamer composing of eight large subunit (rbcL, MW 56,000) and eight small subunits (rbcS, MW 14,000), assembling into an L 8 S 8 holoenzyme 2 . The large subunit plays the crucial role of carbon fixation 3 . In Cyanobacteria and other prokaryotes, genes for both rbcL and rbcS are chromosomally encoded and co-transcribed 2 . As evolutionary rate of rbcL is suitable for study of phylogeny, it is often used as model for phylogenetic investigation 4 . Understanding of these evolution patterns may shed light on functional/structural features governing Rubisco activity. Previous investigations on rbcL genes suggest a biphyletic origin of phototrophic eukaryotes 5-7 . It is also found that rbcL of green algae/plant lineages is derived from cyanobacteria and forms one rbcL lineage; the second rbcL lineage consists of the non-green algae and is derived from proteobacteria. This scenario of evolution is also well supported by molecular phylogenies of other chloroplast-encoded genes like psbA, tufA 5,8 , atpB, ClpC 9,10 , etc. Phylogenies based on these genes suggest that there is a single cyanobacterial ancestor of plastids. These resulted in a number of hypotheses explaining the apparently contradictory results. For example, (a) a lateral gene transfer of rbcLS genes may have occurred from a proteobacteria into the ancestor that gave rise to non-green plants; (b) a lateral transfer of rbcLS operon may have occurred into cyanobacterial ancestor that gave rise to non-green plants; or (c) two rbcLS operons may have been present in cyanobacterial-like ancestor (that gave rise to plastids) and different copies were retained in green versus nongreen lineages 11 . While a large number of investigations have already done to understand the ancestry of Rubisco —————— *Author for correspondence: Mobile: +91-9433428324 E-mail: [email protected]; [email protected] # Both authors have equal contribution

Upload: doanmien

Post on 29-Aug-2019

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

Indian Journal of Biotechnology

Vol 12, January 2013, pp 58-66

A comparative computational study of the ‘rbcL’ gene in plants and in the

three prokaryotic families—Archaea, cyanobacteria and proteobacteria

Sunil Kanti Mondal1,2#

, Subhadeep Shit3#

and Sudip Kundu1*

1Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700 009, India

2Department of Biotechnology, The University of Burdwan, Rajbati, Burdwan 713 104, India

3School of Biotechnology and Life Sciences, Haldia Institute of Technology, Haldia 721 657, India

The rbcL (ribulose-1,5-biphosphate carboxylase oxygenase) gene plays a crucial role in carbon fixation. Previous

studies shed light on its evolutionary relationship among different Phyla. Here, authors have done a comparative study of

rbcL genes among proteobacteria, archaea, cyanobacteria and plants based on their compositional variations (GC%, amino

acid frequency, codon usage, etc.). In addition we have checked the mutational pressure on rbcL genes. The results indicate

that the rbcL genes of cyanobacteria have a wide range of GC%. On the other hand, those of the proteobacteria have mainly

higher GC%. Preferences of some amino acids usages have observed in rbcL genes among all species with an exception

of plant. Analysis of RSCU (relative synonymous codon usage) values depicts GC ending codon biasness in proteobacteria,

archaea and with few exceptions in cyanobacterial species. On the other hand, AU ending codon biasness has been observed

for plants. The correspondence analysis shows the significant difference in codon usage pattern among the selected four

groups of species. The ENc (expected effective number of codons) plot implys the choice of rbcL gene codon is constrained

only by mutational biasness. Moreover, the rbcL genes’ expression ability, as predicted by CAI (codon adaptation index), is

similar for most of the species from different groups.

Keywords: Codon adaptation index (CAI), expected effective number of codons (ENc), ribulose-1,5-biphosphate carboxylase

oxygenase, relative synonymous codon usage (RSCU)

Introduction Ribulose-1,5-biphosphate carboxylase-oxygenase

(E.C. 4.1.1.39, Rubisco), one of the most abundant

enzymes on the globe1, is responsible for catalyzing

CO2 assimilation to organic carbon via Calvin cycle.

In its most prevalent conformation, found in most

proteobacteria, cyanobacteria, algae and higher plants,

Rubisco occurs as a hexadecamer composing of eight

large subunit (rbcL, MW ≈ 56,000) and eight small

subunits (rbcS, MW ≈ 14,000), assembling into an

L8S8 holoenzyme2. The large subunit plays the crucial

role of carbon fixation3. In Cyanobacteria and other

prokaryotes, genes for both rbcL and rbcS are

chromosomally encoded and co-transcribed2. As

evolutionary rate of rbcL is suitable for study of

phylogeny, it is often used as model for phylogenetic

investigation4. Understanding of these evolution

patterns may shed light on functional/structural

features governing Rubisco activity.

Previous investigations on rbcL genes suggest a

biphyletic origin of phototrophic eukaryotes5-7

. It is

also found that rbcL of green algae/plant lineages is

derived from cyanobacteria and forms one rbcL

lineage; the second rbcL lineage consists of the

non-green algae and is derived from proteobacteria.

This scenario of evolution is also well supported by

molecular phylogenies of other chloroplast-encoded

genes like psbA, tufA5,8

, atpB, ClpC9,10

, etc.

Phylogenies based on these genes suggest that there is

a single cyanobacterial ancestor of plastids. These

resulted in a number of hypotheses explaining the

apparently contradictory results. For example, (a) a

lateral gene transfer of rbcLS genes may have occurred

from a proteobacteria into the ancestor that gave rise to

non-green plants; (b) a lateral transfer of rbcLS operon

may have occurred into cyanobacterial ancestor that

gave rise to non-green plants; or (c) two rbcLS operons

may have been present in cyanobacterial-like ancestor

(that gave rise to plastids) and different copies were

retained in green versus nongreen lineages11

.

While a large number of investigations have

already done to understand the ancestry of Rubisco

——————

*Author for correspondence:

Mobile: +91-9433428324

E-mail: [email protected]; [email protected] #Both authors have equal contribution

Page 2: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

MONDAL et al: A COMPARATIVE COMPUTATIONAL STUDY OF rbcL GENES

59

and its evolutionary relationship among different

Phyla, present study deals with the rbcL gene’s

evolutionary pattern in prokaryotes (archaea,

proteobacteria, cyanobacteria) and eukaryotes (plant);

an extensive analysis on the compositional variation

and similarity (GC content, amino acid usage

frequency, codon bias, etc.) of the rbcL genes;

and also to understand the mutational pressure on

rbcL genes.

Materials and Methods

Collection of Data

Taxonomic and other related information, whole

genome and 16S rRNA sequences of 43 species

(10 Archaea spp., 10 Cyanobacteria spp., 15

Proteobacteria spp. & 8 Plant spp.) were collected

from NCBI (http://www.ncbi.nlm.nih.gov). Annotated

rbcL gene and protein sequences were collected from

the KEGG database (http://www.genome.jp/dbget-bin/).

Evolutionary Analysis

Bootstrapped trees of 16S rRNA and rbcL gene

sequences were generated using ClustalX (version 2)

and PHYLIP (version 3.69)12,13

. Hierarchical

clustering based on GC content was generated using

DIANA14

.

Compositional Analysis

Parameters like GC content, RSCU (a measure

of relative synonymous codon usage biasness)

values, amino acid and tRNA frequencies were

considered for compositional analysis. The GC, GC3s

and Nc values15

were generated using CodonW

(http://codonw.sourceforge.net/). These provided useful

information regarding existence of mutational

pressures acting on the genes16

. The expected

effective number of codons (ENc) values from

GC3s under H0 (null hypothesis, i.e., no selection)

were calculated according to Equation 1, where S

denoted GC3s.

ENc= 2+S+ {29/ [S2+ (1-S)

2]} … (1)

Correspondence analysis was performed using

CodonW to investigate major trend in RSCU variation

among the genes. It identifies major trends in the

variation of synonymous codon usage data and

distributes genes along continuous axes in accordance

with these trends. RSCU value close to 1 indicates

lack of biasness, while much higher and lower values

indicate preference and avoidance of that particular

codon, respectively.

Types of motifs in gene sequence were

calculated using the online tool MOTIFSCAN

(http://hits.isb-sib.ch/cgi-bin/PFSCAN) and were

aligned determining whether each of the genes are

conserved among groups.

Statistical Significance Test

A statistical significance (z) test was performed

based on G and C ending codons of amino acids

having minimum four degenerative codons with

variation only in the third position to understand the

preference of either codons on the respective

organisms group.

Z = (Xc - Xg)/√((Sc2+Sg

2)/(Nc+Ng) … (2)

Here, Xi and Si denote the average and standard

deviation of i (i=C & G) ended codon’s RSCU values,

respectively. N indicates the sample size.

Expressional Probability

Codon adaptation index (CAI), a measure of gene’s

probable expression, was calculated17

. The codon

preference model was also generated to check an

existence of mutational biasness as well as also to

understand the preferential effect over the different

amino acids.

Results and Discussion

Phylogenetic Analysis

Phylogenetic trees generated based on rbcL genes

and 16S rRNA sequences are shown in Fig. 1. The

main cluster of phylogenetic tree of the rbcL genes of

prokaryotes (Fig. 1a) has two sub clusters; one of

archaeal and the other of α- and β-proteobacterial

secies. It is joined by a cluster of γ-proteobacterial

and finally preceded by cyanobacterial species. The

phylogenetic tress of 16S rRNA sequences is

presented in Fig. 1b. Its main cluster comprises of

archaeal and cyanobacterial species as sub

clusters, which is followed by cluster of β- and

γ-proteobacterial, and of α-proteobacterial species.

Comparison of both phylogenetic trees shows that

rbcL genes of archaea and proteobacterial species

have more similarity. On the other hand, 16S rRNA

based tree clearly shows genetical similarity of

archaeal and cyanobacterial species. The higher

similarity among the rbcL genes of cyanobaceria and

plant species is evident from the phylogenetic tree of

both the prokaryotic and plant species (Fig. 1c), and

confirms about the evolutionary relationships as

already observed by several previous studies.

Page 3: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

INDIAN J BIOTECHNOL, JANUARY 2013

60

Fig. 1 (a-c)—(a) Phylogenetic tree based on the rbcL gene sequences; (b) Phylogenetic tree based on the 16SrRNA sequences of the

species; & (c) Phylogenetic tree of both prokaryotes and eukaryotes based on the rbcL gene sequences.

Page 4: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

MONDAL et al: A COMPARATIVE COMPUTATIONAL STUDY OF rbcL GENES

61

Compositional Variability

Variation in rbcL Gene

The hierarchical cluster based on GC content

of the rbcL genes is presented in Fig 2. It has

two main clusters. The first cluster forms two sub

clusters: Cluster A and B, while the other cluster

comprises of three sub clusters: Cluster C, D and E.

Cluster A contains plant, whereas cluster B contains

mainly cyanobacterial and few archaea species

(Staphylothermus marinus, Methanosarcina mazei &

Methanoculleus marisnigri). These two clusters have

species possesing rbcL genes of low GC content. The

cluster C represents species having comparatively

higher GC content (56.42%) in their rbcL genes and it

has species of different groups, such as, cyanobacteria

(Thermocynechococcus elongatus & Synechoccus sp.),

archaea (M. barkeri, Archaeoglobus fulgidus &

Thermococcus gammatolerens) and proteobacteria

(Acidithiobacillus ferroxidans & Nitrosomonas

eutropha). However, the species having highest GC

content in rbcL genes are present in cluster D and E,

comprising mainly of the proteobacteria species

and few cyanobacteria (Synechococcus sp. A-prime &

Synechoccus sp. RCC307) and archaea (Thermofilum

pendens & Natronomonas pharaonis). The results

indicate that the rbcL genes of cyanobacterial species

have a wide range of GC contents, whereas those of

the proteobacterial species have mainly higher GC

contents. It has been observed that the range of GC

content in green plants varies from 28 to 42%18

.

It is also evident from the present results that the

rbcL genes of plants have comparatively lower

GC contents.

The gene has also been analyzed based on the

physico-chemical properties of the amino acids

present in its sequence. Amino acid frequencies are

calculated and presented in three groups: amino acids

with hydrophobic side chains in cluster I, polar amino

acids with neutral side chains in cluster II and polar

amino acids with charged groups in cluster III

(Fig. 3). Amino acids like alanine, valine, isoleucine

and leucine from cluster I show higher amino acid

usage among all the four groups of species. Similarly,

amino acids like glycine, proline and threonine from

the cluster II show comparative higher usage ratio. On

the other hand, glycine and cystiene usages are

maximum and minimum, respectively among all the

four groups of species. All the polar amino acids with

charged groups, except histidine, (cluster III) show

comparatively higher usage ratio in all groups of

species. However, few exceptions are observed in a

plant species, namely, Nicotiana benthamiana; it

shows higher usage ratio of arginine (from cluster III),

cystein and serine (from cluster II).

An intra and inter group correlation check was

done based on the rbcL genes’ RSCU (Table 1) and

the results show that proteobacterial and archaeal

species have greater RSCU values in the GC-ending

Fig. 2—Hierarchical clustering based on the GC content of rbcL genes.

Page 5: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

INDIAN J BIOTECHNOL, JANUARY 2013

62

codons, indicating greater usage of these codons.

Cyanobacterial species also showed greater values for

GC-ending codons, except amino acids arginine,

alanine, valine, glutamic acid, glycine and lysine,

while plant specied showed greater RSCU values in

the AU-ending codons, except amino acids aspargine

and histidine.

Codon Preference Check

The rbcL genes’ RSCU values also lead to the

establishment of a codon preference model, which

thereby shows preference towards GC-ending codons

by the prokaryotes, signifying a mutational bias

towards G+C, but no such case is observed in the

plant species (Table 1). A preference check of G3

Fig. 3—Amino acid frequencies of rbcL genes obtained from archaea (AR), proteobacteria(PR), cyanobacteria (CY) and plant species (PL).

(afu = Archaeoglobus fulgidus; hbu = Hyperthermus butylicus; mem = Methanoculleus marisnigri; mac = Methanosarcina acetivorans;

mba = Methanosarcina barkeri; mma = Methanosarcina mazei; nph = Natronomonas pharaonis; smr = Staphylothermus marinus;

tga = Thermococcus gammatolerans; tpe = Thermofilum pendens; ana = Anabaena sp. PCC7120; cya = Cyanobacteria Yellowstone A-Prime;

cyb = Cyanobacteria Yellowstone B-Prime; cyc = Cyanothece; mar = Microcystis aeruginosa; pmb = Prochlorococcus marinus AS9601;

pmj = Prochlorococcus marinus MIT9211; pme = Prochlorococcus marinus NATL1A; syr = Synechococcus sp. RCC307;

tel = Thermosynechococcus elongates; ath = Arabidopsis thaliana; P01 = Beta vulgaris; P02 = Brassica napus; cre = Chlamydomonas

reinhardtii; P03 = Gossypium raimondii; P04 = Helianthus annus; P05 = Nicotiana benthamiana; P06 = Petunia hybrid; acr = Acidiphilium

cryptum; afr = Acidithiobacillus ferrooxidans; aeh = Alkalilimnicola ehrlichei; bbt = Bradyrhizobium; bph = Burkholderia phymatum;

cti = Cupriavidus taiwanensis; mca = Methylococcus capsulatus; net = Nitrosomonas eutropha; rsp = Rhodobacter sphaeroides 2.4.1;

rpe = Rhodopseudomonas palustris BisA53; rpa = Rhodopseudomonas palustris CGA009; rce = Rhodospirillum centenum;

sme = Sinorhizobium meliloti; tbd = Thiobacillus denitrificans; & xau = Xanthobacter autotrophicus)

Page 6: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

MONDAL et al: A COMPARATIVE COMPUTATIONAL STUDY OF rbcL GENES

63

over C3 or vice versa (Table 2) shows few significant

results; archaeal species show biasness of C3 over G3

in all codons, except arginine and proline. Among

cyanobacterial species, preference of G3 over C3 are

found in amino acids like leucine, while C3 over G3

are seen in alanine, glycine, proline, threonine and

serine. Among proteobacteria species, biasness

towards C3 ending codons is observed, except

amino acids leucine and proline. In plant species,

preference exists in AU-ending codons, except

aspargine and histidine showing a mild higher

RSCU values in C3 ending codons.

Statistical Significance Test

Some amino acids like alanine and proline show no

significant difference (P<0.05) between G and C

ending codons both in archaea and plant species, while

threonine shows significant difference in archaea,

cyanobacteria, proteobacteria and plant species.

Motif Analysis

The general sequence of the rbcL motif is “G-x-

[D]-F-x-K-x-D-E”. Degree of conservation using

motif analysis shows that rbcL motifs of the different

species are found almost conserved in all the species,

except N. eutropha, Brassica napus, N. benthamiana

Table 1—Summary of the RSCU values of rbcL genes among the four groups

Amino

acids

Codons* Archaea Cyano-

bacteria

Proteo-

bacteria

Plant Amino

acids

Codons* Archaea Cyano-

bacteria

Proteo-

bacteria

Plant

gca 1 0.71 0.24 1.18 Met aug 1 1 1 1

gcc 1.25 1.18 2.21 0.57 cca 0.88 0.64 0.19 1.25

gcg 0.88 0.55 1.32 0.46 ccc 1.02 1.42 1.22 0.51

Ala

gcu 0.87 1.58 0.22 1.79

Pro

ccg 1.44 0.59 2.34 0.53

ugc 1.78 1.16 1.73 0.9 ccu 0.67 1.35 0.26 1.71 Cys

ugu 0.23 0.85 0.27 1.1 Gln caa 0.56 0.97 0.39 1.45

gac 1.28 1.04 1.48 0.71 cag 1.66 1.03 1.81 0.55 Asp

gau 0.72 0.96 0.52 1.29 Arg aga 1.53 1.88 0.21 1.49

gaa 0.78 1.31 0.81 1.53 agg 2.51 0.46 0.28 0.87 Glu

gag 1.22 0.69 1.19 0.47 cga 0.34 0.35 0.39 1.01

uuc 1.29 1.22 1.89 0.84 cgc 0.88 1.85 4.09 0.55 Phe

uuu 0.72 0.78 0.2 1.16 cgg 2.28 1.61 1.09 0.49

gga 0.98 0.6 0.08 1 cgu 0.74 2.14 0.68 2.16

ggc 1.51 1.13 3.1 0.5 agc 1.99 0.39 1.04 0.87

ggg 0.91 0.35 0.28 0.73 agu 0.71 0.33 0.07 0.95

Gly

ggu 0.6 1.93 0.53 1.77

Ser

uca 0.6 0.75 0.11 1.38

cac 1.31 1.43 1.43 1.1 ucc 1.16 1.99 1.52 0.78 His

cau 0.69 0.57 0.57 0.9 ucg 0.99 0.63 2.93 0.35

aua 0.87 0.17 0.03 0.36 ucu 0.56 1.93 0.34 1.67

auc 1.41 1.86 2.77 1.19 aca 0.79 0.7 0.15 1.18

Ile

auu 0.73 0.97 0.21 1.45

Thr

acc 1.54 2.05 2.7 0.76

aaa 0.68 1.11 0.28 1.51 acg 0.91 0.31 0.95 0.22 Lys

aag 1.32 0.89 1.72 0.49 acu 0.76 0.94 0.21 1.85

uua 0.24 0.74 0.05 1.23 gua 0.72 0.8 0.15 1.62

uug 0.35 1.03 0.28 1.34 guc 1.33 0.7 1.79 0.27

cua 0.64 0.74 0.11 0.94 gug 0.72 0.93 1.88 0.58

cuc 1.99 1.17 2.03 0.4

Val

guu 1.23 1.57 0.18 1.54

cug 1.39 1.52 3.35 0.63 Trp ugg 1 1 1 1

Leu

cuu 1.39 0.82 0.18 1.47 Tyr uac 1.44 1.22 1.45 0.83

aac 1.5 1.54 1.69 1.1 uau 0.59 0.78 0.55 1.17 Asn

aau 0.54 0.48 0.42 1.05

*Preferred codons are marked in ‘bold’.

Page 7: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

INDIAN J BIOTECHNOL, JANUARY 2013

64

and Gossypium raimondii (Fig. 4). Within the groups,

cyanobacterial species show the most conserved

sequence, except Prochlorococcus marinus where

phenyl alanine residue has changed to leucine.

This change has also been detected in archaea where

in few species (Hyperthermus butylicus, S. marinus,

T. gammatolerans & 3 Methanosarcina species),

phenyl alanine residue has changed either to leucine

or tyrosine. Proteobacteria species also show a

conserved rbcL motif sequence.

Expressional Variability

ENc Plot Analysis

The ENc plot analysis [ENC plotted against %

(G+C)3 (Fig. 5) ] was used to investigate patterns of

synonymous codon usage, which shows that all the

species lie below the expected curve, except two plant

species B. napus and N. benthamiana, which are lying

on the curve of the predicted values. This implies that

the choice of the rbcL gene codon of the different

species is constrained only by a mutational biasness.

Correspondence Analysis

To determine the codon usage of rbcL gene among

the four groups, correspondence analysis on the genes

RSCU values of the 43 organisms was carried out by

a standard procedure19

. The distribution of the rbcL

gene from the 43 species on the first two major

axes of the correspondence analysis is shown in

Fig. 6. Genes are recognized based on their groups.

Proteobacteria and plant species are separated along

the first major axis, while cyanobacteria and archaea

species are separated along the second major axis,

thereby depicting significant difference in their codon

usage pattern. A closer look depicts that archaea

species (N. pharaonis & Methanoculleus marisnigri)

and cyanobacteria species (Cyanobacteria A &

B-Prime, Synechoccus RCC307 & T. elongates) show

similar RSCU pattern with proteobacteria species.

Similarly, P. marinus (NATL1A, MIT9211 & AS9601)

Table 2—Z-scores of the rbcL genes for the four different groups

Z-Scores Amino

acids Archaea

spp.

Cyanobacteria

spp.

Proteobacteria

spp.

Plant

spp.

Alanine 1.43 2.11 4.25 0.72

Glycine 2.13 2.33 7.94 -0.97

Leucine 0.54 -2.43 -3.97 -4.68

Proline -1.5 2.56 -4.49 -0.08

Arginine -5.42 -0.42 6.91 -2.61

Serine 4.11 3.68 -0.86 4.49

Threonine 2.43 4.89 7.12 2.98

Valine 2.3 -0.69 -0.38 -1.33

Amino acids having more than two codons have been considered.

Z values greater than or less than 1.96 (at P<0.05) are significant.

Fig. 4—Alignment of ‘rbcL’ motif sequences to identify the

degree of conservation among the archaea, proteobacteria,

cyanobacteria and plants.

Page 8: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

MONDAL et al: A COMPARATIVE COMPUTATIONAL STUDY OF rbcL GENES

65

species of cyanobacteria are found to have closer

resemblance to all the plant species, except B. napus,

N. benthamiana and G. raimondii, which are similar

to the S. marinus and M. barkeri from archaea species.

Codon Adaptation Index Variation

Codon Adaptation Index (CAI) predicting the

degree of expression of the rbcL gene analyzed shows

an average rate of expression of 0.72-0.76 (Fig. 7).

This indicates that rbcL is highly expressible gene

and is playing an important role in these species life.

Only three species from plant, Helianthus annuus,

Arabidopsis thaliana and Chlamydomonas reinhardtii

are found showing a comparative lower CAI values.

This lower value might be due to usage of other

Rubisco genes.

Conclusion

The comparative study of rbcL genes among different groups of species highlighted few significant outcomes. First of all, the results show that rbcL

genes of cyanobacteria and plants are similar in comparison to the other groups, which is an already known evolutionary relationship. In addition to that, it further shows that rbcL genes of these two groups have lower GC% in comparison to those of proteobacteria and archaeal species, which show higher GC% values. On the other hand, cyanobacteria and the other prokaryotic groups show differences from plants on the basis of their higher usage ratio in GC preference codons, thereby signifying the existence of mutational pressure among their rbcL genes. Such mutational pressure is not observed in case of plants. Though the rbcL gene expression is nearly similar among the four groups but the pattern in their codon usages is found different.

Acknowledgement Authors wish to thank the Department of

Biotechnology, University of Burdwan, Burdwan for

providing the facility of Computational Biology

laboratory.

Fig. 5—The effective number of codons (ENc) plotted against

the GC3s for the rbcL gene. Each expected ENc from GC3s is

shown as a standard curve.

Fig. 6—Correspondence analysis of the rbcL gene from the

four groups based on its RSCU values. The proteobacteria and

the plant species are separated along the first major axis, while

the cyanobacteria and archaea species are separated along the second

major axis, thereby depicting the difference in their codon usage pattern.

Fig. 7—Representation of the CAI values for the ‘rbcL’ gene from four groups. An average of above 70% CAI is shown by all the

organisms, except H. annus, A. thaliana and C. reinhardtii from plants.

Page 9: A comparative computational study of the ‘rbcL’ gene in ... · A comparative computational study of the ... prokaryotes, genes for both rbcL and rbcS are ... Phylogenetic tree

INDIAN J BIOTECHNOL, JANUARY 2013

66

References 1 Chase M W, Soltis D E, Olmstead R G, Morgan D,

Les D H et al, Phylogenetics of seed plants: An analysis of

nucleotide sequences from the plastid gene rbcL, Ann Mo Bot

Gard, 80 (1993) 528-580.

2 Tabita F R, Molecular and cellular regulation of autotrophic

carbon dioxide fixation in microorganisms, Microbiol Res,

52 (1988)155-189.

3 Miziorko H M & Lorimer G H, Ribulose-l,5-bisphosphate

carboxylase-oxygenase, Ann Rev Biochem, 52 (1983) 507-535.

4 Delwiche C F & Palmer J D, Rampant horizontal transfer

and duplication of rubisco genes in eubacteria and plastids,

Mol Biol Evol, 13 (1996) 873-882.

5 Morden C W, Delwiche C F, Kuhsel M & Palmer J D, Gene

phylogenies and the endosymbiotic origin of plastids,

Biosystems, 28 (1992) 75-90.

6 Ueda K & Shibuya H, Molecular phylogeny of rbcL and its

bearing on the origin of plastids: Bacteria or cyanobacteria?,

in Endocytobiology, edited by S Sato, M Ishida & H

Ishikawa, (V Tiibingen University Press, Tiibingen,

Germany) 1993, 369-376.

7 Mcfadden G I, Gilson P R & Waller R E, Molecular

phylogeny of chlorarachniophytes based on plastid rRNA

and rbcL sequences, Arch Protistenkd, 145 (1995) 231-239.

8 Delwiche C F, Kuhsel M & Palmer J D, Phylogenetic analysis of tufA sequences indicates a cyanobacterial origin of all plastids, Mol Phylogenet Evol, 4 (1995) 110-128.

9 Douglas S E & Murphy C A, Structural, transcriptional, and

phylogenetic analysis of the atpB gene cluster from the

plastid of CryptomonusΨ (Cryptophyceae), J Phycol, 30

(1994) 329-340.

10 Clarke A K & Eriksson M J, The cyanobacterium

Synechococcus sp. PCC 7942 possesses a close homologue

to the chloroplast ClpC protein in higher plants, Plant Mol

Biol, 31 (1996) 721-730.

11 Palmer J D, A genetic rainbow of plastids, Nature (Lond),

364 (1993) 762-763.

12 Felsenstein J, PHYLIP: Phylogeny interference package

(version 3.69) (Department of Genome Sciences and

Department of Biology, University of Washington,

Washington, USA) 1989, 164-166.

13 Swofford D L, Olsen G J, Waddell P J & Hillis D M,

Phylogenetic inference, in Molecular systematics, 2nd edn,

edited by D M Hillis, C Moritz & B K Mable (Sinauer

Associates, Inc., Publishers, Sunderland, USA) 1996, 407-514.

14 Kaufman L & Rousseeuw P J, Finding groups in data: An

introduction to cluster analysis, (John Wiley & Sons, Inc.,

New Jersey, USA) 1990.

15 Fu C, Xiong J & Miao W, Genome-wide identification and

characterization of cytochrome P450 monooxygenase genes

in the ciliate Tetrahymena thermophila, BMC Genomics, 10

(2009) 208.

16 Meng Z, Wei L & Xia L, Analysis of synonymous codon

usage in chloroplast genome of Populus alba, J For Res, 19

(2008) 293-297.

17 Sharp P M & Li W H, The codon adaptation index—A

measure of directional synonymous codon usage bias, and

its potential applications, Nucleic Acids Res, 15 (1987)

1281-1295.

18 Kusumi J & Tachida H, Compositional properties of green-

plant plastid genomes, J Mol Evol, 60(2005) 417-425.

19 Sharp P M, Tuohy T M F & Mosurski K R, Codon usage

in yeast: Cluster analysis clearly differentiates highly

and lowly expressed genes, Nucleic Acids Res, 14(1986)

5125-5143.