

Open Access. 2009, Wang et al. Volume 10, Issue 6, Article R68. Research

Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses

Xiyin Wang*†, Udo Gowik‡, Haibao Tang*§, John E Bowers*, Peter Westhoff‡ and Andrew H Paterson*§

Addresses: *Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA. †College of Sciences, Hebei Polytechnic University, Tangshan, Hebei 063000, China. ‡Institut fur Entwicklungs- und Molekularbiologie der Pflanzen, Heinrich-Heine-Universitat 1, Universitatsstrasse, D-40225 Dusseldorf, Germany. §Department of Plant Biology, University of Georgia, Athens, GA 30602, USA.

Correspondence: Andrew H Paterson. Email: [email protected]

© 2009 Wang et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

C4 photosynthetic pathway evolution: Comparison of the sorghum, maize and rice genomes shows that gene duplication and functional innovation is common to evolution of most but not all genes in the C4 photosynthetic pathway.

Abstract

Background: Sorghum is the first C4 plant and the second grass with a full genome sequence available. This makes it possible to perform a whole-genome-level exploration of C4 pathway evolution by comparing key photosynthetic enzyme genes in sorghum, maize (C4) and rice (C3), and to investigate a long-standing hypothesis that a reservoir of duplicated genes is a prerequisite for the evolution of C4 photosynthesis from a C3 progenitor.

Results: We show that both whole-genome and individual gene duplication have contributed to the evolution of C4 photosynthesis. The C4 gene isoforms show differential duplicability, with some C4 genes being recruited from whole genome duplication duplicates by multiple modes of functional innovation. The sorghum and maize carbonic anhydrase genes display a novel mode of new gene formation, with recursive tandem duplication and gene fusion accompanied by adaptive evolution to produce C4 genes with one to three functional units. Other C4 enzymes in sorghum and maize also show evidence of adaptive evolution, though differing in level and mode. Intriguingly, a phosphoenolpyruvate carboxylase gene in the C3 plant rice has also been evolving rapidly and shows evidence of adaptive evolution, although lacking key mutations that are characteristic of C4 metabolism. We also found evidence that both gene redundancy and alternative splicing may have sheltered the evolution of new function.

Conclusions: Gene duplication followed by functional innovation is common to evolution of most but not all C4 genes. The apparently long time-lag between the availability of duplicates for recruitment into C4 and the appearance of C4 grasses, together with the heterogeneity of origins of C4 genes, suggests that there may have been a long transition process before the establishment of C4 photosynthesis.

Published: 23 June 2009

Genome Biology 2009, 10:R68 (doi:10.1186/gb-2009-10-6-r68)

Received: 18 March 2009. Revised: 27 May 2009. Accepted: 23 June 2009.

The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2009/10/6/R68


Background

Many of the most productive crops in agriculture use the C4 photosynthetic pathway. Despite their multiple origins, they are all characterized by high rates of photosynthesis and efficient use of water and nitrogen. As a morphological and biochemical innovation [1], the C4 photosynthetic pathway is proposed to have been an adaptation to hot, dry environments or CO2 deficiency [2-5]. The C4 pathway independently appeared at least 50 times during angiosperm evolution [6,7]. Multiple origins of the C4 pathway within some angiosperm families [8,9] imply that its evolution may not be complex, perhaps suggesting that there may have been genetic predisposition in some C3 plants to C4 evolution [6].

The high photosynthetic capacity of C4 plants is due to their unique mode of CO2 assimilation, featuring strict compartmentation of photosynthetic enzymes into two distinct cell types, mesophyll and bundle-sheath (illustrated in Figure 1 for the NADP-malic enzyme (NADP-ME) type of C4 pathway). First, CO2 assimilation is carried out in mesophyll cells. The primary carboxylating enzyme, phosphoenolpyruvate carboxylase (PEPC), together with carbonic anhydrase (CA), which is crucial to facilitating rapid equilibrium between CO2 and HCO3−, is responsible for the hydration and fixation of CO2 to produce a C4 acid, oxaloacetate. In NADP-ME-type C4 species, oxaloacetate is then converted to another C4 acid, malate, catalyzed by malate dehydrogenase (MDH). Malate then diffuses into chloroplasts in the proximal bundle-sheath cells, where CO2 is released to yield pyruvate by the decarboxylating NADP-ME. The released CO2 concentrates around the secondary carboxylase, Rubisco, and is reassimilated by it through the Calvin cycle. Pyruvate is transferred back into mesophyll cells, where pyruvate orthophosphate dikinase (PPDK) converts it to regenerate the primary CO2 acceptor, phosphoenolpyruvate. Phosphorylation of a conserved serine residue close to the amino-terminal end of the PEPC polypeptide, catalyzed by PEPC kinase (PPCK), is essential to its activity because it reduces sensitivity to the feedback inhibitor malate. C4 photosynthesis results in more efficient carbon assimilation at high temperatures because its combination of morphological and biochemical features reduces photorespiration, a loss of CO2 that occurs during C3 photosynthesis at high temperatures [10]. PPDK regulatory protein (PPDK-RP), a bifunctional serine/threonine kinase-phosphatase, catalyzes both the ADP-dependent inactivation and the Pi-dependent activation of PPDK [11].

The evolution of a novel biochemical pathway is based on the creation of new genes, or functional changes in existing genes. Gene duplication has been recognized as one of the principal mechanisms of the evolution of new genes. Genes encoding enzymes of the C4 cycle often belong to gene families having multiple copies. For example, in maize and sorghum, a single C4 PEPC gene and other non-C4 isoforms were discovered [12], whereas in Flaveria trinervia, a C4 eudicot, multiple copies of C4 PEPC genes were found [13]. These findings led to the proposition that gene duplication, followed by functional innovation, was the genetic foundation for photosynthetic pathway transformation [14].

Figure 1. The NADP-ME type of C4 pathway in sorghum and maize. CA, carbonic anhydrase; MDH, malate dehydrogenase; ME, malic enzyme; OAA, oxaloacetate; PEPC, phosphoenolpyruvate carboxylase; PPCK, PEPC kinase; PPDK, pyruvate orthophosphate dikinase; PPDK-RP, PPDK regulatory protein; TP, transit peptide.

All plant genomes, including grass genomes, have been enriched with duplicated genes derived from tandem duplications, single-gene duplications, and large-scale or whole-genome duplications [15-18]. A whole-genome duplication (WGD) occurred in a grass ancestor approximately 70 million years ago (mya), before the divergence of the panicoid, oryzoid, pooid, and other major cereal lineages [19,20]. A preliminary analysis of sorghum genome data suggested that duplicated genes from various sources have expanded the sizes of some families of C4 genes and their non-C4 isoforms [21]. However, different duplicated gene pairs often have divergent fates [22]. While most duplicated genes are lost, gene retention in some functional groups produces large gene families in plants [15,19,20]. Together with other lines of evidence, these have led to the interesting proposition of differential gene duplicability [23,24], or duplication-resistance [25], due to possible gene dosage imbalance, which can be deleterious [26]. Even when duplicated genes survive, there is rarely strong evidence supporting possible functional innovation [27].

Most C4 plants are grasses, and it has been inferred that C4 photosynthesis first arose in grasses during the Oligocene epoch (24 to 35 mya) [28,29]. Sorghum and maize, thought to have diverged from a common ancestor approximately 12 to 15 mya [21], are both in the Andropogoneae tribe, which is entirely composed of C4 plants [8]. Sorghum, a NADP-ME-type C4 plant grown for food, feed, fiber and fuel, is the second grass and the first C4 plant with its full genome sequence available [21]. The first grass genome sequenced was rice, a C3 plant. The availability of two grass genome sequences using different types of photosynthesis provides a valuable opportunity to explore C4 pathway evolution. In the present research, by using a comparative genomic approach and phylogenetic analysis, we compared C4 genes and their non-C4 isoforms in sorghum, maize and rice. The aims of this study are to investigate: the role of gene duplication in the evolution of C4 enzyme genes; the role of adaptive evolution in C4 pathway formation; the long-standing hypothesis that a reservoir of duplicated genes has been a prerequisite of C4 pathway evolution [14]; and whether codon usage bias has contributed to C4 gene evolution, as previously suggested [30]. Our results will help to clarify the evolution of the C4 pathway and may benefit efforts to transform C3 plants, such as rice, to C4 photosynthesis [31].

Results

PEPC enzyme genes

Grass PEPC enzyme genes form a small gene family. Sorghum and rice each have five plant-type PEPC gene isoforms and one bacteria-type isoform (Sb03g008410 and Os01g0110700, respectively) [32], not counting two likely pseudogenized rice isoforms (Os01g0208800 and Os09g0315700) that have only 217 and 70 codons. There is one sorghum C4 PEPC gene [33,34], Sb10g021330 (Table S1 in Additional data file 1). Previous characterization indicated that its transcripts are more than 20 times more abundant in mesophyll than in bundle-sheath cells [35] (Table S2 in Additional data file 1).

By analysis of gene colinearity, we investigated how genome duplication has affected the PEPC gene families in rice and sorghum. The PEPC gene in rice that is most similar to the sorghum C4 PEPC is Os01g0208700, sharing 73% amino acid identity. This similarity raised the possibility that the two genes are orthologous. Although the two genes under consideration are not in colinear locations, single-gene translocation is not rare in grasses [36]. The outparalogs of the sorghum C4 PEPC gene, that is, the homologs produced by WGD in the common ancestor of sorghum and rice, are located at the expected homoeologous locations in both sorghum and rice (Sb04g008720 and Os02g0244700). The rice gene Os01g0208700 and the C4 genes are grouped together, and the outparalogs (Os02g0244700 and Sb04g008720) of the sorghum C4 gene form a sister group on the phylogenetic tree. The pattern can be explained if Os01g0208700 were orthologous to the sorghum C4 PEPC gene, as implied by their high sequence similarity and shared high GC content (detailed below). In our view, the most parsimonious explanation of these data is that the oryzoid (rice) ortholog was translocated after the sorghum-rice (panicoid-oryzoid) divergence, then the panicoid (sorghum) ortholog was recruited into the C4 pathway. We cannot falsify a model invoking independent loss of alternative homoeologs in sorghum (panicoids) and rice (oryzoids), respectively, although this model seems improbable in that such loss of alternative homoeologs has only occurred for approximately 1.8 to 3% of genome-wide gene duplicates in these taxa [21]. The other rice and sorghum PEPC genes form four orthologous pairs. Whether the genes from different orthologous groups are outparalogs could not be supported by colinearity inference associated with the pan-cereal genome duplication.

Grass PEPC genes show high GC content, like many other grass genes, apparently as a result of changes after the monocot-dicot split but before the radiation of the grasses [37]. The evolution of C4 PEPC genes in sorghum and maize was previously proposed to have been accompanied by GC elevation, resulting in codon usage bias [38]. We found that C4 PEPC genes do have higher GC content than other sorghum and maize PEPC genes, especially at the third codon sites (GC3). The sorghum and maize C4 PEPC genes have a GC3 content of approximately 84%, significantly higher than other genes in both species (Table S3 in Additional data file 1). The suspected rice ortholog Os01g0208700 has even higher GC3 content, approximately 92%. In contrast, the GC3 content of all Arabidopsis PEPC genes is <43%. This shows that the higher GC content in the C4 PEPC genes may not be related to the evolution of C4 function, as discussed below.
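As an illustration of the GC3 statistic discussed above, the following minimal Python sketch computes the fraction of G or C bases at third codon positions. It assumes an in-frame coding sequence that starts at its first base; the function name `gc3_content` and the toy sequence are hypothetical and are not taken from the study.

```python
def gc3_content(cds: str) -> float:
    """Fraction of G/C among third-codon-position bases of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3            # ignore a trailing partial codon
    third = [seq[i] for i in range(2, usable, 3)]
    called = [b for b in third if b in "ACGT"]  # skip ambiguous bases such as N
    if not called:
        raise ValueError("no unambiguous third-position bases")
    return sum(b in "GC" for b in called) / len(called)


# Toy example (hypothetical sequence): codons ATG GCC GCG TAA
# Third-position bases are G, C, G, A, so three of four are G or C.
print(gc3_content("ATGGCCGCGTAA"))  # prints 0.75
```

Applied to each PEPC coding sequence, a function of this kind would yield percentages comparable to those reported in Table S3, provided the same sequences and reading frames are used.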

C4 PEPC genes show evidence of adaptive evolution. To characterize the evolution of C4 PEPC genes, we aligned the sequences and constructed gene trees without involving the possible pseudogenized rice gene (Additional data file 2). We found the genes to be in two groups, with one containing plant-type and the other bacteria-type PEPC genes. Careful inspection suggested problems with the tree, for orthologous genes were not grouped together as expected. After removing the bacteria-type genes and rooting the subtree containing the C4 genes with Arabidopsis PEPC genes, we obtained a tree in which orthologs are grouped together as expected (Figure 2a). The sorghum and maize C4 genes are on a remarkably long branch, suggesting that they are rapidly evolving compared to the other genes, and implying possible adaptive selection during the evolution of the C4 pathway, consistent with a previous proposal [39].

Figure 2. Phylogeny of C4 enzyme genes and their isoforms in sorghum, rice, maize and Arabidopsis. Thick branches show C4 enzyme genes. Bootstrap percentage values are shown as integers; Ka/Ks ratios are shown as numbers with fractions, or underlined when >1. In the gene IDs, Sb indicates Sorghum bicolor, Os indicates Oryza sativa, Zm indicates Zea mays, and At indicates Arabidopsis thaliana. (a) PEPC; (b) PPCK; (c) NADP-MDH; (d) NADP-ME; (e) PPDK; (f) PPDK-RP; (g) CA.

Maximum likelihood analysis supports possible adaptive evolution of C4 PEPC genes. First, characterization of nonsynonymous nucleotide substitution rates (Ka) supports rapid evolution of the C4 genes and their rice ortholog. Under a free-parameter model, Ka values are >0.048 on branches leading to C4 genes and their rice ortholog after the rice-sorghum split, as compared to ≤0.02 on branches leading to the non-C4 isoforms. Second, the C4 genes may have been positively selected. The Ka/Ks ratio is nearly tenfold higher (0.71) on the branch leading to the last common ancestor of the sorghum and maize C4 genes than on other branches after the rice-sorghum split (≤0.08). Though the ratio is <1, we propose that the striking difference in Ka/Ks between C4 and non-C4 genes may be evidence of positive selection in the C4 genes for the following reasons: the criterion Ka/Ks > 1 has been proposed to be unduly stringent to infer positive selection [40]; the maximum likelihood analysis is conservative, as reported previously [27]; and the similar slow evolutionary changes in all non-C4 genes in sorghum, maize and rice (Figure 2a) imply elevated rates in the C4 genes, rather than purifying selection in the non-C4 genes.
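The "nearly tenfold" contrast is simple arithmetic on the ratios quoted above. The short Python sketch below recomputes it from the two values given in the text (0.71 on the C4 ancestral branch, at most 0.08 on the other branches); the helper name `min_fold_difference` is hypothetical, and the sketch does not reproduce the maximum likelihood estimation itself.

```python
def min_fold_difference(foreground: float, background_upper_bound: float) -> float:
    """Smallest fold difference implied when background ratios do not exceed a bound."""
    if background_upper_bound <= 0:
        raise ValueError("the background bound must be positive")
    return foreground / background_upper_bound


# Values quoted in the text: Ka/Ks = 0.71 on the branch leading to the C4
# ancestor, and <= 0.08 on the other branches after the rice-sorghum split.
fold = min_fold_difference(0.71, 0.08)
print(f"at least {fold:.1f}-fold higher")
```

Because 0.08 is an upper bound for the background branches, the computed value is a lower bound on the fold difference.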

C4 PEPC genes show elevated and aggregated amino acid substitutions especially in function-specific regions, providing further evidence of adaptive evolution. Comparison to their outparalogs and their nearest outgroup sequence suggests that C4 PEPC genes have accumulated approximately 100 putative substitutions over their full length (Table 1), far more than non-C4 PEPC genes. The substitutions are referred to as putative since we cannot rule out the possibility of parallel and reverse mutations. However, the extremely significant difference strongly supports divergent evolution of C4 and non-C4 PEPC genes. The amino acid substitutions are not uniformly distributed along the lengths of the C4 genes (Table S4 in Additional data file 1), but concentrated in the carboxy-terminal half, including the critical mutation S780 (the serine at position 780 of the maize C4 PEPC protein that is essential to relieving feedback inhibition by malate [41]). This is consistent with previous findings [42].
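One way to picture the comparison against an outparalog and an outgroup is to polarize each alignment column with the outgroup residue. The Python sketch below is an assumption about how such putative substitutions could be counted, not the authors' documented procedure; the function name and the toy sequences are hypothetical.

```python
def count_putative_substitutions(gene1, gene2, outgroup, gap="-"):
    """Count sites changed in only one of two aligned genes, using an outgroup.

    A site is assigned to gene 1 when gene 1 differs from the outgroup while
    gene 2 still matches it, and to gene 2 in the mirror case. Columns with a
    gap, or where both genes differ from the outgroup, are skipped.
    """
    if not (len(gene1) == len(gene2) == len(outgroup)):
        raise ValueError("sequences must come from one alignment of equal length")
    n1 = n2 = 0
    for a, b, o in zip(gene1, gene2, outgroup):
        if gap in (a, b, o):
            continue
        if a != o and b == o:
            n1 += 1
        elif b != o and a == o:
            n2 += 1
    return n1, n2


# Toy aligned protein fragments (hypothetical, not real PEPC sequences)
print(count_putative_substitutions("MKTAYLG", "MRTA-LD", "MRTSYLG"))  # prints (1, 1)
```

Counts obtained this way remain putative for the reason given in the text: parallel and reverse mutations cannot be excluded from a three-sequence comparison.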

Surprisingly, Os01g0208700 has also accumulated significantly more mutations than expected, and has a relatively larger selection pressure than other non-C4 PEPC genes, implying that it may also be under adaptive selection (Table 1; Table S4 in Additional data file 1), as further discussed below.
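The P-values in Table 1 are numerically consistent with a one-degree-of-freedom chi-square test of whether the two genes carry equal numbers of substitutions. This is an inference from the tabulated numbers, not a method stated in this section, and the function name below is hypothetical. A minimal sketch using only the Python standard library:

```python
import math


def equal_count_pvalue(n1: int, n2: int) -> float:
    """P-value of a 1-df chi-square test that two substitution counts are equal."""
    total = n1 + n2
    if total == 0:
        raise ValueError("at least one substitution is required")
    # With expected counts of total/2 each, the statistic reduces to this form.
    chi2 = (n1 - n2) ** 2 / total
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Substitution counts taken from three rows of Table 1 (PEPC, PPCK, CA).
for n1, n2 in [(110, 26), (15, 14), (18, 18)]:
    print(n1, n2, f"{equal_count_pvalue(n1, n2):.2E}")
```

For these three rows Table 1 reports 5.89E-13, 8.53E-01 and 1.00E+00, which gives a quick check on whether the assumed test matches the tabulated values.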

PPCK enzyme genes

PPCK gene families have been enriched by duplication events, including the pan-cereal WGD and tandem duplication. We identified three PPCK gene isoforms in each of sorghum and rice (Table S1 in Additional data file 1), which are in one-to-one correspondence in expected colinear locations between the two species (Figure 2b). These rice and sorghum isoforms correspond to four maize isoforms (ZmPPCK1 to ZmPPCK4; Figure 2b), with ZmPPCK2 and ZmPPCK3 likely produced in maize after its divergence from a lineage shared with sorghum. The sorghum C4 PPCK is encoded by Sb04g036570, and its maize ortholog is ZmPPCK1. Their C4 nature is supported by evidence that their expression is light-induced and their transcripts are more abundant in mesophyll than bundle-sheath cells [30]. In contrast, the expression of sorghum and maize non-C4 isoforms is affected not by light but by cycloheximide [30]. The outparalogs of the sorghum C4 gene and its rice ortholog were likely lost before the two species split, whereas the other four isoforms are outparalogs.

Maximum likelihood analysis and inference of aggregated amino acid substitutions found no evidence of adaptive selection during C4 PPCK gene evolution (Table S4 in Additional data file 1).

Consistent with a previous report [30], all studied grass PPCK genes have extremely high GC content, with a GC3 content from 88 to 97% (Table S3 in Additional data file 1). The grass C4 and non-C4 PPCK genes have similar GC content.

NADP-MDH enzyme genes

There are two NADP-MDH enzyme genes in sorghum (Table S1 in Additional data file 1), the non-C4 gene Sb07g023910 and the C4 gene Sb07g023920, tandemly located as previously reported [43]. They have only one homolog in each of rice and maize [44], with the rice homolog (Os08g0562100) at the expected colinear location. This suggests that the NADP-MDH WGD outparalog was lost before the sorghum-rice split. Each of the sorghum tandem genes has an ortholog in Vetiveria and Saccharum, respectively [44], suggesting that the tandem duplication occurred before the divergence of sorghum and Vetiveria, but after the sorghum-maize split, an inference further supported by gene tree analysis in that they are more similar to one another than to the single maize homolog (Figure 2c).

The C4 NADP-MDH gene shows an interesting mode of adaptive evolution. Though the C4 NADP-MDH genes have accumulated more mutations than non-C4 genes (Table S4 in Additional data file 1), neither maximum likelihood analysis nor the inference of aggregated amino acid substitution suggests adaptive selection. However, the sorghum C3 and C4 genes were likely to have been produced by an ancestral C4 gene through duplication. One of the duplicates may have lost its C4 function as it is not light-induced and only constitutively expressed [43].

The NADP-MDH genes are chloroplastic. A chloroplast transit peptide (cTP) having approximately 40 amino acids is identified in all the genes from grasses and Arabidopsis (Additional data file 3). This indicates that the cTP was present in the common ancestor of angiosperms. Non-chloroplastic NADP-MDH genes identified in the sorghum genome share less than 40% protein sequence similarity with the chloroplastic ones.

Table 1. Aggregated amino acid substitution analysis results

Column key: Length, alignment length; Ungapped, alignment length without gaps; Identity, average identity; Subst. 1 and Subst. 2, overall substitution number in gene 1 and in gene 2.

Gene 1            Gene 2        Outgroup       Length  Ungapped  Identity  Subst. 1  Subst. 2  P-value

PEPC
Sb10g021330       Os02g0244700  Os01g0758300   972     958       0.76      110       26        5.89E-13
Zm_NM_00111968    Os02g0244700  Os01g0758300   971     968       0.78      92        33        1.31E-07
Sb10g021330       Os02g0244700  Sb03g035090    972     958       0.76      117       28        1.46E-13
Zm_NM_00111968    Os02g0244700  Sb03g035090    971     968       0.77      104       34        2.54E-09

PPCK
Sb04g036570       Os02g0807000  Sb06g022690    309     284       0.65      15        14        8.53E-01
Zm_NM_001112338   Os02g0807000  Sb06g022690    309     281       0.63      18        11        1.94E-01

CA
U08403_FU3        Os01g0639900  Sb03g029190.1  272     201       0.75      19        18        8.69E-01
U08403_FU2        Os01g0639900  Sb03g029190.1  273     200       0.73      20        18        7.46E-01
U08403_FU1        Os01g0639900  Sb03g029190.1  273     202       0.79      13        18        3.69E-01
U08401_FU2        Os01g0639900  Sb03g029190.1  272     201       0.75      18        18        1.00E+00
U08401_FU1        Os01g0639900  Sb03g029190.1  273     202       0.78      14        18        4.80E-01
Sb03g029170_FU2   Os01g0639900  Sb03g029190.1  272     201       0.78      14        16        7.15E-01
Sb03g029170_FU1   Os01g0639900  Sb03g029190.1  273     201       0.80      11        20        1.06E-01
Sb03g029180       Os01g0639900  Sb03g029190.1  274     202       0.80      11        19        1.44E-01
U08403_FU3        Os01g0639900  At_NM_111016   293     201       0.50      14        13        8.47E-01
U08403_FU2        Os01g0639900  At_NM_111016   293     200       0.49      16        14        7.15E-01
U08403_FU1        Os01g0639900  At_NM_111016   293     202       0.50      10        15        3.17E-01
U08401_FU2        Os01g0639900  At_NM_111016   293     201       0.50      12        13        8.41E-01
U08401_FU1        Os01g0639900  At_NM_111016   293     202       0.50      11        15        4.33E-01
Sb03g029170_FU2   Os01g0639900  At_NM_111016   293     201       0.50      10        10        1.00E+00
Sb03g029170_FU1   Os01g0639900  At_NM_111016   293     201       0.50      9         14        2.97E-01
Sb03g029180       Os01g0639900  At_NM_111016   293     202       0.50      8         11        4.91E-01

PPDK
Sb09g019930       Os05g0405000  Os03g0432100   949     946       0.83      42        28        9.43E-02
Zm_NM_001112268   Os05g0405000  Os03g0432100   950     944       0.83      44        28        5.93E-02
Sb09g019930       Os05g0405000  Sb01g031660    958     946       0.76      37        15        2.28E-03
Zm_NM_001112268   Os05g0405000  Sb01g031660    961     942       0.78      32        18        4.77E-02

NADP-MDH
Sb07g023920       Os08g0562100  At_NM_180883   443     427       0.77      22        19        6.39E-01
Sb07g023910       Os08g0562100  At_NM_180883   443     432       0.75      25        16        1.60E-01
Zm_X16084         Os08g0562100  At_NM_180883   443     430       0.75      25        13        5.16E-02

NADP-ME
Sb03g003230       Os01g0188400  Os05g0186300   642     633       0.80      46        16        1.39E-04
Sb03g003230       Os01g0188400  Sb09g005810    642     633       0.80      41        20        7.17E-03
Sb03g003220       Os01g0188400  Os05g0186300   650     635       0.84      23        15        1.94E-01
Zm_NM_001111843   Os01g0188400  Os05g0186300   641     634       0.80      47        16        9.40E-05
Zm_NM_001111913   Os01g0188400  Os05g0186300   668     633       0.84      26        15        8.58E-02

PPDK-RP
Sb02g035190       Os07g0530600  At4g21210      474     426       0.58      37        17        6.00E-03
Zm_NM_001112403   Os07g0530600  At4g21210      474     423       0.57      33        23        1.80E-01
Sb02g035190       Sb02g035200   Os07g0530600   476     408       0.69      19        22        6.40E-01
Sb02g035190       Sb02g035210   Os07g0530600   483     384       0.69      21        22        8.70E-01
Zm_NM_001112403   Sb02g035200   Os07g0530600   472     416       0.67      25        22        6.60E-01
Zm_NM_001112403   Sb02g035210   Os07g0530600   482     389       0.68      25        25        1.00E+00

All of the grass NADP-MDH enzyme genes studied have elevated GC content compared to the Arabidopsis ortholog, especially regarding GC3 (50% versus 40%; Table S3 in Additional data file 1). The grass C4 genes have slightly higher GC content than the non-C4 genes.

NADP-ME enzyme genes

The NADP-ME gene family has been gradually expanding due to tandem duplication and the pan-cereal WGD. We identified five and four NADP-ME enzyme genes in sorghum and rice, respectively (Table S1 in Additional data file 1). The sorghum C4 gene is Sb03g003230, whose transcript is abundant in bundle-sheath but not mesophyll cells [35] (Table S2 in Additional data file 1). The C4 gene has a tandem duplicate that may have been produced before the sorghum-maize split based on gene similarity and tree topology (Figure 2d). The tandem genes share the same rice ortholog (Os01g0188400) at the expected colinear location, and their WGD duplicates can be found at the expected colinear location in both species. The other sorghum and rice NADP-ME genes form two orthologous pairs, having also remained at the colinear locations predicted based on the pan-cereal duplication.

Maximum likelihood analysis indicates that the sorghum and maize C4 NADP-ME genes are under positive selection. The branches leading to their two closest ancestral nodes have a Ka/Ks ratio > 1 (P-value = 8 × 10⁻¹⁰). Moreover, the C4 genes have a significant abundance of amino acid substitutions (Table 1; Table S4 in Additional data file 1). The most affected regions in sorghum and maize overlap with one another, from residue 141 to residue 230 in sorghum, and from residue 69 to residue 181 in maize.
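The intuition behind a Ka/Ks ratio above 1 is that amino acid-changing (nonsynonymous) differences accumulate faster than silent (synonymous) ones. The sketch below is a deliberately crude illustration of that distinction: it classifies differing codons in two aligned coding sequences using the standard genetic code. It is not the codon-model maximum likelihood method used here, and it does not normalize by the number of synonymous and nonsynonymous sites or correct for multiple substitutions. The function name and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(cds_a: str, cds_b: str):
    """Return (synonymous, nonsynonymous) counts of differing codon pairs.

    Both inputs must be aligned, in frame and of equal length. Codons that
    contain gaps or ambiguity codes are skipped.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no information
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical toy alignment: M A K L versus M A R L.
toy_a = "ATGGCCAAGCTC"
toy_b = "ATGGCGAGGCTT"
toy_counts = count_codon_differences(toy_a, toy_b)
```

In the toy pair, GCC/GCG and CTC/CTT are silent changes while AAG/AGG replaces lysine with arginine, so the counts separate the two classes that Ka and Ks summarize.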

The grass NADP-ME genes have higher GC content than their Arabidopsis homologs (Table S3 in Additional data file 1). The highest GC content (GC3 > 82%) is found not in the C4 genes but in their outparalogs, Sb09g005810 and Os05g0186300.

The C4 genes, their tandem paralogs in sorghum and maize, and their rice ortholog all share an approximately 39 amino acid cTP that is absent from their WGD paralogs in grasses and from their homologs in Arabidopsis. This suggests that the cTP was acquired by one member of a duplicated gene pair after the pan-grass WGD but before the sorghum-rice divergence.

PPDK enzyme genes

Sorghum and rice both have two PPDK enzyme genes (Table S1 in Additional data file 1). The sorghum C4 PPDK gene (Sb09g019930) is identified based on its approximately 90% amino acid identity with the maize C4 gene. Its transcript is abundant in mesophyll rather than bundle-sheath cells [35] (Table S2 in Additional data file 1). Its rice ortholog (Os05g0405000) can be inferred based on both gene trees (Figure 2e) and gene colinearity. The other rice and sorghum isoforms are orthologous to one another. Whether the four isoforms are outparalogs produced by the WGD could not be determined by gene colinearity inference due to possible gene translocations. However, synonymous nucleotide substitution rates and gene tree topologies support that the rice and sorghum paralogs were produced before the two species diverged, and approximately at the time of the pan-cereal WGD.
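Placing a duplication "approximately at the time of the pan-cereal WGD" from synonymous substitution rates usually rests on the molecular clock relation T = Ks / (2r), where Ks is the synonymous distance between the two paralogs and r is the substitution rate per synonymous site per year. The sketch below illustrates that arithmetic. The default rate of 6.5 × 10⁻⁹ is an assumed value often applied to grass genes and is not a figure given in this section; the function names are hypothetical.

```python
def divergence_time_mya(ks: float, rate_per_site_per_year: float = 6.5e-9) -> float:
    """Estimate time since duplication (million years) from Ks via T = Ks / (2r).

    The factor 2 accounts for substitutions accumulating along both copies.
    The default rate is an illustrative assumption, not a value from the text.
    """
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("Ks must be non-negative and the rate positive")
    return ks / (2.0 * rate_per_site_per_year) / 1e6


def expected_ks(time_mya: float, rate_per_site_per_year: float = 6.5e-9) -> float:
    """Inverse relation: Ks expected between copies that diverged time_mya ago."""
    if time_mya < 0 or rate_per_site_per_year <= 0:
        raise ValueError("time must be non-negative and the rate positive")
    return 2.0 * rate_per_site_per_year * time_mya * 1e6
```

Under the assumed rate, paralogs dating to a WGD about 70 million years ago would be expected to show Ks near 0.9, so a pair with a Ks in that neighborhood is consistent with origin at the WGD even when colinearity has been lost.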

There are two PPDK genes in maize [10]. One of them encodes both a C4 transcript and a cytosolic transcript, controlled by distinct upstream regulatory elements [45]. The C4 copy has an extra exon encoding a cTP at a site upstream of the cytosolic gene [46]. We found that the sorghum C4 PPDK gene is highly similar to its maize counterpart along their respective full lengths, indicating their origin in a common maize-sorghum ancestor. The other maize PPDK gene has only a partial DNA sequence and, therefore, was excluded from the present evolutionary analysis. A similarity search against the maize bacterial artificial chromosome (BAC) sequences indicates that it is on a different chromosome (chromosome 8) from the C4 gene (chromosome 6). The maize counterpart of the other sorghum PPDK isoform has not yet been identified in sequenced BACs.

The C4 PPDK genes may have experienced adaptive evolution. While maximum likelihood analysis did not find evidence of adaptive evolution of C4 PPDK genes (Figure 2e), the C4 genes have accumulated significantly or nearly significantly more amino acid substitutions than their rice orthologs, particularly in the region from approximately residue 207 to approximately residue 620 (Table 1; Table S4 in Additional data file 1).

All grass PPDK genes have higher GC content than their Arabidopsis homologs (Table S3 in Additional data file 1), with the C4 genes themselves being highest in GC content (GC3 content approximately 61 to 70%).

All of the characterized PPDK isoform sequences from grasses and Arabidopsis share an approximately 20 amino acid cTP (Additional data file 3), suggesting its origin before the monocot-dicot split.

PPDK-RP enzyme genes

Tandem duplication contributed to the expansion of PPDK-RP genes. Using the maize PPDK-RP gene sequence as a query, we determined its possible sorghum ortholog, Sb02g035190, which has two tandem paralogs. Their rice ortholog, Os07g0530600, was identified in the anticipated colinear region. However, we failed to find their WGD outparalogs in either sorghum or rice, suggesting possible gene loss in their common ancestor.


Figure 3. Dotplots between sorghum and maize CA enzyme protein sequences. (a) Self-comparison of the protein sequence of Sb03g029170. (b) Sb03g029170 (horizontal) and Sb03g029180 (vertical). (c) Sb03g029190 (horizontal) and Sb03g029180 (vertical). (d) Maize U08403 (horizontal) and Sb03g029180 (vertical). (e) Maize U08401 (horizontal) and Sb03g029180 (vertical). Axes give residue positions in each protein sequence.


Gene trees indicate that the tandem duplication events may have occurred before the sorghum-maize divergence, but after the sorghum-rice divergence (Figure 2f). Maximum likelihood analysis suggests that both lineages leading to the maize PPDK-RP gene and its sorghum ortholog, and other isoforms, have been under significant positive selection (Ka/Ks >> 1, P-value = 2.5 × 10⁻⁸), implying possible functional changes in both lineages. Compared to their rice ortholog, sorghum and maize PPDK-RP genes have accumulated significantly more amino acid substitutions (Table 1; Table S4 in Additional data file 1), providing supporting evidence for functional innovation.
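One common way to ask whether one gene has accumulated significantly more substitutions than its ortholog is a relative rate test of the Tajima type: with an outgroup sequence, count the sites unique to each of the two lineages (m1 and m2) and compare (m1 - m2)² / (m1 + m2) to a chi-square distribution with one degree of freedom. The sketch below assumes that this kind of test underlies the P-values in Table 1; the test is not named in this section, and reading the last three columns of each Table 1 row as m1, m2 and P is likewise an interpretation. The function name is hypothetical.

```python
import math
from typing import Tuple


def relative_rate_test(m1: int, m2: int) -> Tuple[float, float]:
    """Tajima-style relative rate test on two lineage-specific substitution counts.

    Returns (chi_square, p_value) for one degree of freedom. For one degree of
    freedom the chi-square survival function equals erfc(sqrt(x / 2)), so no
    external statistics library is needed.
    """
    if m1 < 0 or m2 < 0:
        raise ValueError("counts must be non-negative")
    if m1 + m2 == 0:
        raise ValueError("no informative sites")
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Counts as they appear in two Table 1 rows (interpretation assumed, see above).
chi_a, p_a = relative_rate_test(26, 15)
chi_b, p_b = relative_rate_test(33, 23)
```

Under these assumptions, counts of 26 and 15 give a P-value of about 0.086 and counts of 33 and 23 give about 0.18, in line with the 8.58E-02 and 1.80E-01 entries in Table 1; equal counts give P = 1.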

The C4 and non-C4 PPDK-RP genes in sorghum have similar GC content (GC3 content approximately 57 to 60%), while the maize PPDK-RP gene has higher GC content, especially at third codon sites (GC3 content approximately 67%; Table S3 in Additional data file 1). All these grass PPDK-RP genes show higher GC content than their Arabidopsis homologs.

CA enzyme genes

Tandem duplication has profoundly affected the evolution of CA genes. There are two types of CA enzymes, the alpha and beta types in sorghum [21], and C4 CA genes are the beta type [47]. Here, we focus on beta-type CA genes. Our analysis indicates that there are four beta-type CA enzyme gene isoforms in sorghum, forming a tandem gene cluster with the same transcriptional orientation, on chromosome 3 (Figure 3a; Table S1 in Additional data file 1). Among them are two possible C4 genes (Sb03g029170 and Sb03g029180), which were shown by previous analysis of transcript abundance to be highly expressed in mesophyll but not bundle-sheath cells (Table S2 in Additional data file 1). The other two genes include one non-C4 gene (Sb03g029190) and one probable pseudogene (Sb03g029200) with only truncated coding sequence, a large DNA insertion in its second exon, and accumulated point mutations. These tandem genes have a common rice ortholog (Os01g0639900) at the expected colinear location, indicating that gene family expansion has occurred in sorghum (and maize; see below) since divergence from rice. The WGD outparalogs were not identified in either

Figure 4. Tandem duplication and fusion of CA genes in sorghum. Postulated evolution of sorghum CA genes through four tandem duplication events and a gene fusion event is displayed. We show distribution and structures of CA genes, and their peptide-encoding exons, on sorghum chromosome 3. Genes are shown as the large arrows with differently colored outlines and exons are shown as colored blocks contained in the arrows. Homologous exons are in the same color. A chloroplast transit peptide is in dark red. A tandem duplication event is shown by two small black arrows pointing in divergent directions, and a gene fusion event is shown by two small black arrows pointing in convergent directions. A new gene produced by tandem duplication is shown with an arrow in a new color not used by the ancestral genes. A gene produced by fusion of two neighboring genes is shown as a bipartite structure, each part with the color of one of the fused genes. A stop codon mutation is shown by a lightning-bolt symbol, and an exon-splitting event by a narrow triangle. Figure labels: Ancestral gene; Sb03g029170, Sb03g029180, Sb03g029190, Sb03g029200.


genome, implying possible gene loss after the WGD and before the rice-sorghum split.

The two sorghum C4 CA genes differ in cDNA length [35]. We found that the larger C4 CA gene may have evolved by fusing two neighboring CA genes produced by tandem duplication. In spite of possible alternative splicing programs, Sb03g029170 has a gene length of approximately 10.4 kbp and includes 13 exons, as compared to 4.5 kbp in length and 6 exons for Sb03g029180. Pairwise dotplots between Sb03g029170 and Sb03g029180 show the former has an internal repeat structure absent from the latter (Figure 3a,b; Additional data file 4). The duplication involves the last six of seven exons and intervening introns 1 to 6 of the ancestral gene (Figure 4a). By comparison, the other sorghum genes have only exons 2 to 7, which we take to be a functional unit; they lack the first exon of Sb03g029170, which encodes a cTP. This implies that several duplication events have recursively produced extra copies of the functional unit. Some functional units act as independent genes, while another fused with the complete one to form an expanded gene including two functional units. We found that this fusion involved mutation of the stop codon in the leading gene. Each functional unit starts with an ATG codon, which we infer may increase the possibility of alternative splicing. This inference is supported by the finding that Sb03g029170 may have two distinct transcripts, identified by cDNA HHU69 and HHU22, respectively (Table S2 in Additional data file 1). The two transcripts have distinct lengths, 2,100 and 1,200 bp, respectively, with the expression of the longer one being light-inducible and C4-related but the shorter one not [35]. The non-C4 gene, Sb03g029190, has a normal structure (Figure 3c) and the pseudogene, Sb03g029200, has a truncated structure.
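The internal repeat visible in a self-dotplot can be illustrated with a simple word-match dotplot: every position pair at which two sequences share an identical short word is a dot, and a run of dots at a constant offset from the main diagonal in a self-comparison indicates a repeated unit. The sketch below is a generic illustration with a hypothetical toy peptide; the word size, function names and sequence are assumptions and do not reproduce the settings used for Figure 3.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dotplot_matches(seq_x: str, seq_y: str, word: int = 3) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where the word at seq_x[i] equals the word at seq_y[j]."""
    positions: Dict[str, List[int]] = {}
    for j in range(len(seq_y) - word + 1):
        positions.setdefault(seq_y[j:j + word], []).append(j)
    hits = []
    for i in range(len(seq_x) - word + 1):
        for j in positions.get(seq_x[i:i + word], []):
            hits.append((i, j))
    return hits


def repeat_offsets(seq: str, word: int = 3) -> Dict[int, int]:
    """Count self-matches by offset (j - i > 0) from the main diagonal.

    A large count at one offset is the off-diagonal line that marks an
    internal repeat; the offset is the distance between the repeated units.
    """
    hits = dotplot_matches(seq, seq, word)
    return dict(Counter(j - i for i, j in hits if j > i))


# Hypothetical toy peptide: one 10-residue unit, a 2-residue linker, the unit again.
unit = "MKTLLAVGHE"
two_unit_protein = unit + "QQ" + unit
offsets = repeat_offsets(two_unit_protein)
```

For the toy two-unit peptide all off-diagonal matches fall at a single offset equal to the unit length plus the linker, whereas the single unit compared with itself produces no off-diagonal matches, mirroring the contrast described between Sb03g029170 and Sb03g029180.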

The tandem duplication and gene fusion are shared by sorghum and maize, and maize has undergone additional duplication. Interestingly, we found that the maize CA enzyme genes have two and three functional units, respectively (Figure 3d,e; Additional data file 4), implying further DNA sequence duplication and gene fusion in the maize lineage. Mutation of stop codons was also found in the leading gene sequences. Rice and Arabidopsis genes have only one functional unit preceded by a cTP.
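Why loss of a stop codon can merge two adjacent units into one longer coding sequence can be shown with a toy reading-frame scan. The sketch below is a strong simplification: it assumes two hypothetical, intron-free units lying in the same frame with no intergenic spacer, which is not the real gene structure, and the sequences and function name are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(dna: str, start: int = 0) -> int:
    """Count in-frame codons from `start` up to the first stop codon (exclusive).

    If no stop codon is found, all complete codons are counted.
    """
    count = 0
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3].upper() in STOP_CODONS:
            return count
        count += 1
    return count


# Hypothetical toy units, each starting with ATG and ending with a stop codon.
leading_unit = "ATGGCCAAGTGA"
trailing_unit = "ATGGCTAAATAA"

tandem_pair = leading_unit + trailing_unit
# Replace the leading unit's stop codon (TGA) with a sense codon (TGG).
read_through = leading_unit[:-3] + "TGG" + trailing_unit

separate_length = codons_before_stop(tandem_pair)
fused_length = codons_before_stop(read_through)
```

With the stop codon intact the reading frame ends after the first unit; once it is mutated, translation continues through the second unit's ATG and the frame spans both units, which is the sense in which a stop codon mutation in the leading gene permits fusion.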

To clarify the evolution of CA genes, we performed a phylogenetic analysis of the functional units (Figure 2f). The first functional units from sorghum and maize genes are grouped together, the second and third maize units and that of Sb03g029180 are in another group, and the rice gene and non-C4 sorghum gene Sb03g029190 are outgroups. This suggests that the extra functional units originated after the Panicoideae-Ehrhartoideae divergence but before the sorghum-maize divergence, with duplication continuing in the maize lineage. A possible evolutionary process in sorghum is illustrated in Figure 4b.

A gene tree of functional units suggested that C4 CA genes may have been affected by positive selection. According to the free-parameter model of the maximum likelihood approach, we found that the two functional unit groups revealed above may have experienced positive selection, in that Ka/Ks > 1 (Figure 2f), though this possibility is not significantly supported by statistical tests or by amino acid substitution analysis (Table S4 in Additional data file 1).

Excepting the possibly pseudogenized sorghum CA gene, the grass isoforms have very high GC content (GC3 content 82 to 92%), much higher than that of the Arabidopsis orthologs (Table S3 in Additional data file 1). The non-C4 gene, Sb03g029190, rather than any of the C4 genes, has the highest GC content in sorghum.

Discussion

Gene duplication and C4 pathway evolution

In the case of the C4 pathway, the evolution of a novel biological pathway required the availability of gene families with multiple members, in which modification of both expression patterns and functional domains led to new adaptive phenotypes. An intuitive idea is that genetic novelty formation is simplified by exploiting available 'construction bricks', and the pathway genes that we are aware of were either 'subverted' from existing functions or were created through modification of existing genes. Three mechanisms of new gene formation have been proposed [48]: duplication of pre-existing genes followed by neofunctionalization; creation of mosaic genes from parts of other genes; and de novo invention of genes from DNA sequences.

Duplicated genes have long been suggested to contribute to the evolution of new biological functions. As early as 1932, Haldane suggested that gene duplication events might have contributed new genetic materials because they create initially identical copies of genes, which could be altered later to produce new genes without disadvantage to the organism [49]. Ohno proposed that gene duplication played an essential role in evolution [50], pointed out the importance that WGD might have had on speciation, and hypothesized that at least one WGD event facilitated the evolution of vertebrates [51]. This hypothesis has been supported by evidence from various gene families, and from the whole genome sequences of several metazoans [52,53]. Plant genomes have experienced recurring WGDs [15,54-57], and perhaps all angiosperms are ancient polyploids [54]. These polyploidy events contribute to the creation of important developmental and regulatory genes [58-61], and may have played an important role in the origin and diversification of the angiosperms [62]. About 20 million years before the divergence of the major grass clades [19,20], the ancestral grass genome was affected by a WGD, possibly preceded by still more ancient duplication events [17,63]. It is tempting to link this WGD to the evolutionary success of grasses, now including more than


10,000 species and covering about 20% of the Earth's land surface [64], though such a link has not yet been adequately justified.

Gene duplication has been related to the evolution of the C4 pathway, based on the finding that C4 enzyme genes are usually from families having multiple copies [14]. Consequently, an ability to create and maintain large numbers of duplicated genes has been supposed to be one precondition for certain taxa to develop C4 photosynthesis [6,14]. It was even suggested that evolution of the C4 pathway was largely a story of gene duplication while plants were still in the ancestral C3 state [14].

Different genes in the C4 pathway were affected in different ways and at different times by gene duplication. Firstly, the approximately 70-mya pan-cereal WGD enriched the reservoir of some genes but not others. For example, in sorghum, both duplicated copies were preserved for PEPC and NADP-ME genes, and one of the copies of each gene produced by WGD was later recruited into the C4 pathway. This finding highlights the contribution of WGD to the evolution of C4 photosynthesis. However, for NADP-MDH, CA, PPDK-RP and PPCK enzyme genes, one of the WGD duplicates was probably lost. For CA, NADP-ME, and PPDK-RP, tandem gains of new genes after the sorghum-rice divergence appear to have preceded C4 evolution. This suggests that earlier availability of the pan-cereal duplicated copies was not by itself sufficient to initiate C4 evolution, although it is not clear whether what was lacking was genetic (a part of the machinery) or environmental (a sufficiently strong selective advantage to drive the transition).

Adaptive evolution of C4 genes

After duplication, there is evidence that some C4 genes experienced adaptive evolution; however, selection pressures and evolutionary modes have varied. Both maximum likelihood inference and patterns of aggregated amino acid differences indicate that the C4 NADP-ME and PPDK-RP enzyme genes have been under strong selective pressure. Maximum likelihood inference also implies that CA C4 enzymes have experienced positive selection, while aggregated amino acid differences indicate that C4 PEPC and PPDK genes may also have been under positive selection. The sorghum C4 genes of PPCK and NADP-MDH enzymes have also accumulated more substitutions than their rice orthologs, though the difference is not statistically significant. Compared to their rice orthologs, PEPC and NADP-MDH C4 genes evolve at a faster rate, providing further evidence of adaptation.

In many cases (NADP-ME, CA, PEPC), evidence from C4 plants supports adaptive evolution of the C4 gene family members only: the non-C4 homologs in C4 plants show no evidence of adaptive evolution, although the PEPC gene does show such evidence in rice (C3). Further, the strongest evidence of adaptive evolution is in the period when the C4 pathway is thought to have evolved, after the divergence of sorghum and rice, but before the divergence of sorghum and maize.

Adaptive evolution is further supported by patterns of gene expression shown in previous reports. PEPC, PPDK, and CA C4 genes are expressed approximately 20 times more in sorghum mesophyll than bundle-sheath cells, while NADP-ME C4 genes are expressed much more in bundle-sheath than mesophyll cells [35]. The study of Flaveria intermediates shows that PEPC activity is increased approximately 40 times from C3 to full C4 species [65], and the NADP-ME activity is approximately 9 times higher in veins than mesophyll cells [66].

During the process of adaptive evolution, a duplicated gene may gradually acquire a new function (neofunctionalization) or subdivide the functions of its progenitor with the other duplicated copy (subfunctionalization). Laboratory evolution experiments indicated that an evolving new gene can initially acquire increased fitness for a new function without losing its original function [67]. This implies that a neofunctionalization process may begin with an initial subfunctionalization step, an implication that has been supported by theory [68]. It is unclear how long such a step may take. Here, we found that neofunctionalization of C3 genes to function in C4 photosynthesis could take a long time. Previous publications found that both C4 and non-C4 sorghum NADP-MDH genes were expressed in green leaves, though the C4 gene had higher transcript accumulation [43,44]. Together with maximum likelihood analysis involving more genes and different grasses, this finding indicated that C4 and non-C4 sorghum NADP-MDH genes, produced before sorghum-Vetiveria divergence, have experienced subfunctionalization [44]. Sequence alignment here indicates that the sorghum non-C4 gene has been affected by three insertions and one deletion in its amino-terminal coding sequence, suggesting functional innovation. Regardless of whether the process is a division of functions of their progenitor gene, or evolution of a novel function in the non-C4 gene, co-expression, albeit at divergent levels, of the two genes in green leaves suggests that the process may not yet be finished.

In addition to the possible sheltering effect of a duplicated copy when evolving genetic novelty, alternative splicing may further shelter functional changes. The maize PPDK gene (and probably also its sorghum ortholog) encoding C4 transcripts also encodes cytosolic transcripts. Their rice homolog also has a dual promoter [69], implying that natural selection may have utilized this pre-existing functional duality to evolve C4 function. If C4 transcripts are products of a novel function, and non-C4 transcripts are due to the original function, the genes may have retained the original function for millions of years while evolving a novel function. The state of bifunctionality may continue until possible genetic incompatibility, if any, accumulates to a point intolerable to fitness. Maize PPDK may not be the only case of such gene bifunctionality in


the C4 pathway. As shown above, the sorghum CA gene, Sb03g029170, seems to have similar bifunctionality, encoding both C4 and non-C4 transcripts. Since its internally repeating structure may have been produced before sorghum-maize divergence, its maize homologs may also share this bifunctionality, which may have existed for millions of years. These multiple cases in which alternative splicing may contribute a possible sheltering effect during evolution of new function by C4 genes imply that alternative splicing may participate in other cases of evolution of genetic novelty.

We found that the sorghum and maize C4 PEPC genes are on a long branch of their gene tree, grouped together with their suspected rice ortholog, showing possible adaptive evolution based on both a high Ka/Ks ratio and elevated amino acid substitution. It is intriguing to ask whether possible adaptive evolution in the rice PEPC gene could be a foundation toward a new origin of the C4 pathway, or instead indicates non-C4 functional adaptation. Scrutiny of the rice PEPC sequence revealed only 2 of 12 amino acid substitutions that were previously inferred to be positively selected in C4 genes [42], and, in particular, it lacks the critical fixed mutation S780 that is shared by C4 PEPC genes in other angiosperms [41,65]. This rice gene was classified into the ppc-B1 group [42], found only in the C3 grasses, suggesting that its adaptive evolution is not leading to C4 photosynthesis, but possibly to other functional novelty.

Adaptive evolution of PEPC may have some valuable implications for the discovery of multiple groups of PEPC genes defined previously [42]. In some C4 grasses there are different groups of genes, such as ppc-B2 and ppc-C4, while another group of genes, ppc-B1, is found only in C3 grasses. These findings show that, in the C4 lineages after their divergence from the C3 lineages but perhaps prior to the evolution of the C4 pathway itself, further gene duplication(s) may have contributed to the establishment of C4 photosynthesis.

A novel mode of gene evolution

The CA enzyme genes display a novel mode of gene evolution and functional adaptation. As shown above, sorghum and maize C4 CA enzymes have one, two or three functional domains, produced through recursive duplications followed by a fusion process involving stop codon mutations in the leading domains. There have been at least four tandem duplication events in sorghum and its ancestral genomes. These tandem duplications started before sorghum-maize divergence, and appear to have continued in the maize lineage. The recurrence of tandem duplications together with the subsequent merger process may have acted as a mode of adaptive evolution. The present CA enzymes are beta-type, comprising a dimer having four zinc ions bound to the structure as active sites. Besides dimers, these enzymes can form tetramers, hexamers or octamers [47], suggesting that the dimer may be a building block. Recruiting extra domains through tandem duplication may contribute to the formation of more complex structures, with more functional binding sites making them work more efficiently to stabilize the balance between CO2 and HCO3−. The expanded gene structure of these sorghum and maize CA genes is unusual, since the cDNAs of Urochloa paniculata and Flaveria bidentis, both C4 plants, are normal in size [70]. Nonetheless, there is precedent for internal repetition of CA gene structure in the red alga Porphyridium purpureum, resulting in two sets of functional binding sites [71]. This independent evolution of internally repeating structure in CA genes supports our hypothesis that such structure may confer functional advantages.

We found that the sorghum and maize C4 CA genes share a cTP, which had not been expected since the enzymes were not found to be chloroplastically localized in C4 plants. In C3 plants, the most abundant CA activity is in the chloroplast stroma, while in C4 plants, the exact location of CA is less clear [47], but the most abundant CA activity is localized in the cytosol of mesophyll cells [72]. The cTP of sorghum and maize C4 CA genes is similar to that of the Arabidopsis CA genes, suggesting its existence before monocot-dicot divergence. The preservation of a cTP in C4 genes for tens of millions of years cannot be explained as a mere relic but suggests possible multiple functionality. This inference is at least partially supported by the discovery of divergent functions implemented by two different transcripts produced by the single sorghum C4 CA gene, Sb03g029170. As shown above, the expression of the longer transcript is light-inducible, while that of the shorter one is not, indicating that the longer but not the shorter transcript may be involved in the C4 pathway.

A long transition time from C3 to C4 photosynthesis

Several evolutionary models have been proposed to explain the formation of the C4 pathway [73-75]. In summary, seven significant phases are recognized toward successful establishment of C4 photosynthesis: general preconditioning (for example, gene duplication); anatomical preconditioning (for example, close veins); enhancement of bundle-sheath organelles; establishment of photorespiratory CO2 pump and transformation of glycine decarboxylase to bundle-sheath cells; enhancement of PEPC activity; integration; and optimization [6]. Although many biological and anatomical changes are needed, multiple origins in tens of angiosperm families suggest that it is not so difficult to evolve a novel C4 pathway. However, from an evolutionary viewpoint it is still interesting to ask whether a transition process of gene functional changes and/or enhancement is necessary before the final establishment, and how long such a transition might take. There has been a long time-lag between the initial decrease in CO2 concentration and the appearance of C4 plants. The initial decrease in CO2 concentration started at least 100 mya [6], while molecular clock analyses suggest that the earliest C4


plants (grasses) appeared about 24 to 35 mya [28,29]. One proposed explanation for the time-lag was the lack of a sufficient reservoir of duplicated and neofunctionalized C3 genes to support C4 evolution [14]. Here, we found that the genes of key enzymes, such as PEPC and NADP-ME, were among the duplicated copies produced by the WGD approximately 70 mya [19,20]. Once again we note, however, that availability of the pan-cereal duplicated copies was not by itself sufficient to initiate C4 evolution, since some were lost from the common cereal ancestor and then, after divergence from rice, had to reduplicate in the sorghum-maize ancestor before C4 evolution could occur.

Differential duplicability of C4 genes and their non-C4 isoforms

The above characterization of gene duplication shows differential duplicability of C4 genes and their isoforms in grasses. Evidence from yeast indicates that gene redundancy tends to be preserved among some of the central proteins in the cellular interaction network [76]. Tens of plant genes were suggested to be duplication-resistant, undergoing convergent restoration to singleton status following several independent genome duplications [25]. The differential duplicability could be explained by gene dosage effects, organismal complexity, protein interaction centrality and protein domain preference [24-26,76]. Here, we have shown that some gene families, including PEPC, PPCK, CA, and NADP-ME genes, have been expanded by gene duplication, but not others such as PPDK genes. The families expanded by gene duplication tend to have multiple functions, such as PEPC and NADP-ME [14]. Different PEPC gene isoforms take on specific roles, including the regulation of ion balance, the production of amino-group acceptor molecules in symbiotic nitrogen fixation, and the initial fixation of C in C4 photosynthesis and Crassulacean acid metabolism [77]. NADP-ME catalyzes the oxidative breakdown of malate to form CO2 and pyruvate in the C4 pathway. Its non-C4 functions include the provision of carbon skeletons for ammonia assimilation [78] and reductant for wound-induced production of lignin and flavonoids [79,80]. CA genes are also prone to duplication, which may enhance their ability to form more complex structures, as discussed above. Though further duplication is not required when a former C3 gene is finally co-opted for C4 roles [14], we found that the sorghum NADP-MDH C4 gene did experience a tandem duplication event, with only one duplicated copy preserving the C4 function through possible subfunctionalization [44]. This implies that the sorghum NADP-MDH C4 gene itself may be duplication-resistant.

C4 pathway and codon usage bias

GC content elevation has resulted in codon usage bias [37], which has been hypothesized by some to have contributed to C4 adaptive evolution [30]. As shown above, though the grass C4 genes and their isoforms always have a higher GC content than their Arabidopsis counterparts, there is often a non-C4 grass gene having higher GC content than the C4 one(s). Thus, there is no clear evidence supporting co-variation between codon usage bias and C4 gene evolution. Base composition variation in grass genes has been a hot topic involving transcription, translation, modification and mutational bias [81-83].

Potential contribution to engineering new C4 plants

A comprehensive characterization of the C4 pathway will help not only to understand how C4 photosynthesis evolves, but may also benefit crop improvement efforts. Of singular relevance are efforts to transform C3 plants into C4 plants. To perform such a transformation, one strategy is to incorporate the C4 pathway into C3 plants through recombinant DNA technology [84]. The strategy has succeeded in transferring C4 genes into C3 plants and yielding high levels of C4 enzymes in desired locations [85,86]. It is of great interest to transform rice, a staple food for more than half of the world's population, to perform C4 function, as reviewed in a recent publication [31]. However, combined overproduction of C4 enzymes (PEPC, PPDK, NADP-ME, and NADP-MDH) resulted in only slightly higher levels of CO2 assimilation than in wild-type rice [87]. This might indicate that not all components needed for C4 photosynthesis are known. There must be some transporters involved and there might also be some unknown regulatory factors. Knowledge of the complete sorghum genome might help to identify such components. As also shown above, though often not statistically significant, the sorghum and maize C4 genes appear to have been under adaptive evolution in different modes and levels, and show different duplicability. These findings may provide clues toward a successful transformation to C4 photosynthesis. Alternatively, perhaps adaptations such as we have suggested in the PEPC gene in C3 lineages have mitigated the perceived weaknesses of C3 photosynthesis.

Subtle differences in the C4 pathways used in different grasses are worthy of further investigation as well. For example, if our hypothesis is correct that internally repeating structure in CA genes may confer functional advantages, then engineering of the maize trimer into sorghum (for example) may be advantageous. Likewise, exploration of still more recent polyploids such as sugarcane might yield even more complex CA alleles. Tandem duplication of C4 NADP-MDH following the sorghum-maize divergence does not appear to have been essential to C4 evolution; indeed, one of the tandem genes appears to have lost C4 specificity. However, careful scrutiny of the physiological consequences of this change might suggest benefits that could be transferred to other crops.

Conclusions

Gene duplication and C4 pathway evolution

Both WGD and single-gene duplication have contributed to C4 pathway evolution in sorghum and maize. Some C4 genes (PEPC, PPCK, and NADP-ME C4 genes) were recruited from duplicates produced by WGD. Sorghum NADP-MDH, NADP-ME and PPDK-RP C4 genes were affected by tandem duplication, with only one of the resulting copies involved in the C4 pathway. C4 genes show divergent duplicability. PEPC, NADP-ME, PPCK, and CA gene families were expanded by recursive duplication events, showing a duplication-philic nature, whereas NADP-MDH and PPDK are likely duplication-phobic. Further supporting evidence is that only one copy of NADP-MDH C4 gene duplicates preserves the C4 function.

Adaptive evolution divergent in mode and level

We found evidence of adaptive evolution of most C4 genes studied. However, the mode and level of adaptation are divergent among C4 genes. Adaptive evolution is achieved through rapid mutations in DNA sequences, aggregated amino acid substitutions, and/or considerable increases of expression levels in specific cells. Besides gene redundancy, we found that alternative splicing may have also sheltered the evolution of new function. Our analysis supports previous findings that maximum likelihood inference may be too conservative to find adaptive evolution. We found no evidence of co-variation between codon usage bias and C4 pathway development.

Special evolutionary mode of grass CA genes

Grass CA genes have evolved in a specific pattern featuring recursive tandem duplication and neighboring gene fusion, which produced distinct isoforms having one to three functional units. Two sorghum C4 CA genes have one and two functional units, while two characterized maize C4 CA genes have two and three functional units, respectively. The elongation of these genes by recruiting extra domains may contribute to the formation of more complex protein structures, as often observed in plants.

A long transition time from C3 to C4 photosynthesis

The hypothesis that a reservoir of duplicated genes in ancestral C3 plants was a prerequisite for C4 pathway development is only partially supported by present findings that some C4 genes were recruited from the duplicates. Availability of the pan-cereal duplicated copies was not sufficient to initiate C4 evolution, since some were lost from the common cereal ancestor, then had to reduplicate in the sorghum-maize ancestor before C4 evolution could occur. However, C4 gene isoforms show quite divergent duplicability, and there has been quite a long time-lag between the gene duplication events and the appearance of C4 grasses. These findings suggest a long transition process, including different modes of functional innovation, before the eventual establishment of C4 photosynthesis.

Materials and methods

Known C4 enzyme genes and their non-C4 isoforms in sorghum, maize and Arabidopsis (Table 1) were downloaded from NCBI CoreNucleotide database [88]. Searching these known genes against sorghum [89] and rice [90] gene models by running BLAST [91] (E-value < 1 × 10^-5), we identified other enzyme genes in these organisms. By characterizing sequence similarity and constructing gene trees, possible C4 genes were determined. The enzymes revealed here were linked to expression data reported previously [35] by comparing cDNA segments to gene sequences using BLAST.
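The E-value filter described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline. It assumes BLAST results are available in the standard 12-column tab-separated tabular format (query ID in column 1, subject ID in column 2, E-value in column 11), which the paper does not state; the function name and all gene identifiers are hypothetical.

```python
from typing import Iterable, List, Tuple


def filter_hits(lines: Iterable[str],
                max_evalue: float = 1e-5) -> List[Tuple[str, str, float]]:
    """Keep (query, subject, evalue) for hits with E-value below the cutoff.

    Assumes 12-column tabular BLAST output; comment and empty lines are skipped.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical example rows (identifiers are made up for illustration).
example = [
    "queryA\tgene1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-80\t290",
    "queryA\tgene2\t30.0\t60\t42\t0\t1\t60\t5\t64\t0.2\t25",
]
strong_hits = filter_hits(example)
```

Only the first example row passes the 1 × 10^-5 cutoff; the surviving hits would then feed the similarity and gene tree steps described in the text.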

Gene colinearity inference

The potential gene homology information defined by running BLAST was used as the input for MCscan [92] to find homologous gene pairs in colinearity. The built-in scoring scheme for MCscan is min(-log10(E-value), 50) for every matching gene pair and -1 for each 10 kb distance between anchors, and blocks that had scores >300 were kept. The resulting syntenic chains were evaluated using a procedure by ColinearScan [17] and an E-value < 1 × 10^-10 was used as a significance cutoff.
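The scoring scheme quoted above can be made concrete with a small sketch. It is an illustration of the stated rule only, not MCscan's implementation. Several details are assumptions: anchors are given as (position in genome A, position in genome B, E-value) with positions in base pairs, the gap between consecutive anchors is taken as the larger of the two genomic distances, the penalty is applied proportionally rather than in whole 10 kb steps, and an E-value of 0 receives the capped score of 50.

```python
import math
from typing import Sequence, Tuple

Anchor = Tuple[int, int, float]  # (pos_a, pos_b, evalue); hypothetical layout


def pair_score(evalue: float, cap: float = 50.0) -> float:
    """min(-log10(E-value), cap); E-values of 0 are given the cap."""
    if evalue <= 0.0:
        return cap
    return min(-math.log10(evalue), cap)


def chain_score(anchors: Sequence[Anchor], bp_per_penalty: int = 10_000) -> float:
    """Sum of pair scores minus 1 per 10 kb between consecutive anchors."""
    score = 0.0
    previous = None
    for pos_a, pos_b, evalue in anchors:
        score += pair_score(evalue)
        if previous is not None:
            gap = max(abs(pos_a - previous[0]), abs(pos_b - previous[1]))
            score -= gap / bp_per_penalty
        previous = (pos_a, pos_b)
    return score


def keep_block(anchors: Sequence[Anchor], min_score: float = 300.0) -> bool:
    """Blocks are kept only if their score exceeds the threshold."""
    return chain_score(anchors) > min_score
```

With a cap of 50 per gene pair, a block needs at least seven tightly spaced, strongly matching anchors to exceed the threshold of 300, which shows how the rule favors long, dense colinear runs.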

Gene phylogeny construction

We constructed phylogenetic trees using several approaches, including the neighbor-joining, maximum likelihood, minimum evolution, and maximum parsimony methods, implemented in MEGA [93], PHYML [94], and PHYLIP [95], on both DNA and protein sequences. While running PHYML, parameters were set as adopted previously [42]. Bootstrap tests were performed with 100 repeats to produce percentage values, showing the stability of their topology. The trees mostly agree with one another. When there is inconsistency, the tree most strongly supported by bootstrap values was adopted for the subsequent adaptive evolution inference. For example, the trees of CA functional units were inconsistent among methods, and the best-supported neighbor-joining tree produced from protein sequences was adopted for further analysis.

Maximum likelihood inference of adaptive evolution

The tree constructed for the group of C4 enzyme genes and their non-C4 isoforms was used to perform further maximum likelihood analysis using the Codeml program in PAML [96]. To detect whether a specific C4 gene had been positively selected, we compared two types of competing models: a free-ratio model and a ratio-restriction model [97]. The free-ratio model assumes an independent Ka/Ks ratio for each branch, whereas the latter forces the Ka/Ks ratio to be 1 on the specific branch to be tested for positive selection, and for the other branches assumes independent ratios. Each model produces a log-likelihood, and twice the difference between the two log-likelihoods follows a Chi-squared distribution with 1 degree of freedom.
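The likelihood ratio test described above reduces to a short calculation once the two log-likelihoods are in hand. The sketch below is illustrative only: the function name and the numeric log-likelihoods are hypothetical, and the values would in practice be read from the Codeml output for the two models. For 1 degree of freedom the Chi-squared upper tail has the closed form erfc(sqrt(x/2)), so only the Python standard library is needed.

```python
import math
from typing import Tuple


def lrt_pvalue_1df(lnl_free: float, lnl_restricted: float) -> Tuple[float, float]:
    """Return (test statistic, p-value) for a 1-df likelihood ratio test.

    statistic = 2 * (lnL_free - lnL_restricted); negative values, which can
    arise from optimizer noise, are clamped to 0 (an assumption of this sketch).
    """
    statistic = max(2.0 * (lnl_free - lnl_restricted), 0.0)
    # Upper tail of Chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one tested branch.
stat, p = lrt_pvalue_1df(lnl_free=-1000.0, lnl_restricted=-1003.0)
```

A small p-value indicates that fixing the Ka/Ks ratio at 1 on the tested branch fits significantly worse than letting it vary; whether this reflects positive selection also depends on the estimated ratio on that branch being greater than 1.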

Aggregated amino acid substitution analysis

We adopted a comparative genomic approach initially proposed by Wagner [98] to detect genes potentially under positive selection. The Wagner approach inferred positive selection pressure by detecting possible aggregation of amino acid replacement. Here, we inferred possible amino acid replacements by comparing the homologous enzyme gene pair containing a C4 gene and non-C4 gene (often a rice gene) against the aligned outgroup sequence. A replacement site is identified in the C4 sequence that differs from the corresponding sites in both the homologous sequence and the outgroup sequence, which are identical. We found the number of all replacements, $m$, grouping the C4 and non-C4 protein sequences. If the occurrences of these replacement sites are assumed to be Poisson distributed with a parameter $\lambda$, we may evaluate the chance of observing a specific number of consecutive replacement sites along a sequence. For simplicity in description, for each sequence we first defined a replacement position array, $x = (x_0, x_1, x_2, \ldots, x_t, x_{t+1})$, composed of all the positions $x_i$ ($1 \le i \le t$) of replacement sites and the two ends of the sequence, that is, $x_0 = 0$ and $x_{t+1} = n - 1$, where $n$ is the length of the alignment after purging gaps. Then we defined the replacement distance array, $d = (d_1, \ldots, d_{t+1})$, where $d_i = x_i - x_{i-1}$ ($1 \le i \le t + 1$). The distance between two replacement sites, $d_{i,k} = x_{i+k-1} - x_i$, where $k$ is the number of the consecutive replacement sites in the corresponding sequence segment, follows a Pearson type III distribution with probability density $\lambda(\lambda z)^{k-2} e^{-\lambda z}/\Gamma(k-1)$ [99], where $\Gamma(k-1) = (k-2)!$. We can estimate the Poisson parameter $\lambda$ with $m/(2n)$. Supposing there are $t_i$ replacement sites along the $i$-th sequence, obviously, we get

$$\sum_{i=1}^{2} t_i = m.$$

Therefore, we could estimate the probability $P(d_{i,k})$ that the distance spanned by $k$ consecutive replacements is smaller than the observed $d_{i,k}$ by the following integration:

$$P(d_{i,k}) = \int_{0}^{d_{i,k}} \frac{\lambda(\lambda z)^{k-2} e^{-\lambda z}}{\Gamma(k-1)}\,dz.$$

We evaluated the occurrence probability of observed distance between any two replacement sites, and the smallest probability was used to locate a region with the most aggregated replacements, which was taken to be significant after a Bonferroni correction by considering the number of all combinations of replacement sites, $\binom{t+2}{2}$. The occurrence probability was calculated using R [100]. If between two replacement sites there were gaps in the aligned sequences, they were omitted to check for possible selection. We composed Perl scripts to implement the described approach.
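The procedure above can be sketched in a few functions. This is a minimal illustration in Python, not the authors' Perl and R code, and all function names are hypothetical. Two assumptions are made: the search for the most aggregated region runs over pairs of actual replacement sites only (the two sequence ends enter only through the Bonferroni count), and the integral is evaluated with the closed form of the gamma distribution function for integer shape $k-1$, namely $1 - e^{-\lambda d}\sum_{j=0}^{k-2} (\lambda d)^j / j!$.

```python
import math
from itertools import combinations
from typing import List, Optional

GAP = "-"


def purge_gaps(*seqs: str) -> List[str]:
    """Drop every alignment column that contains a gap in any sequence."""
    columns = [col for col in zip(*seqs) if GAP not in col]
    if not columns:
        return ["" for _ in seqs]
    return ["".join(residues) for residues in zip(*columns)]


def replacement_sites(target: str, other: str, outgroup: str) -> List[int]:
    """Positions where `target` differs while `other` and `outgroup` agree."""
    return [i for i, (a, b, o) in enumerate(zip(target, other, outgroup))
            if b == o and a != o]


def cluster_probability(distance: float, k: int, lam: float) -> float:
    """P(span of k consecutive sites <= distance) for Poisson rate lam."""
    x = lam * distance
    tail = sum(math.exp(-x) * x ** j / math.factorial(j) for j in range(k - 1))
    return 1.0 - tail


def most_aggregated(sites: List[int], lam: float) -> Optional[dict]:
    """Find the pair of sites with the smallest cluster probability."""
    t = len(sites)
    best = None
    for i, j in combinations(range(t), 2):
        k = j - i + 1                      # number of consecutive sites spanned
        p = cluster_probability(sites[j] - sites[i], k, lam)
        if best is None or p < best[0]:
            best = (p, sites[i], sites[j], k)
    if best is None:
        return None
    p, start, end, k = best
    n_tests = math.comb(t + 2, 2)          # Bonferroni count from the text
    return {"start": start, "end": end, "k": k,
            "p": p, "p_bonferroni": min(1.0, p * n_tests)}


# Toy alignment (made-up sequences): the first sequence plays the C4 role.
c4, homolog, outgroup = purge_gaps("MKTAAAAGGL-", "MKTSSSSGGLV", "MKTSSSSGGLV")
sites_c4 = replacement_sites(c4, homolog, outgroup)
sites_hom = replacement_sites(homolog, c4, outgroup)
lam = (len(sites_c4) + len(sites_hom)) / (2 * len(c4))   # lambda = m / (2n)
result = most_aggregated(sites_c4, lam)
```

In the toy alignment the four replacements at positions 3 to 6 form the most aggregated region, but the Bonferroni-corrected probability stays well above 0.05 because the sequence is so short; real proteins of several hundred residues give the test far more power.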

Maize homolog characterization

Maize BACs are from the MaizeSequence database [101]. The maize genes were searched against the BAC sequences to reveal their chromosomal locations, local DNA structures, and so on.

Chloroplastic transit peptide inference

ChloroP 1.1 [102] was used to predict the presence of cTPs in the enzyme protein sequences and the location of potential cTP cleavage sites.

Dotplotting

Dotplots between CA protein sequences were produced by running the public program DOTTER [103]. The dotplots were produced by matched strings from two protein sequences in comparison. The expected score per residue of the matched strings was set to be 40.

Abbreviations

BAC: bacterial artificial chromosome; CA: carbonic anhydrase; cTP: chloroplast transit peptide; MDH: malate dehydrogenase; mya: million years ago; NADP-ME: NADP-malic enzyme; PEPC: phosphoenolpyruvate carboxylase; PPCK: PEPC kinase; PPDK: pyruvate orthophosphate dikinase; PPDK-RP: PPDK regulatory protein; WGD: whole-genome duplication.

Authors' contributions

XW designed and organized the present work. UG, PW and XW curated gene models. UG, HT, JEB and AHP contributed to this work through critical discussion. XW and AHP wrote the paper.

Additional files

The following additional data are available with the online version of this paper: Tables S1 to S4 (Additional data file 1); a full tree of PEPC genes (Additional data file 2); cTPs detected by ChloroP (Additional data file 3); CA gene sequences and their functional units (Additional data file 4).

Additional data file 1 contains: Table S1: C4 gene isoforms in the present study. Table S2: sorghum C4 genes with cDNA evidence. Table S3: GC content of C4 gene isoforms. Table S4: complete results of inferred amino acid substitutions and their aggregation.

Acknowledgements

We appreciate financial support from the US National Science Foundation (MCB-0450260 to AHP). We thank Lifeng Lin for artwork.

References

1. Hatch MD, Slack CR: Photosynthesis by sugar-cane leaves. A new carboxylation reaction and the pathway of sugar formation. Biochem J 1966, 101(1):103-111.

2. Seemann JR, Sharkey TD, Wang J, Osmond CB: Environmental Effects on Photosynthesis, Nitrogen-Use Efficiency, and Metabolite Pools in Leaves of Sun and Shade Plants. Plant physiology 1987, 84(3):796-802.

3. Hattersley PG: The distribution of C3 and C4 grasses in Australia in relation to climate. Oecologia 1983, 57:113-128.

4. Ehleringer JR, Bjorkman O: A Comparison of Photosynthetic Characteristics of Encelia Species Possessing Glabrous and Pubescent Leaves. Plant physiology 1978, 62(2):185-190.

5. Cerling TE, Harris JM, MacFadden BJ, Leakey MG, Quade J, Eisenmann V, Ehleringer JR: Global vegetation change through the Miocene/Pliocene boundary. Nature 1997, 389:153-158.

6. Sage RF: The evolution of C4 photosynthesis. New Phytologist 2004, 161:341-370.

7. Muhaidat R, Sage RF, Dengler NG: Diversity of Kranz anatomy and biochemistry in C4 eudicots. American Journal of Botany 2007, 94(3):20.

8. Giussani LM, Cota-Sanchez JH, Zuloaga FO, Kellogg EA: A molecular phylogeny of the grass subfamily Panicoideae (Poaceae) shows multiple origins of C4 photosynthesis. American Journal of Botany 2001, 88(11):1993-2012.

9. Pyankov VI, Artyusheva EG, Edwards GE, Black CC Jr, Soltis PS: Phylogenetic analysis of tribe Salsoleae (Chenopodiaceae) based on ribosomal ITS sequences: implications for the evolution of photosynthesis types. Am J Bot 2001, 88(7):1189-1198.

10. Sheen J: C4 Gene Expression. Annu Rev Plant Physiol Plant Mol Biol 1999, 50:187-217.

11. Burnell JN, Chastain CJ: Cloning and expression of maize-leaf pyruvate, Pi dikinase regulatory protein gene. Biochem Biophys Res Commun 2006, 345(2):675-680.

12. Kawamura T, Shigesada K, Toh H, Okumura S, Yanagisawa S, Izui K: Molecular evolution of phosphoenolpyruvate carboxylase for C4 photosynthesis in maize: comparison of its cDNA sequence with a newly isolated cDNA encoding an isozyme involved in the anaplerotic function. J Biochem 1992, 112(1):147-154.

13. Poetsch W, Hermans J, Westhoff P: Multiple cDNAs of phosphoenolpyruvate carboxylase in the C4 dicot Flaveria trinervia. FEBS Letters 1991, 292:133-136.

14. Monson RK: Gene duplication, neofunctionalization, and the evolution of C4 photosynthesis. International Journal of Plant Science 2003, 164(6920):S43-S54.

15. Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 2003, 422(6930):433-438.

16. Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, Zhang J, Zhang Y, Li R, Xu Z, Li S, Li X, Zheng H, Cong L, Lin L, Yin J, Geng J, Li G, Shi J, Liu J, Lv H, Li J, Wang J, Deng Y, Ran L, Shi X, et al.: The Genomes of Oryza sativa: A History of Duplications. PLoS Biology 2005, 3(2):e38.

17. Wang X, Shi X, Li Z, Zhu Q, Kong L, Tang W, Ge S, Luo J: Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice. BMC bioinformatics 2006, 7(1):447.

18. Blanc G, Wolfe KH: Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. The Plant cell 2004, 16(7):1667-1678.

19. Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA 2004, 101(26):9903-9908.

20. Wang X, Shi X, Hao B, Ge S, Luo J: Duplication and DNA segmental loss in the rice genome: implications for diploidization. New Phytologist 2005, 165(3):937-946.

21. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, et al.: The Sorghum bicolor genome and the diversification of grasses. Nature 2009, 457(7229):551-556.

22. Lynch M, Conery JS: The evolutionary demography of duplicate genes. J Struct Funct Genomics 2003, 3(1-4):35-44.

23. He X, Zhang J: Gene complexity and gene duplicability. Curr Biol 2005, 15(11):1016-1021.

24. Liang H, Li WH: Gene essentiality, gene duplicability and protein connectivity in human and mouse. Trends Genet 2007, 23(8):375-378.

25. Paterson AH, Chapman BA, Kissinger JC, Bowers JE, Feltus FA, Estill JC: Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces and Tetraodon. Trends Genet 2006, 22(11):597-602.

26. Papp B, Pal C, Hurst LD: Dosage sensitivity and the evolution of gene families in yeast. Nature 2003, 424(6945):194-197.

27. Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, Sninsky JJ, Adams MD, Cargill M: A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS biology 2005, 3(6):e170.

28. Vicentini A, Barber JC, Aliscioni SS, Giussani LM, Kellogg EA: The age of the grasses and clusters of origins of C4 photosynthesis. Global Change Biology 2008, 14:15.

29. Christin PA, Besnard G, Samaritani E, Duvall MR, Hodkinson TR, Savolainen V, Salamin N: Oligocene CO2 decline promoted C4 photosynthesis in grasses. Curr Biol 2008, 18(1):37-43.

30. Shenton M, Fontaine V, Hartwell J, Marsh JT, Jenkins GI, Nimmo HG: Distinct patterns of control and expression amongst members of the PEP carboxylase kinase gene family in C4 plants. Plant J 2006, 48(1):45-53.

31. Sheehy JE, Mitchell PL, Hardy B: Charting New Pathways To C4 Rice. Los Banos (Philippines): World Scientific Publishing Company; 2008.

32. Sanchez R, Cejudo FJ: Identification and expression analysis of a gene encoding a bacterial-type phosphoenolpyruvate carboxylase from Arabidopsis and rice. Plant physiology 2003, 132(2):949-957.

33. Cretin C, Keryer E, Tagu D, Lepiniec L, Vidal J, Gadal P: Complete cDNA sequence of sorghum phosphoenolpyruvate carboxylase involved in C4 photosynthesis. Nucleic acids research 1990, 18(3):658.

34. Cretin C, Santi S, Keryer E, Lepiniec L, Tagu D, Vidal J, Gadal P: The phosphoenolpyruvate carboxylase gene family of Sorghum: promoter structures, amino acid sequences and expression of genes. Gene 1991, 99(1):87-94.

35. Wyrich R, Dressen U, Brockmann S, Streubel M, Chang C, Qiang D, Paterson AH, Westhoff P: The molecular basis of C4 photosynthesis in sorghum: isolation, characterization and RFLP mapping of mesophyll- and bundle-sheath-specific cDNAs obtained by differential screening. Plant molecular biology 1998, 37(2):319-335.

36. Song R, Llaca V, Messing J: Mosaic organization of orthologous sequences in grass genomes. Genome Res 2002, 12(10):1549-1555.

37. Carels N, Bernardi G: Two classes of genes in plants. Genetics 2000, 154(4):1819-1825.

38. Lepiniec L, Keryer E, Philippe H, Gadal P, Cretin C: Sorghum phosphoenolpyruvate carboxylase gene family: structure, function and molecular evolution. Plant molecular biology 1993, 21(3):487-502.

39. Besnard G, Offmann B, Robert C, Rouch C, Cadet F: Assessment of the C(4) phosphoenolpyruvate carboxylase gene diversity in grasses (Poaceae). Theor Appl Genet 2002, 105(2-3):404-412.

40. Roth C, Liberles DA: A systematic search for positive selection in higher plants (Embryophytes). BMC plant biology 2006, 6:12.

41. Westhoff P, Gowik U: Evolution of c4 phosphoenolpyruvate carboxylase. Genes and proteins: a case study with the genus Flaveria. Ann Bot (Lond) 2004, 93(1):13-23.

42. Christin PA, Salamin N, Savolainen V, Duvall MR, Besnard G: C4 Photosynthesis evolved in grasses via parallel adaptive genetic changes. Curr Biol 2007, 17(14):1241-1247.

43. Luchetta P, Cretin C, Gadal P: Organization and expression of the two homologous genes encoding the NADP-malate dehydrogenase in Sorghum vulgare leaves. Mol Gen Genet 1991, 228(3):473-481.

44. Rondeau P, Rouch C, Besnard G: NADP-malate dehydrogenase gene evolution in Andropogoneae (Poaceae): gene duplication followed by sub-functionalization. Ann Bot (Lond) 2005, 96(7):1307-1314.

45. Sheen J: Molecular mechanisms underlying the differential expression of maize pyruvate, orthophosphate dikinase genes. The Plant cell 1991, 3(3):225-245.

46. Glackin CA, Grula JW: Organ-specific transcripts of different size and abundance derive from the same pyruvate, orthophosphate dikinase gene in maize. Proceedings of the National Academy of Sciences of the United States of America 1990, 87(8):3004-3008.

47. Tiwari A, Kumar P, Singh S, Ansari S: Carbonic anhydrase in relation to higher plants. Photosynthetica 2005, 43(1):1-11.

48. Wolfe KH, Li WH: Molecular evolution meets the genomics revolution. Nature genetics 2003, 33(Suppl):255-265.

49. Haldane JBS: The causes of evolution. Ithaca: Cornell University Press; 1932.

50. Ohno S: Sex chromosomes and sex-linked genes. Berlin: Springer-Verlag; 1967.

51. Ohno S: Evolution by Gene Duplication. Berlin-Heidelberg-New York: Springer-Verlag; 1970.

52. Steinke D, Hoegg S, Brinkmann H, Meyer A: Three rounds (1R/2R/3R) of genome duplications and the evolution of the glycolytic pathway in vertebrates. BMC Biol 2006, 4:16.


53. Meyer A, Van de Peer Y: From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 2005, 27(9):937-945.

54. Soltis PS: Ancient and recent polyploidy in angiosperms. The New phytologist 2005, 166(1):5-8.

55. Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyere C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, et al.: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 2007, 449(7161):463-467.

56. Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH: Synteny and collinearity in plant genomes. Science 2008, 320(5875):486-488.

57. Chapman BA, Bowers JE, Feltus FA, Paterson AH: Buffering crucial functions by paleologous duplicated genes may impart cyclicality to angiosperm genome duplication. Proceedings of the National Academy of Sciences of the United States of America 2006, 103:2730-2735.

58. Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y: Modeling gene and genome duplications in eukaryotes. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(15):5454-5459.

59. Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. The Plant cell 2004, 16(7):1679-1691.

60. Seoighe C, Gehring C: Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet 2004, 20(10):461-464.

61. Freeling M, Thomas BC: Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome research 2006, 16(7):805-814.

62. De Bodt S, Maere S, Van de Peer Y: Genome duplication and the origin of angiosperms. Trends in ecology & evolution (Personal edition) 2005, 20(11):591-597.

63. Salse J, Bolot S, Throude M, Jouffe V, Piegu B, Quraishi UM, Calcagno T, Cooke R, Delseny M, Feuillet C: Identification and characterization of shared duplications between rice and wheat provide new insight into grass genome evolution. The Plant cell 2008, 20:11-24.

64. Shantz HL: The place of grasslands in the earth's cover of vegetation. Ecology 1954, 35:143-145.

65. Svensson P, Blasing OE, Westhoff P: Evolution of C4 phosphoenolpyruvate carboxylase. Arch Biochem Biophys 2003, 414(2):180-188.

66. Hibberd JM, Quick WP: Characteristics of C4 photosynthesis in stems and petioles of C3 flowering plants. Nature 2002, 415(6870):451-454.

67. Aharoni A, Gaidukov L, Khersonsky O, McQ GS, Roodveldt C, Tawfik DS: The 'evolvability' of promiscuous protein functions. Nature genetics 2005, 37(1):73-76.

68. He XL, Zhang JZ: Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 2005, 169(2):1157-1164.

69. Matsuoka M: The gene for pyruvate, orthophosphate dikinase in C4 plants: structure, regulation and evolution. Plant & cell physiology 1995, 36(6):937-943.

70. Moroney JV, Bartlett SG, Samuelsson G: Carbonic anhydrases in plants and algae. Plant Cell and Environment 2001, 24:13.

71. Mitsuhashi S, Mizushima T, Yamashita E, Yamamoto M, Kumasaka T, Moriyama H, Ueki T, Miyachi S, Tsukihara T: X-ray structure of beta-carbonic anhydrase from the red alga, Porphyridium purpureum, reveals a novel catalytic site for CO(2) hydration. The Journal of biological chemistry 2000, 275(8):5521-5526.

72. Ku MS, Kano-Murakami Y, Matsuoka M: Evolution and expression of C4 photosynthesis genes. Plant physiology 1996, 111(4):949-957.

73. Edwards GE, Ku MSB: Biochemistry of C3-C4 intermediates. In The Biochemistry of Plants Volume 10. Edited by: Hatch MD, Boardman NK. London: Academic Press; 1987:275-325.

74. Brown RH, Hattersley PW: Leaf Anatomy of C(3)-C(4) Species as Related to Evolution of C(4) Photosynthesis. Plant physiology 1989, 91:1543-1550.

75. Rawsthorne S: Towards an understanding of C3-C4 photosynthesis. Essays Biochem 1992, 27:135-146.

76. Kafri R, Dahan O, Levy J, Pilpel Y: Preferential protection of protein interaction network hubs in yeast: evolved functionality of genetic redundancy. Proceedings of the National Academy of Sciences of the United States of America 2008, 105(4):1243-1248.

77. Gehring HH, Heute V, Kluge M: Toward a better knowledge of the molecular evolution of phosphoenolpyruvate carboxylase by comparison of partial cDNA sequences. Journal of molecular evolution 1998, 46:107-114.

78. Chopra J, Kaur N, Gupta AK: A comparative developmental pattern of enzymes of carbon metabolism and pentose phosphate pathway in mungbean and lentil nodules. Acta Physiol Plant 2002, 24:67-72.

79. Casati P, Drincovich MF, Edwards GE, Andreo CS: Malate metabolism by NADP-malic enzyme in plant defense. Photosynth Res 1999, 61:99-105.

80. Maurino VG, Saigo M, Andreo CS, Drincovich MF: Non-photosynthetic 'malic enzyme' from maize: a constituvely expressed enzyme that responds to plant defence inducers. Plant molecular biology 2001, 45(4):409-420.

81. Wong GK, Wang J, Tao L, Tan J, Zhang J, Passey DA, Yu J: Compositional gradients in Gramineae genes. Genome research 2002, 12(6):851-856.

82. Wang HC, Singer GA, Hickey DA: Mutational bias affects protein evolution in flowering plants. Mol Biol Evol 2004, 21(1):90-96.

83. Shi X, Wang X, Li Z, Zhu Q, Yang J, Ge S, Luo J: Evidence that natural selection is the primary cause of the GC content variation in rice genes. Journal of Integrative Plant Biology 2007.

84. Miyao M: Molecular evolution and genetic engineering of C4 photosynthetic enzymes. J Exp Bot 2003, 54(381):179-189.

85. Ku MS, Agarie S, Nomura M, Fukayama H, Tsuchida H, Ono K, Hirose S, Toki S, Miyao M, Matsuoka M: High-level expression of maize phosphoenolpyruvate carboxylase in transgenic rice plants. Nat Biotechnol 1999, 17(1):76-80.

86. Fukayama H, Tsuchida H, Agarie S, Nomura M, Onodera H, Ono K, Lee BH, Hirose S, Toki S, Ku MS, Makino A, Matsuoka M, Miyao M: Significant accumulation of C(4)-specific pyruvate, orthophosphate dikinase in a C(3) plant, rice. Plant physiology 2001, 127(3):1136-1146.

87. Taniguchi Y, Ohkawa H, Masumoto C, Fukuda T, Tamai T, Lee K, Sudoh S, Tsuchida H, Sasaki H, Fukayama H, Miyao M: Overproduction of C4 photosynthetic enzymes in transgenic rice plants: an approach to introduce the C4-like photosynthetic pathway into rice. J Exp Bot 2008, 59(7):1799-1809.

88. NCBI CoreNucleotide database [http://www.ncbi.nlm.nih.gov/]

89. Joint Genome Institute [http://www.jgi.doe.gov/]

90. Rice annotation project 2 [http://rgp.dna.affrc.go.jp/E/index.html/]

91. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. Journal of molecular biology 1990, 215:403-410.

92. Tang HB, Wang XY, Bowers JE, Ming R, Alam M, Paterson AH: Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome research 2008.

93. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 2007, 24(8):1596-1599.

94. Guindon S, Lethiec F, Duroux P, Gascuel O: PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. Nucleic acids research 2005:W557-559.

95. Felsenstein J: Phylogenies From Restriction Sites - a Maximum-Likelihood Approach. Evolution 1992, 46(1):159-173.

96. Yang Z, Nielsen R: Synonymous and nonsynonymous rate variation in nuclear genes of mammals. Journal of molecular evolution 1998, 46(4):409-418.

97. Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol 1998, 15:568-573.

98. Wagner A: Rapid detection of positive selection in genes and genomes through variation clusters. Genetics 2007, 176(4):2451-2463.

99. Wagner A: A computational genomics approach to the identification of gene networks. Nucleic acids research 1997, 25(18):3594-3604.

100. R language [http://www.r-project.org/]

101. MaizeSequence [http://www.maizesequence.org/]

102. Emanuelsson O, Nielsen H, von Heijne G: ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci 1999, 8(5):978-984.


103. Sonnhammer ELL, Durbin R: A dot-matrix program with dynamic threshold control suitable for genomic DNA and protein sequence analysis. Gene 1995, 167:1-10.
