polyploidy-associated genome modifications during land...

11
rstb.royalsocietypublishing.org Review Cite this article: Jiao Y, Paterson AH. 2014 Polyploidy-associated genome modifications during land plant evolution. Phil. Trans. R. Soc. B 369: 20130355. http://dx.doi.org/10.1098/rstb.2013.0355 One contribution of 14 to a Theme Issue ‘Contemporary and future studies in plant speciation, morphological/floral evolution and polyploidy: honouring the scientific contributions of Leslie D. Gottlieb to plant evolutionary biology’. Subject Areas: evolution, genomics, plant science Keywords: genome modification, ancestral gene content, polyploidy, gene family gain and loss Author for correspondence: Andrew H. Paterson e-mail: [email protected] Electronic supplementary material is available at http://dx.doi.org/10.1098/rstb.2013.0355 or via http://rstb.royalsocietypublishing.org. Polyploidy-associated genome modifications during land plant evolution Yuannian Jiao and Andrew H. Paterson Plant Genome Mapping Laboratory, University of Georgia, 111 Riverbend Road, Athens, GA 30606, USA The occurrence of polyploidy in land plant evolution has led to an acceleration of genome modifications relative to other crown eukaryotes and is correlated with key innovations in plant evolution. Extensive genome resources provide for relating genomic changes to the origins of novel morphological and phys- iological features of plants. Ancestral gene contents for key nodes of the plant family tree are inferred. Pervasive polyploidy in angiosperms appears likely to be the major factor generating novel angiosperm genes and expanding some gene families. However, most gene families lose most duplicated copies in a quasi-neutral process, and a few families are actively selected for single- copy status. One of the great challenges of evolutionary genomics is to link genome modifications to speciation, diversification and the morphological and/or physiological innovations that collectively compose biodiversity. Rapid accumulation of genomic data and its ongoing investigation may greatly improve the resolution at which evolutionary approaches can contrib- ute to the identification of specific genes responsible for particular innovations. The resulting, more ‘particulate’ understanding of plant evolution, may elev- ate to a new level fundamental knowledge of botanical diversity, including economically important traits in the crop plants that sustain humanity. 1. Introduction Genome duplication is a punctuational event in the evolution of a lineage, with permanent consequences for all descendants—if the lineage survives. Most crown eukaryotes pass through different ploidy levels at different stages of development [1,2] and continuously produce aberrant unreduced gametes at low rates. However, the extreme rarity of genome duplications in the evolution- ary history of extant lineages, usually surviving only once in many millions of years, shows that the vast majority quickly go extinct [3]. Classical views suggest that genome duplication is potentially advan- tageous as a source of genes with new functions [4,5]. Some polyploids appear to realize these and other benefits [6], with genome duplication thought to be central to the evolution of morphological complexity [7]. Polyploids have long been suggested to enjoy a variety of capabilities that transgress those of their diploid progenitors, such as adaptation to environmental extremes [8– 11]. Angiosperms are an outstanding model in which to elucidate consequences of genome duplication in crown eukaryotes. It has long been suspected that many angiosperms were palaeopolyploids [10,12]. Indeed, recent analyses of genome sequences [13,14] show that all angiosperms are palaeopolyploids. Seminal find- ings from yeast [15–17] and Paramecium [18] are shedding valuable light on consequences of genome duplication in single-celled organisms. However, these consequences are expected to be very different in higher eukaryotes with small effective population sizes, such as angiosperms and mammals [19,20]. Herein, we describe some representative features of major land plant groups and associated genomic modifications and variations throughout the history of plant evolution. We circumscribe the ancestral gene content for key nodes of the evolutionary series, as well as adaptational genomic changes occurring along the branches leading to these nodes. We also review current approaches for genome structure comparison to unravel ancient polyploidy and discuss the consequences of these polyploidy events, particularly the biased pattern of gene gains and losses following genome duplication. & 2014 The Author(s) Published by the Royal Society. All rights reserved. on May 19, 2018 http://rstb.royalsocietypublishing.org/ Downloaded from

Upload: dinhdang

Post on 18-Mar-2018

218 views

Category:

Documents


1 download

TRANSCRIPT

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

rstb.royalsocietypublishing.org

ReviewCite this article: Jiao Y, Paterson AH. 2014

Polyploidy-associated genome modifications

during land plant evolution. Phil. Trans. R. Soc.

B 369: 20130355.

http://dx.doi.org/10.1098/rstb.2013.0355

One contribution of 14 to a Theme Issue

‘Contemporary and future studies in plant

speciation, morphological/floral evolution

and polyploidy: honouring the scientific

contributions of Leslie D. Gottlieb to plant

evolutionary biology’.

Subject Areas:evolution, genomics, plant science

Keywords:genome modification, ancestral gene content,

polyploidy, gene family gain and loss

Author for correspondence:Andrew H. Paterson

e-mail: [email protected]

& 2014 The Author(s) Published by the Royal Society. All rights reserved.

Electronic supplementary material is available

at http://dx.doi.org/10.1098/rstb.2013.0355 or

via http://rstb.royalsocietypublishing.org.

Polyploidy-associated genomemodifications during land plant evolution

Yuannian Jiao and Andrew H. Paterson

Plant Genome Mapping Laboratory, University of Georgia, 111 Riverbend Road, Athens, GA 30606, USA

The occurrence of polyploidy in land plant evolution has led to an acceleration

of genome modifications relative to other crown eukaryotes and is correlated

with key innovations in plant evolution. Extensive genome resources provide

for relating genomic changes to the origins of novel morphological and phys-

iological features of plants. Ancestral gene contents for key nodes of the plant

family tree are inferred. Pervasive polyploidy in angiosperms appears likely to

be the major factor generating novel angiosperm genes and expanding some

gene families. However, most gene families lose most duplicated copies in

a quasi-neutral process, and a few families are actively selected for single-

copy status. One of the great challenges of evolutionary genomics is to link

genome modifications to speciation, diversification and the morphological

and/or physiological innovations that collectively compose biodiversity.

Rapid accumulation of genomic data and its ongoing investigation may

greatly improve the resolution at which evolutionary approaches can contrib-

ute to the identification of specific genes responsible for particular innovations.

The resulting, more ‘particulate’ understanding of plant evolution, may elev-

ate to a new level fundamental knowledge of botanical diversity, including

economically important traits in the crop plants that sustain humanity.

1. IntroductionGenome duplication is a punctuational event in the evolution of a lineage, with

permanent consequences for all descendants—if the lineage survives. Most

crown eukaryotes pass through different ploidy levels at different stages of

development [1,2] and continuously produce aberrant unreduced gametes at

low rates. However, the extreme rarity of genome duplications in the evolution-

ary history of extant lineages, usually surviving only once in many millions of

years, shows that the vast majority quickly go extinct [3].

Classical views suggest that genome duplication is potentially advan-

tageous as a source of genes with new functions [4,5]. Some polyploids

appear to realize these and other benefits [6], with genome duplication thought

to be central to the evolution of morphological complexity [7]. Polyploids have

long been suggested to enjoy a variety of capabilities that transgress those of

their diploid progenitors, such as adaptation to environmental extremes [8–11].

Angiosperms are an outstanding model in which to elucidate consequences of

genome duplication in crown eukaryotes. It has long been suspected that many

angiosperms were palaeopolyploids [10,12]. Indeed, recent analyses of genome

sequences [13,14] show that all angiosperms are palaeopolyploids. Seminal find-

ings from yeast [15–17] and Paramecium [18] are shedding valuable light on

consequences of genome duplication in single-celled organisms. However,

these consequences are expected to be very different in higher eukaryotes with

small effective population sizes, such as angiosperms and mammals [19,20].

Herein, we describe some representative features of major land plant groups

and associated genomic modifications and variations throughout the history

of plant evolution. We circumscribe the ancestral gene content for key nodes of

the evolutionary series, as well as adaptational genomic changes occurring

along the branches leading to these nodes. We also review current approaches

for genome structure comparison to unravel ancient polyploidy and discuss the

consequences of these polyploidy events, particularly the biased pattern of

gene gains and losses following genome duplication.

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

2

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

2. Genome resources in major plant groupsLand plants, also called embryophytes, descended from

freshwater algae about 480 Ma [21–24]. Most of the major

phylogenetic groups of land plants now have at least one

high-quality draft or reference genome sequence. Mosses,

with about 12 000 species classified in the Bryophyta, are

non-vascular plants that lack xylem and absorb water and

nutrients mainly through their leaves. To date, there is only

one Bryophyta species (Physcomitrella patens) with a completed

genome sequence [25]. Lycophytes, together with euphyllo-

phytes (ferns and seed plants), are the two surviving lineages

after the origin of vascular plants. Selaginella moellendorfii, in

the lycophyte family Selaginellaceae (spike mosses), is the

first vascular non-seed plant with its genome sequenced [26].

Seed plants (spermatophytes) include gymnosperms and

angiosperms. Gymnosperms are a group of seed-producing

plants; extant gymnosperms include conifers, cycads, Ginkgoand Gnetales. Many gymnosperms have exceptionally large

genomes; for example, conifer genome sizes range from 18 to

35 gigabases [27], hindering whole genome sequencing. Piceaabies is the first gymnosperm species with a full genome

sequence, completed in 2013 [28]. Angiosperms are by far the

largest group of land plants, with more than 300 000 living

species, of which at least 33 have sequenced genomes (as in

Phytozome v9.1), notably Arabidopsis [29], Populus [30], grape

[31], Oryza [32], Sorghum [33], banana [34] and many more,

with many additional genome sequences in progress. These

genome sequences from species across land plant phylogeny

provide critical resources to study genome evolution and

associated innovations of plant form and function.

3. Ancestral genome modification at keyevolutionary nodes

Clarification of the ancestral gene content and lineage-specific

variations at key nodes in the land plant phylogeny will

advance knowledge of how genomic modifications contribu-

ted to the evolution of novel features. Here, we present an

exemplar study of reconstructing ancestral gene content.

The protein-coding gene sets of 11 sequenced land plant

species (five eudicots: Arabidopsis thaliana, Populus trichocarpa,Vitis vinifera, Solanum lycopersicum and Nelumbo nucifera; three

monocots: Oryza sativa, Sorghum bicolor and Musa acuminata;

one gymnosperm: Picea abies; one lycophyte: Selaginellamoellendorffii; one moss: Physcomitrella patens) were used to

identify putative gene family clusters by OrthoMCL [35].

The OrthoMCL approach builds tighter gene clusters with

fewer false positives than the Tribe-MCL method [36]. How-

ever, the OrthoMCL clusters might not represent truly

distinct gene families, but could be subfamilies for some

more divergent gene families (gene family represents an

OrthoMCL cluster as a general term hereafter). For example,

the MADS-box gene family includes tens of OrthoMCL

clusters. Therefore, we have to emphasize the potential

over-estimation of ancestral gene content, as well as the

number of gene family gains and losses. The number of gen-

omes used in the gene family classification could also affect

the gene clusters, especially since most of the currently avail-

able genomes are angiosperms. However, this analysis does

illustrate a general trend of genome modifications following

land plant evolution. We modelled the changes of gene

content occurring along each branch using a Wagner

parsimony framework model.

From this gene family classification, we estimated a mini-

mum gene set of 7100 genes (figure 1, node 1) in the common

ancestor of all land plants, which is very close to the esti-

mation (6820) of Banks et al. [26]. Banks et al. [26] identified

gene families using assignment of orthology by mutual-

best-hit criteria and synteny between closely related organ-

isms. The bryophyte Physcomitrella has more than twice the

number of genes of aquatic Chlamydomonas (35 938 versus

15 143) [25,37], suggesting a general increase in gene family

complexity following the transition to land. Genes associated

with aquatic environments (e.g. flagellar components for

gametic motility) and dynein-mediated transport have been

lost in Physcomitrella [25]. By contrast, Physcomitrella gained

members of gene families associated with signal transduction

(e.g. through gibberellic acid, jasmonic acid, ethylene and

brassinosteroids), transport capabilities and tolerance of abio-

tic stresses. Adaptation to land required plants to be able to

survive much greater variations of temperature, light and

water availability. An example of such adaptation may be

the heat shock protein 70 (HSP70) gene family. All algal gen-

omes sequenced to date have only one cytosolic HSP70 gene

[38], while there are nine in P. patens [25]. Likewise, the

light-harvesting complex proteins have significantly expan-

ded in P. patens, perhaps contributing to robustness of the

photosynthetic antenna to deal with high light intensities.

Photo-protective early light-induced proteins also expanded

greatly in P. patens, putatively associated with avoidance of

photo-oxidative damage. Some of these expansions may be

results of whole-genome duplications [25].

The estimated minimal gene number of the most recent

common ancestor of vascular plants is 7790 (figure 1, node 2),

with the acquisition of 690 new genes during the transition

from non-vascular plants. In the lycophyte Selaginella, which,

among plants with sequenced genomes, is sister to all other vas-

cular plants, secondary metabolic genes (such as cytochrome

P450, BAHD acyltransferases and terpene synthases) were

expanded extensively [26]. It has been suggested that far

fewer new genes were needed for the transition from a gameto-

phyte- to a sporophyte-dominated life cycle than for the

transition from non-seed vascular plant to a seed plant and

subsequently to a flowering plant [26].

A large number of gene family gains (1269) were inferred

along the deep branch leading to seed plants, in which the

estimated ancestral gene content was 8652. Genome size is

exceptionally large for gymnosperms (18–35 gigabases),

although polyploidy is thought to be rare in this group [39].

Recent efforts have indicated that the large genome size of

gymnosperms might be associated with rapid expansion of

retrotransposons and may be limited to conifers, Pinaceae

[40–44], which are particularly well studied in view of their

economic importance. A recent study suggested that elevated

rates of genome size and diversification occurred within the

past 100 Myr, especially in Pinus [45]. A draft sequence of the

20-gigabase genome of Picea abies has been recently completed

[28]. Despite being more than 100 times larger than that of

Arabidopsis, the Picea abies genome contains a similar number

of well-supported annotated genes (28 354).

Comparison to Picea abies and/or other conifer genomes pro-

vides a valuable reference for studying genome modifications

and the evolution of key traits for seed plants, for example

the origin of flowering. The representative gymnosperm, Picea

gene families gain/loss+/–

polyploidy event(s)

71001

2

3

4

5

6

7

8

9

10

+690

+1269/–407

+1492/–192

+1068/–26

+623/–35

+213/–60

+434/–83

+1081/–783 Arabidopsis thaliana

Populus trichocarpa

Vitis vinifera

Solanum lycopersicum

Nelumbo nucifera

Oryza sativa

Sorghum bicolor

Musa acuminata

Picea abies

Selaginella moellendorffii

Physcomitrella patens

+2220/–320

+2425/–167

+1991/–198

+4126/–227

10 547

14 446

7790

+1995

+2164/–654

+7153/–426

+662/–67

+1682/–537

+2259/–280

+1643/–639

+2393/–161

8652

9952

10 994

11 582

11 753

12 086

Figure 1. Global gene family loss and gain in plant genomes. Eleven sequenced plant genomes were selected for phylogenetic representation of major plant groups. Genefamilies were identified using OrthoMCL, and the Count program was used to determine the minimum gene set for ancestral nodes of the phylogenetic tree using aWagner parsimony framework. Numbers after each node indicate the estimated ancestral gene numbers, while the numbers above each branch show the number of genefamily gains (þ) and losses ( – ). The extra bar under the internal branches means one or more rounds of ancient polyploidy suggested along that time period.

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

3

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

abies, lacks flowering locus T (FT)-like genes (that promote flow-

ering in other taxa), instead containing a group of FT/TFL1-like

genes which probably act as flowering repressors [28,46,47]. The

FT-like genes are exclusively found in angiosperms, including

early diverging and eudicot lineages, supporting the hypothesis

that the evolution of flowering plants coincided with the evol-

ution of a flower-promoting function for an FT/TFL1-like

gene. In plants, MADS-box genes encode transcription factors,

which are important regulators of plant development, particu-

larly as regulators of floral organ identity [48]. A total of 278

MADS-box homologies were identified in P. abies, most of

which are type II MADS-box genes (MIKC-type proteins, with

structure of MADS (M) domain followed by an Intervening (I),

a Keratin-like (K) and a C-terminal domain). The VASCULAR

NAC DOMAIN (VND) gene family controls the formation of

multi-cellular vessels, which has been considered to be one

of the key innovations that contribute to the success of the

flowering plants [49]. The VND gene family has been expanded

with only two VND genes found in P. abies [28] and seven in

Arabidopsis thaliana [50].

Another large wave of gene family gains (figure 1, node 4,

1492 new gene families) was inferred along the branch leading

to angiosperms, suggesting that a diverse set of novel gene

functions originated before the emergence of angiosperms.

Angiosperms have seeds contained within a fruit, unlike gym-

nosperms that have naked seeds (no fruit). The flowers, fruits

and other characters of angiosperms are likely to have contrib-

uted to their emergence as the most species-rich group of land

plants [51–53]. However, because this exemplar analysis does

not include genome sequences from basal angiosperms, it

cannot distinguish gene families originating before or soon

after the earliest diversification of angiosperms. Amborella is

thought to be the single living sister species to other extant

flowering plants, and the Amborella trichopoda genome, which

is finished but not yet available for large-scale analysis, will

provide a valuable reference for reconstructing ancestral

angiosperm gene content (http://www.amborella.org/) [54].

The estimated minimum gene sets for eudicots and

monocots are 10 994 and 10 547 (figure 1, nodes 5 and 9),

respectively. A total of 1068 novel gene families were gained

in eudicots as a whole (on the branch to node 5), and 662 of

these were gained before the diversification of monocots (on

the branch to node 9). However, the largest gain of gene

families in the internal branches of the studied 11-genome

phylogeny was inferred along the branch leading to grasses

(figure 1, node 10), with about 4126 gene families newly

gained before the divergence of Oryza and Sorghum.

4. Genome structure comparison and syntenydetection

Comparisons between and within genomes permit one to detect

homologous regions based on conserved gene content and

gene order (synteny blocks), in some cases also revealing that

large-scale, or whole-genome, duplications (WGDs) occurred

during evolutionary history. In vertebrates, synteny blocks are

still detectable even after hundreds of millions of years of diver-

gence [55,56]. However, angiosperm genomes are more

dynamic with much more rapid structural evolution, which is

due partly if not largely to extensive chromosomal rearrange-

ment and gene loss following WGDs in the ancestral lineages

[57–59]. For example, the two main clades of angiosperms,

monocots and eudicots, were predicted to have diverged

about 140–150 Ma [60]. Syntenic orthologue pairs between

monocot and eudicot species can only account for 1–7% of the

no. W

GD

s af

ter

dive

rgen

ce in

crea

sing

species divergence time increasing

Oryza Vitis

Musa

Mus

aZ

eaSo

rghu

m

Populus

Carica

Figure 2. Impact of divergence time and consecutive rounds of genome duplication on synteny signals. Syntenic blocks were identified using MCscanX [61]. Thecomparisons of Oryza-Sorghum and Vitis-Carica were used to show synteny when no genome duplication occurred after the divergence. One genome duplicationoccurred after the divergence of Oryza-Zea and Vitis-Populus respectively, and multiple WGDs occurred after the split of Oryza-Musa and Vitis-Musa. The signal ofsynteny was eroded following the increasing divergence time and additional number of WGDs after the separation. (Online version in colour.)

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

4

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

total orthologue pairs, versus 30–50% in intra-species compari-

son in eudicots or monocots (see table 1 in [61]). Inter-genome

synteny blocks can be substantially affected by the number of

genome duplications after species divergence, as well as the

timing of species divergence (figure 2).

Many algorithms and software have been developed to find

syntenic blocks within or among genomes. In general, hom-

ology matrices are built using all-against-all BLAST searches,

and then synteny is detected through clustering neighbouring

matches within a matrix, such as the methods implemented in

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

5

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

ADHoRe [62] and DiagHunter [63]. Another method to detect

synteny is to employ dynamic programming to find the highest

scoring ‘chains’ of syntenic genes, performing empirical or stat-

istical tests to determine whether the observed synteny could

have arisen by chance as implemented in DAGchainer [64],

ColinearScan [65], MCscan [58] and MCscanX [61]. In addition,

several web-based systems have been developed for whole-

genome analysis. For example, CoGe is a platform for multiple

whole-genome comparisons to find and align syntenic regions

and visualize the output in an intuitive and interactive manner

[66–68]. At the time of writing, it has implemented 15 140

organisms with 19 624 genomes, including Bacteria, Archaea,

eukaryotes, organelles, viruses and sub-genomes such as plas-

mids. The Plant Genome Duplication Database (http://chibba.

pgml.uga.edu/duplication) is a public web service to identify

synteny information among plant genomes and is mainly

focused on unravelling genome duplication events during

the history of angiosperm evolution [69].

While syntenic blocks are the most definitive evidence sup-

porting the most recent WGD in a genome, most angiosperm

genomes have experienced multiple WGDs. In order to find the

signatures of more ancient events, two approaches have been suc-

cessfully used to track the ancient WGDs, referred to as ‘bottom-

up’ [57] and ‘top-down’ approaches [58]. The ‘bottom-up’

approach is to ‘merge’ regions duplicated in the most recent

event affecting a genome, using genes that remain duplicated

and in syntenic locations to align corresponding regions and

infer the gene content and order of an ancestral chromosome

that would be sufficient to explain their gene content and

order. Deduced ancestral chromosomes are then compared to

one another in the same manner, to recursively reconstruct

older genome duplications in a stepwise fashion [57,70]. The

main challenge is how to order the non-anchor genes located in

the syntenic blocks. To date, only the second and the third iter-

ation have been achieved using the bottom-up approach

[57,70]. In the ‘top-down’ approach, pair-wise syntenic blocks

are first identified between the target genome and an outgroup

genome. Then, all homologous segments between the two gen-

omes are clustered together to infer the number of WGDs

based on the redundancy level, i.e. the number of homologous

segments each corresponding to a single region of the genome

[13]. For example, in the tiny Utricularia gibba genome (genome

size of 82 megabases), a top-down approach could identify at

least three rounds of WGD occurring after the divergence of U.gibba from acommon ancestor shared with tomato and grape [71].

5. Palaeopolyploidy and gene family numbersGenome doubling may confer a number of advantages to a

polyploid [6], via mechanisms such as increased gene

dosage, ‘intergenomic heterosis’ conferred by multiple alleles

in a polyploid nucleus, or the evolution of novel gene functions

(neofunctionalization, [4,5]). Each of these advantages requires

that each of the two duplicated copies of a gene survive in the

doubled genome, thus increasing gene family numbers. How-

ever, it is apparent from a host of genome sequences that by far

the most common fate of a pair of duplicated genes is silencing

(either epigenetically or by mutation) and eventual elimination

of one member of a pair, called ‘non-functionalization’. Classi-

cal ideas about neofunctionalization as a major advantage of

polyploidy [4,5] have more recently been tempered with the

suggestion that a more widespread fate of those duplicated

genes that survive may be the evolution of subdivisions of

ancestral functions (subfunctionalization, [72]) that render

them interdependent. Subfunctionalization may sometimes

lead to neofunctionalization [73].

In angiosperms (which are certainly representative of crown

eukaryotes, and probably of all organisms in this regard), we

find three ‘fates’ of individual gene pairs following duplication:

(1) Most gene functional groups show post-duplication gene

preservation/loss rates that are indistinguishable from

the genome-wide average. Such ‘neutral’ loss of dupli-

cated genes presumably involves inactivating mutations

opposed by very weak selection [74], closely resembling

the ‘non-functionalization’ described by others (e.g.

[75]) as the fate of the vast majority of duplicated genes.

(2) Genes in some specific functional categories tend to be

retained after duplication. Several gene functional groups

are preferentially preserved in duplicate [58,76–78].

Coding regions of genes preserved in duplicate tend to be

functionally complex [76], under purifying selection [75,76]

and evolve in concert [79,80]. Tendencies to retain duplicate

genes involved in signal transduction and transcription and

to lose DNA repair genes are widely observed [81,82]. How-

ever, Pfam domain-based groupings reveal heterogeneity in

the broader and more widely used gene ontology (GO) cat-

egories used in prior studies [81,83], for example, showing

one abundant protein–protein interaction domain (LRR,

leucine-rich repeat) to be usually preserved in duplicate

while two less-abundant domains (SET; TPR, tetratricopeptide

repeat) are usually restored to singleton states [78].

(3) Other specific genes and gene functional groups show more

extensive loss of duplicate copies than the genome-wide

average, and this loss is often convergent following indepen-

dent duplications separated by hundreds of millions of years

(i.e. in angiosperms, yeast and the fish Tetraodon [78]). Some

gene functional groups are preserved in duplicate signifi-

cantly less frequently than the genome-wide average. This

observation alone might be viewed as noise—among thou-

sands of functional groups, some must incur more gene

loss than others due to random factors. However, the gene

functional groups that have incurred the greatest loss of

duplicated copies are closely correlated following indepen-

dent duplications in Arabidopsis and rice, at statistical

probabilities that essentially rule out false positives [78].

Multi-alignments show some individual genes to have

been repeatedly restored to single-copy status following

many different genome duplications in independent angio-

sperm lineages [13,58]. The two gene sets do not overlap.

Repeated restoration of certain genes to singleton status at

a greater-than-random frequency suggests that an under-

lying set of principles of molecular evolution may

contribute to the fates of gene and genome duplications [78].

In the following pages, we elaborate in more detail on

these respective outcomes.

6. Palaeopolyploidy and gene family gainsGene duplication has long been thought to be a primary

source of material for evolution [4]. As discussed above, the

duplicates retained in genomes usually have divergent func-

tions. A now-classical concept is that a duplicate gene is

relatively free of selective constraints and can evolve new

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

6

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

functions (neofunctionalization), so long as the ancestral copy

maintains its original functions [4,84]. Thus, gene duplication

has been viewed as one of the main molecular mechanisms in

the creation of new genes [85]. Single gene duplicates often

lose part or all of the regulatory features of their progenitor

genes, thus being prone to evolving new expression patterns

and perhaps new functions [86]. However, single-gene dupli-

cates have relatively short half-lives [84], so evolution of a

new function that confers a selective advantage needs to

happen relatively quickly or the gene may be lost. Under

WGD, involving the doubling of the entire genome, dupli-

cated genes retain their ancestral regulatory features and

expression patterns at least initially. However, they generally

survive longer than single gene duplicates and thus have

more time to evolve novel function that justifies their retention,

in principle offering a rich potential source of new genes, gene

families and sub-families in addition to increasing the number

of members of existing gene families.

The rapidly expanding set of whole genome sequences

available for plant species has provided useful resources for

unravelling palaeopolyploidy events in evolutionary history

[13,33,57,83,87], as well as for reconstructing ancestral gene

content. We investigated gene family gains along deep

branches of the land plant phylogeny, and found that the lar-

gest number of gene family gains is on the branches where

polyploidy events occurred (figure 1). As mentioned above,

these gains could mean true new gene family origination, or

just expansion of extensively diverged subfamilies. Ancient

polyploidy events in the common ancestors of all extant seed

plants, and all extant angiosperms, respectively, were ident-

ified using a genome-scale phylogenomic approach [87].

These two ancient WGD events appear on the branches leading

to nodes 3 and 4 in figure 1, respectively, on which large num-

bers of gene family gains occurred (1269 and 1492). The origin

of the seed and flower may be partly explained by these novel

genes and expanded gene families that appeared before the

diversification of seed plants and angiosperms. For example,

35 floral genes in Arabidopsis were clustered in orthogroups

that originated before the angiosperm radiation (electronic

supplementary material, table S1). Many floral genes origi-

nated before seed plants and have been further expanded in

number by ancient polyploidy events, such as MADS-box

[28], PHYTOCHROME [88] and HD-ZIP III gene families

[89]. In total, we found 21 genes associated with flowers in

the orthogroups newly gained before the divergence of angios-

perms and gymnosperms (electronic supplementary material,

table S2). These results suggest that gene recruitment is an

ongoing process, while gene origination or expansion could

be a short/long time before the recruitment. Indeed, gene

recruitment to a function surely requires not only available

genes but also an external impetus, such as climate change.

For example, duplicate copies of virtually all genes that were

eventually recruited into the evolution of C4 photosynthesis

in Poaceae grasses were available following a pan-Poaceae

genome duplication at about 70 Ma [90]. However, many of

these copies were lost, and subsequent single-gene duplicates

of the same genes were recruited into C4 photosynthesis

about 40 Myr later [91].

The largest gene family gain is 4126, on the internal branch

that gave rise to Oryza and Sorghum. At least two (r and s), and

perhaps three, WGD events were identified on this branch

leading to node 10, which is specific to the Poaceae (grass)

clade, not shared with the Musa (grass) lineage [34,70]. By

examining syntenic evidence, the most recent WGD, r, was

identified as a shared event of Oryza and Sorghum [90]. A

bottom-up reconstruction approach was used to build pre-r

ancestral blocks. A second iteration of synteny analysis was

able to find a more ancient polyploidy event before r [70].

Genome comparisons revealed that generally three regions of

the eudicot Vitis (grape) genome matched up to eight homolo-

gous regions in Oryza, suggesting that three WGD events

occurred in the Oryza lineage after the split with eudicots [70].

Another ancient polyploidy event on the internal

branches, named gamma, is most likely shared with all core

eudicots [31,92–94]. There are 623 novel gene family gains

along the branch leading to core eudicots (node 6). However,

more gene family gains (1068) appear on the branch leading

to all eudicots (node 5), where no genome duplication has

been identified. This is somewhat in conflict with the other

large gains, which usually accompanied one or more WGD

events. The inconsistency could be due to the timing and

nature of the gamma event, which may actually be two

events that occurred in close proximity, very early in the

evolutionary history of eudicots [93].

Changes in gene family content along terminal branches

of the land plant phylogeny could be partly due to further

independent polyploidy events in the evolutionary history

of many lineages. At least one more polyploidy event was

identified along the branches leading to all eudicot lineages

depicted except Vitis [30,58,93,95]. Three independent poly-

ploidy events were proposed on the terminal branch to

Musa acuminata [34]. Numbers of gains inferred along the

terminal branches could also be somewhat inflated by

genome annotation errors.

Polyploidy events appear likely to be the major factor gen-

erating novel genes, as well as expanding gene families. For

angiosperm branches on which no ancient polyploidy event

was identified (branches leading to nodes 2, 7, 8 and 9), rela-

tively small numbers of novel gene families were gained

(690, 213, 434 and 662, respectively). Small-scale duplications,

such as transposition and tandem duplications, might contrib-

ute these gene family gains. Indeed, while polyploidy may

confer the largest number of new genes, small-scale dupli-

cations that separate genes from ancestral regulatory features

and place them in new genomic environments, are more

prone to the evolution of novel expression patterns and may

be more likely than polyploidy-derived genes to give rise to

novel functions [96].

It has been widely acknowledged that gene retention after

polyploidy events is biased [97–100]. Genes in functional

categories of transcription factors, protein kinases and trans-

ferases, tend to retain duplicated copies following WGD

[81,97]. Several models have been proposed to explain the

biased retention pattern following duplication. Observations

in Saccharomyces cerevisiae and Caenorhabditis elegans [101]

suggest that genes for which duplicates are preserved tend

to be slowly evolving, whereas genes restored to singletons

tend to have a higher mutation rate. Another idea relates

gene retention probability to gene expression level. In yeast,

it has been observed that selection for increased levels of

gene expression was a significant factor determining which

genes were retained and which were returned to a single-

copy state [102]. In addition, the ‘gene balance’ hypothesis

[97] predicts a relationship between the number of inter-

actions, or ‘connectivity’ of a gene (i.e. with other genes),

and its sensitivity to altered gene dosage. ‘Connected’

a1

b1

c1

a2

A

B

B

C

A

A

C

B

C

A

B

C

b2

c2

a1

b1

c1

a2

b2

c2

a1

b1

c1

a2

b2

c2

a1

b1

c1

a2

b2

c2

(correct)

(wrong)

(wrong)

(correct)

(b)

(a)

(c)

(d )

Figure 3. Different gene loss patterns affect the reconstructed phylogeny using single-copy genes. If the single-copy gene was duplicated in the common ancestorof species A, B and C, different gene silencing timing and patterns will impact the final reconstructed evolutionary relationships. The most hypothesized case is thatthe duplicates in the ancestor are quickly restored to single-copy status. However, several single-copy genes in soya bean have retained the duplicated copy for about13 Myr. Many speciation events could occur during such a long time. Here, we proposed four different gene loss patterns, and all could restore single-copy genestatus. (a) All three genes in one big clade went through gene deletion, and the remaining three genes can be used to reconstruct the phylogeny correctly. However,if gene loss patterns were as in (b) and (c), the constructed phylogeny will not truly reflect the relationships among species. (d ) Although gene c 2 is not theorthologue of a1 and b1, the final tree is correct by chance. (Online version in colour.)

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

7

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

genes are more likely to be retained after WGD than after

tandem duplications, with losses of such genes resulting in

a sort of haploinsufficiency relative to the remainder of the

genome. It has also been argued that balanced gene drive

should tend to drive up morphological complexity [7],

which is potentiated by duplicated functional models (gene

rstb.roya

8

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

networks). An important need for further investigation

of many of these ideas is a much greater knowledge of

the various forms of interactions among genes, including

physical, regulatory and feedback-mediated.

lsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

7. Palaeopolyploidy and gene lossesPolyploidy has made a significant contribution to gene family

expansion and novel gene content as discussed above; how-

ever, non-random patterns of loss of duplicated genes are less

studied. As noted above, the vast majority of gene duplicates

tend to be silenced in a relatively short period of time, and

this process is not random [81,103,104]. Specific genes and genefunctional groups show more extensive loss of duplicate copies thanthe genome-wide average, and this loss is often convergent followingindependent duplications separated by hundreds of millions of years[100,105,106]. Investigations of single-copy genes in sequenced

flowering plants has statistically ruled out the possibility of

random gene loss (e.g. [58,106]). Single-copy genes are overre-

presented in some essential functional categories, such as DNA

repair, recombination, enzyme activity and organelle-related

functions, and underrepresented in GO categories in which

WGD survivors are usually significantly enriched (e.g. tran-

scription factor activity, kinase activity, transport, signal

transduction) [105,106]. Furthermore, single-copy genes show

more sequence conservation, higher gene expression level,

and expression in more tissues than multiple-copy genes

[106]. Conserved nuclear single-copy genes have been used

as markers for reconstructing angiosperm phylogeny, even

eukaryotic relationships, and for improving resolution of the

evolutionary history of organisms [107,108]. However, it

might be problematic to use some specific single-copy gene

families as phylogenetic markers if speciation precedes their

restoration to single-copy status. For example, two recent

studies [106,107] each show that genes that are single-copy in

other organisms are still duplicated in soya bean after its

most recent polyploidization event at approximately 13 Ma,

and tetraploid cotton appears likewise to experience relatively

slow gene loss [109]. If a speciation event occurred before

duplicated genes experienced reciprocal gene silencing, a

reconstructed phylogeny may not reflect the true evolutionary

relationship (figure 3).

It has been suggested that reciprocal gene loss after

polyploidy could contribute directly to speciation and repro-

ductive isolation [17,90,110–112]. By investigating gene losses

following a WGD in a common ancestor of three yeast species

(S. cerevisiae, Saccharomyces castellii and Candida glabrata), it has

been shown that 20% of the loci experienced differential gene

loss patterns [17]. The speciations were shortly after the WGD

event, during a period of precipitous gene loss. Therefore, it is

hypothesized that reciprocal gene loss at many ancestrally

duplicated genes could be the main factor leading to speciation

[17]. However, two recent genome-wide studies of syntenic

gene losses in Poaceae found very little evidence supporting

reciprocal loss of homologous genes among the grass species

[33,113]. While evidence from additional plant groups needs

to be explored, substantially different effective population

sizes in microbes (very large) and crown eukaryotes (very

small) may contribute to differences in the fates of duplicated

genes [19,20] and associated evolutionary patterns.

8. Conclusion and future studiesOne of the great challenges of evolutionary genomics is linking

genome modifications that are evident in the burgeoning sets

of angiosperm (and other) genome sequences now available

to speciation, diversification and the morphological and/or

physiological innovations that collectively constitute biodiver-

sity. Polyploidy events are prevalent throughout angiosperm

evolution. While the timing of many ancient WGD events

can be circumscribed to specific branches of angiosperm phy-

logeny, much better resolution is needed to link genomic

changes to their biological consequences. Soon, genome

sequences will be available for virtually all branches of land

plant phylogeny, which will help to improve resolution.

Enriched knowledge of botanical diversity may permit us to

circumscribe not merely thousands or even hundreds of impor-

tant functional changes to a branch, but much smaller

numbers, approaching a more ‘particulate’ model for evol-

utionary history in much the same manner that Mendelian

genetics transitioned science away from ‘blending’ models of

inheritance. Dosage balance could explain some biases in

gene retention after different modes of duplication. Detailed

analysis of evolutionary history of all members of functional

protein complexes might provide stronger support for this

hypothesis. More effort is needed in reconstructing, decoding

and analysing the inferred ancestral genomes prior to diversi-

fication of major angiosperm groups. With reconstructed

ancestral genomes, we could thoroughly assess genome modi-

fications such as gene gain and loss through evolution, and

associate such changes with diversification and novel inno-

vation and function. In addition, genome resequencing data

could provide more insights into genome variation following

evolution and adaptation at the population level. Genome

modifications, including duplication, fractionation, rearrange-

ments and changes in expression, have the potential to

facilitate the origin of new functional variation in organisms,

including economically important traits in the crop plants

that sustain humanity.

References

1. Galitski T, Saldanha AJ, Styles CA, Lander ES, FinkGR. 1999 Ploidy regulation of gene expression.Science 285, 251 – 254. (doi:10.1126/science.285.5425.251)

2. Hughes T et al. 2000 Widespread aneuploidyrevealed by DNA microarray expressionprofiling. Nat. Genet. 25, 333 – 337. (doi:10.1038/77116)

3. Arrigo N, Barker MS. 2012 Rarely successful polyploidsand their legacy in plant genomes. Curr. Opin. PlantBiol. 15, 140 – 146. (doi:10.1016/j.pbi.2012.03.010)

4. Ohno S. 1970 Evolution by gene duplication. Berlin,Germany: Springer.

5. Stephens S. 1951 Possible significance ofduplications in evolution. Adv. Genet. 4, 247 – 265.(doi:10.1016/S0065-2660(08)60237-0)

6. Comai L. 2005 The advantages and disadvantagesof being polyploid. Nat. Rev. Genet. 6, 836 – 846.(doi:10.1038/nrg1711)

7. Freeling M, Thomas BC. 2006 Gene-balancedduplications, like tetraploidy, provide predictabledrive to increase morphological complexity.Genome Res. 16, 805 – 814. (doi:10.1101/gr.3681406)

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

9

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

8. Muntzing A. 1936 The evolutionary significance ofautopolyploidy. Hereditas 21, 363 – 378. (doi:10.1111/j.1601-5223.1936.tb03204.x)

9. Love A, Love D. 1949 The geobotanical significanceof polyploidy. Portugaliae Acta (Suppl), 273 – 352.

10. Stebbins GL. 1950 Variation and evolution in plants.New York, NY: Columbia University Press.

11. Grant V. 1971 Plant speciation, 1st edn. New York,NY: Columbia University Press.

12. Stebbins G. 1966 Chromosomal variation andevolution; polyploidy and chromosome size andnumber shed light on evolutionary processes inhigher plants. Science 152, 1463 – 1469. (doi:10.1126/science.152.3728.1463)

13. Tang H, Bowers JE, Wang X, Ming R, Alam M,Paterson AH. 2008 Synteny and colinearity in plantgenomes. Science 320, 486 – 488. (doi:10.1126/science.1153917)

14. Paterson AH, Freeling M, Tang H, Wang X. 2010Insights from the comparison of plant genomesequences. Annu. Rev. Plant Biol. 61, 349 – 372.(doi:10.1146/annurev-arplant-042809-112235)

15. Gu ZL, Steinmetz LM, Gu X, Scharfe C, Davis RW,Li WH. 2003 Role of duplicate genes in geneticrobustness against null mutations. Nature 421,63 – 66. (doi:10.1038/nature01198)

16. Christoffels A, Koh EGL, Chia JM, Brenner S, AparicioS, Venkatesh B. 2004 Fugu genome analysisprovides evidence for a whole-genome duplicationearly during the evolution of ray-finned fishes. Mol.Biol. Evol. 21, 1146 – 1151. (doi:10.1093/molbev/msh114)

17. Scannell DR, Byrne KP, Gordon JL, Wong S, WolfeKH. 2006 Multiple rounds of speciation associatedwith reciprocal gene loss in polyploid yeasts. Nature440, 341 – 345. (doi:10.1038/nature04562)

18. Aury JM et al. 2006 Global trends of whole-genomeduplications revealed by the ciliate Parameciumtetraurelia. Nature 444, 171 – 178. (doi:10.1038/nature05230)

19. Lynch M, O’Hely M, Walsh B, Force A. 2001 Theprobability of preservation of a newly arisen geneduplicate. Genetics 159, 1789 – 1804.

20. Lynch M. 2006 The origins of eukaryotic genestructure. Mol. Biol. Evol. 23, 450 – 468. (doi:10.1093/molbev/msj050)

21. Kenrick P, Crane PR. 1997 The origin and earlyevolution of plants on land. Nature 389, 33 – 39.(doi:10.1038/37918)

22. Becker B, Marin B. 2009 Streptophyte algae and theorigin of embryophytes. Ann. Bot. 103, 999 – 1004.(doi:10.1093/aob/mcp044)

23. Karol KG, McCourt RM, Cimino MT, Delwiche CF.2001 The closest living relatives of land plants.Science 294, 2351 – 2353. (doi:10.1126/science.1065156)

24. Lewis LA, McCourt RM. 2004 Green algae and theorigin of land plants. Am. J. Bot. 91, 1535 – 1556.(doi:10.3732/ajb.91.10.1535)

25. Rensing SA et al. 2008 The Physcomitrella genomereveals evolutionary insights into the conquest ofland by plants. Science 319, 64 – 69. (doi:10.1126/science.1150646)

26. Banks JA et al. 2011 The Selaginella genomeidentifies genetic changes associated with theevolution of vascular plants. Science 332, 960 – 963.(doi:10.1126/science.1203810)

27. Murray B, Leitch I, Bennett M. 2012 GymnospermDNA C-values database (release 5.0, Dec. 2012). Seehttp://www.kew.org/cvalues/

28. Nystedt B et al. 2013 The Norway spruce genomesequence and conifer genome evolution. Nature497, 579 – 584. (doi:10.1038/nature12211)

29. Arabidopsis Genome Initiative. 2000 Analysis of thegenome sequence of the flowering plantArabidopsis thaliana. Nature 408, 796 – 815.(doi:10.1038/35048692)

30. Tuskan GA et al. 2006 The genome of blackcottonwood, Populus trichocarpa (Torr. & Gray).Science 313, 1596 – 1604. (doi:10.1126/science.1128691)

31. Jaillon O et al. 2007 The grapevine genomesequence suggests ancestral hexaploidization inmajor angiosperm phyla. Nature 449, 463 – 467.(doi:10.1038/nature06148

32. Yu J et al. 2002 A draft sequence of the ricegenome (Oryza sativa L. ssp. indica). Science 296,79 – 92. (doi:10.1126/science.1068037)

33. Paterson AH et al. 2009 The Sorghum bicolorgenome and the diversification of grasses. Nature457, 551 – 556. (doi:10.1038/nature07723

34. D’Hont A et al. 2012 The banana (Musa acuminata)genome and the evolution of monocotyledonousplants. Nature 488, 213 – 217. (doi:10.1038/nature11241

35. Li L, Stoeckert Jr CJ, Roos DS. 2003 OrthoMCL:identification of ortholog groups for eukaryoticgenomes. Genome Res. 13, 2178 – 2189. (doi:10.1101/gr.1224503)

36. Chen F, Mackey AJ, Vermunt JK, Roos DS. 2007Assessing performance of orthology detectionstrategies applied to eukaryotic genomes. PLoS ONE2, e383. (doi:10.1371/journal.pone.0000383)

37. Merchant SS et al. 2007 The Chlamydomonasgenome reveals the evolution of key animal andplant functions. Science 318, 245 – 250. (doi:10.1126/science.1143609)

38. Wang W, Vinocur B, Shoseyov O, Altman A. 2004Role of plant heat-shock proteins and molecularchaperones in the abiotic stress response. TrendsPlant Sci. 9, 244 – 252. (doi:10.1016/j.tplants.2004.03.006)

39. Delevoryas T. 1979 Polyploidy in gymnosperms.Basic Life Sci. 13, 215 – 218.

40. Morse AM et al. 2009 Evolution of genome size andcomplexity in Pinus. PLoS ONE 4, e4332. (doi:10.1371/journal.pone.0004332)

41. Kovach A et al. 2010 The Pinus taeda genome ischaracterized by diverse and highly divergedrepetitive sequences. BMC Genomics 11, 420.(doi:10.1186/1471-2164-11-420)

42. Hall SE, Dvorak WS, Johnston JS, Price HJ, WilliamsCG. 2000 Flow cytometric analysis of DNA contentfor tropical and temperate New World pines. Ann.Bot. Lond. 86, 1081 – 1086. (doi:10.1006/anbo.2000.1272)

43. Wakamiya I, Newton RJ, Johnston JS, Price HJ. 1993Genome size and environmental factors in thegenus Pinus. Am. J. Bot. 80, 1235 – 1241.(doi:10.2307/2445706)

44. Grotkopp E, Rejmanek M, Sanderson MJ, Rost TL.2004 Evolution of genome size in pines (Pinus) andits life-history correlates: Supertree analyses.Evolution 58, 1705 – 1729. (doi:10.1111/j.0014-3820.2004.tb00456.x)

45. Burleigh JG, Barbazuk WB, Davis JM, Morse AM,Soltis PS. 2012 Exploring diversification and genomesize evolution in extant gymnosperms throughphylogenetic synthesis. J. Bot. 2012, 1 – 6. (doi:10.1155/2012/292857)

46. Klintenas M, Pin PA, Benlloch R, Ingvarsson PK,Nilsson O. 2012 Analysis of conifer Flowering LocusT/Terminal Flower1-like genes provides evidence fordramatic biochemical evolution in the angiospermFT lineage. New Phytol. 196, 1260 – 1273. (doi:10.1111/j.1469-8137.2012.04332.x)

47. Karlgren A, Gyllenstrand N, Kallman T, SundstromJF, Moore D, Lascoux M, Lagercrantz U. 2011Evolution of the PEBP gene family in plants:functional diversification in seed plant evolution.Plant Physiol. 156, 1967 – 1977. (doi:10.1104/pp.111.176206)

48. Ma H, dePamphilis C. 2000 The ABCs of floralevolution. Cell 101, 5 – 8. (doi:10.1016/S0092-8674(00)80618-2)

49. Sperry JS, Hacke UG, Pittermann J. 2006 Size andfunction in conifer tracheids and angiospermvessels. Am. J. Bot. 93, 1490 – 1500. (doi:10.3732/ajb.93.10.1490)

50. Kubo M, Udagawa M, Nishikubo N, Horiguchi G,Yamaguchi M, Ito J, Mimura T, Fukuda H, Demura T.2005 Transcription switches for protoxylem andmetaxylem vessel formation. Genes Dev. 19,1855 – 1860. (doi:10.1101/gad.1331305)

51. Regal PJ. 1977 Ecology and evolution of floweringplant dominance. Science 196, 622 – 629. (doi:10.1126/science.196.4290.622)

52. Crane PR, Friis EM, Pedersen KR. 1995 The originand early diversification of angiosperms. Nature374, 27 – 33. (doi:10.1038/374027a0)

53. Soltis PS, Soltis DE. 2004 The origin anddiversification of angiosperms. Am. J. Bot. 91,1614 – 1626. (doi:10.3732/ajb.91.10.1614)

54. Amborella Genome Project. 2013 The Amborellagenome and the evolution of flowering plants.Science 342, 1241089. (doi:10.1126/science.1241089)

55. Smith SF, Snell P, Gruetzner F, Bench AJ, Haaf T,Metcalfe JA, Green AR, Elgar G. 2002 Analyses ofthe extent of shared synteny and conserved geneorders between the genome of Fugu rubripes andhuman 20q. Genome Res. 12, 776 – 784. (doi:10.1101/gr.221802)

56. Dehal P, Boore JL. 2005 Two rounds of wholegenome duplication in the ancestral vertebrate.PLoS Biol. 3, e314. (doi:10.1371/journal.pbio.0030314)

57. Bowers JE, Chapman BA, Rong J, Paterson AH. 2003Unravelling angiosperm genome evolution by

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

10

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

phylogenetic analysis of chromosomal duplicationevents. Nature 422, 433 – 438. (doi:10.1038/nature01521)

58. Tang H, Wang X, Bowers JE, Ming R, Alam M,Paterson AH. 2008 Unraveling ancient hexaploidythrough multiply-aligned angiosperm gene maps.Genome Res. 18, 1944 – 1954. (doi:10.1101/gr.080978.108)

59. Coghlan A, Eichler EE, Oliver SG, Paterson AH, SteinL. 2005 Chromosome evolution in eukaryotes: amulti-kingdom perspective. Trends Genet. 21,673 – 682. (doi:10.1016/j.tig.2005.09.009)

60. Chaw SM, Chang CC, Chen HL, Li WH. 2004 Datingthe monocot-dicot divergence and the origin of coreeudicots using whole chloroplast genomes. J. Mol.Evol. 58, 424 – 441. (doi:10.1007/s00239-003-2564-9)

61. Wang Y et al. 2012 MCScanX: a toolkit for detectionand evolutionary analysis of gene synteny andcollinearity. Nucleic Acids Res. 40, e49. (doi:10.1093/nar/gkr1293)

62. Vandepoele K, Saeys Y, Simillion C, Raes J,Van de Peer Y. 2002 The automatic detection ofhomologous regions (ADHoRe) and its application tomicrocolinearity between Arabidopsis and rice.Genome Res. 12, 1792 – 1801. (doi:10.1101/gr.400202)

63. Cannon SB, Kozik A, Chan B, Michelmore R, YoungND. 2003 DiagHunter and GenoPix2D: programs forgenomic comparisons, large-scale homologydiscovery and visualization. Genome Biol. 4, R68.doi:10.1186/gb-2003-4-10)

64. Haas BJ, Delcher AL, Wortman JR, Salzberg SL. 2004DAGchainer: a tool for mining segmental genomeduplications and synteny. Bioinformatics 20,3643 – 3646. (doi:10.1093/bioinformatics/bth397)

65. Wang XY, Shi XL, Li Z, Zhu QH, Kong L, Tang W,Ge S, Luo JC. 2006 Statistical inference ofchromosomal homology based on gene colinearityand applications to Arabidopsis and rice. BMCBioinform. 7, 447. (doi:10.1186/1471-2105-7-447)

66. Lyons E et al. 2008 Finding and comparing syntenicregions among Arabidopsis and the outgroupspapaya, poplar, and grape: CoGe with rosids.Plant Physiol. 148, 1772 – 1781. (doi:10.1104/pp.108.124867)

67. Lyons E, Pedersen B, Kane J, Freeling M. 2008 Thevalue of nonmodel genomes and an example usingsynmap within CoGe to dissect the hexaploidy thatpredates the rosids. Trop. Plant Biol. 1, 181 – 190.(doi:10.1007/s12042-008-9017-y)

68. Lyons E, Freeling M, Kustu S, Inwood W. 2011 Usinggenomic sequencing for classical genetics in E. coliK12. PLoS ONE 6, e16717. (doi:10.1371/journal.pone.0016717)

69. Lee TH, Tang H, Wang X, Paterson AH. 2013 PGDD:a database of gene and genome duplication inplants. Nucleic Acids Res. 41, D1152 – D1158.(doi:10.1093/nar/gks1104)

70. Tang H, Bowers JE, Wang X, Paterson AH. 2009Angiosperm genome comparisons reveal earlypolyploidy in the monocot lineage. Proc. Natl Acad.

Sci. USA 107, 472 – 477. (doi:10.1073/pnas.0908007107)

71. Ibarra-Laclette E et al. 2013 Architecture andevolution of a minute plant genome. Nature 498,94 – 98. (doi:10.1038/nature12132)

72. Lynch M, Force A. 2000 The probability of duplicategene preservation by subfunctionalization. Genetics154, 459 – 473.

73. He XL, Zhang JZ. 2005 Rapid subfunctionalizationaccompanied by prolonged and substantialneofunctionalization in duplicate gene evolution.Genetics 169, 1157 – 1164. (doi:10.1534/genetics.104.037051)

74. Haldane JBS. 1933 The part played by recurrentmutation in evolution. Am. Nat. 67, 5 – 19. (doi:10.1086/280465)

75. Brunet FG, Crollius HR, Paris M, Aury JM, Gibert P,Jaillon O, Laudet V, Robinson-Rechavi M. 2006 Geneloss and evolutionary rates following whole genomeduplication in teleost fishes. Mol. Biol. Evol. 23,1808 – 1816. (doi:10.1093/molbev/msl049)

76. Chapman BA, Bowers JE, Feltus FA, Paterson AH.2006 Buffering crucial functions by paleologousduplicated genes may impart cyclicality toangiosperm genome duplication. Proc. Natl Acad.Sci. USA 103, 2730 – 2735. (doi:10.1073/pnas.0507782103)

77. Seoighe C, Gehring C. 2004 Genome duplication ledto highly selective expansion of the Arabidopsisthaliana proteome. Trends Genet. 20, 461 – 464.(doi:10.1016/j.tig.2004.07.008)

78. Paterson AH, Chapman BA, Kissinger J, Bowers JE,Feltus FA, Estill J, Marler BS. 2006 Convergentretention or loss of gene/domain families followingindependent whole-genome duplication events inArabidopsis, Oryza, Saccharomyces, and Tetraodon.Trends Genet. 22, 597 – 602. (doi:10.1016/j.tig.2006.09.003)

79. Gao LZ, Innan H. 2004 Very low gene duplicationrate in the yeast genome. Science 306, 1367 – 1370.(doi:10.1126/science.1102033)

80. Wang X, Tang H, Bowers JE, Feltus FA, Paterson AH.2007 Extensive concerted evolution of riceparalogs and the road to regaining independence.Genetics 177, 1753 – 1763. (doi:10.1534/genetics.107.073197)

81. Maere S, De Bodt S, Raes J, Casneuf T, Van MontaguM, Kuiper M, Van de Peer Y. 2005 Modeling geneand genome duplications in eukaryotes. Proc. NatlAcad. Sci. USA 102, 5454 – 5459. (doi:10.1073/pnas.0501102102)

82. Blanc G, Wolfe KH. 2004 Functional divergence ofduplicated genes formed by polyploidy duringArabidopsis evolution. Plant Cell 16, 1679 – 1691.(doi:10.1105/tpc.021410)

83. Blanc G, Wolfe KH. 2004 Widespreadpaleopolyploidy in model plant species inferredfrom age distributions of duplicate genes. Plant Cell16, 1667 – 1678. (doi:10.1105/tpc.021345)

84. Lynch M, Conery JS. 2000 The evolutionary fate andconsequences of duplicate genes. Science 290,1151 – 1155. (doi:10.1126/science.290.5494.1151)

85. Long M, Betran E, Thornton K, Wang W. 2003 Theorigin of new genes: glimpses from the young andold. Nat. Rev. Genet. 4, 865 – 875. (doi:10.1038/nrg1204)

86. Wang YP, Wang XY, Tang HB, Tan X, Ficklin SP,Feltus FA, Paterson AH. 2011 Modes of geneduplication contribute differently to genetic noveltyand redundancy, but show parallels across divergentangiosperms. PLoS ONE 6, e28150. (doi:10.1371/journal.pone.0028150)

87. Jiao Y et al. 2011 Ancestral polyploidy in seedplants and angiosperms. Nature 473, U97 – U113.(doi:10.1038/nature09916)

88. Mathews S, Burleigh JG, Donoghue MJ. 2003Adaptive evolution in the photosensory domain ofphytochrome A in early angiosperms. Mol. Biol. Evol.20, 1087 – 1097. (doi:10.1093/molbev/msg123)

89. Prigge MJ, Clark SE. 2006 Evolution of the class IIIHD-Zip gene family in land plants. Evol. Dev. 8,350 – 361. (doi:10.1111/j.1525-142X.2006.00107.x)

90. Paterson AH, Bowers JE, Chapman BA. 2004 Ancientpolyploidization predating divergence of the cereals,and its consequences for comparative genomics.Proc. Natl Acad. Sci. USA 101, 9903 – 9908. (doi:10.1073/pnas.0307901101)

91. Wang X, Gowik U, Tang H, Bowers JE, Westhoff P,Paterson AH. 2009 Comparative genomic analysis ofC4 photosynthetic pathway evolution in grasses.Genome Biol. 10, R68. (doi:10.1186/gb-2009-10-6-r68)

92. Vekemans D, Proost S, Vanneste K, Coenen H,Viaene T, Ruelens P, Maere S, Van de Peer Y, GeutenK. 2012 Gamma paleohexaploidy in the stemlineage of core eudicots: significance for MADS-boxgene and species diversification. Mol. Biol. Evol. 29,3793 – 3806. (doi:10.1093/molbev/mss183)

93. Ming R et al. 2013 Genome of the long-livingsacred lotus (Nelumbo nucifera Gaertn.). GenomeBiol. 14, R41. (doi:10.1186/gb-2013-14-5-r41)

94. Jiao Y et al. 2012 A genome triplication associatedwith early diversification of the core eudicots.Genome Biol. 13, R3. (doi:10.1186/gb-2012-13-1-r3)

95. Sato S et al. 2012 The tomato genome sequenceprovides insights into fleshy fruit evolution. Nature485, 635 – 641. (doi:10.1038/nature11119)

96. Yupeng W, Ficklin SP, Wang X, Alex Feltus F,Paterson AH. Submitted. Dispersed duplicated genesand the evolution of genetic complexity.

97. Freeling M. 2009 Bias in plant gene contentfollowing different sorts of duplication: tandem,whole-genome, segmental, or by transposition.Annu. Rev. Plant Biol. 60, 433 – 453. (doi:10.1146/annurev.arplant.043008.092122)

98. Kassahn KS, Dang VT, Wilkins SJ, Perkins AC, RaganMA. 2009 Evolution of gene function and regulatorycontrol after whole-genome duplication:comparative analyses in vertebrates. Genome Res.19, 1404 – 1418. (doi:10.1101/gr.086827.108)

99. Edger PP, Pires JC. 2009 Gene and genomeduplications: the impact of dosage-sensitivity on thefate of nuclear genes. Chromosome Res. 17,699 – 717. (doi:10.1007/s10577-009-9055-9)

rstb.royalsocietypublishing.orgPhil.Trans.R.Soc.B

369:20130355

11

on May 19, 2018http://rstb.royalsocietypublishing.org/Downloaded from

100. Paterson AH, Chapman BA, Kissinger JC, Bowers JE,Feltus FA, Estill JC. 2006 Many gene and domainfamilies have convergent fates followingindependent whole-genome duplication events inArabidopsis, Oryza, Saccharomyces and Tetraodon.Trends Genet. 22, 597 – 602. (doi:10.1016/j.tig.2006.09.003)

101. Davis JC, Petrov DA. 2004 Preferential duplication ofconserved proteins in eukaryotic genomes. PLoSBiol. 2, E55. (doi:10.1371/journal.pbio.0020055)

102. Seoighe C, Wolfe KH. 1999 Yeast genome evolutionin the post-genome era. Curr. Opin. Microbiol. 2,548 – 554. (doi:10.1016/S1369-5274(99)00015-6)

103. Makino T, McLysaght A. 2012 Positionally biasedgene loss after whole genome duplication: evidencefrom human, yeast, and plant. Genome Res. 22,2427 – 2435. (doi:10.1101/gr.131953.111)

104. Woodhouse MR, Schnable JC, Pedersen BS, Lyons E,Lisch D, Subramaniam S, Freeling M. 2010Following tetraploidy in maize, a short deletionmechanism removed genes preferentially from oneof the two homologs. PLoS Biol. 8, e1000409.(doi:10.1371/journal.pbio.1000409)

105. Duarte JM, Wall PK, Edger PP, Landherr LL, Ma H,Pires JC, Leebens-Mack J, dePamphilis CW. 2010Identification of shared single copy nuclear genes inArabidopsis, Populus, Vitis and Oryza and theirphylogenetic utility across various taxonomiclevels. BMC Evol. Biol. 10, 61. (doi:10.1186/1471-2148-10-61)

106. De Smet R, Adams KL, Vandepoele K, van MontaguMCE, Maere S, Van de Peer Y. 2013 Convergentgene loss following gene and genome duplicationscreates single-copy families in flowering plants.Proc. Natl Acad. Sci. USA 110, 2898 – 2903. (doi:10.1073/pnas.1300127110)

107. Zhang N, Zeng LP, Shan HY, Ma H. 2012 Highlyconserved low-copy nuclear genes as effectivemarkers for phylogenetic analyses in angiosperms.New Phytol. 195, 923 – 937. (doi:10.1111/j.1469-8137.2012.04212.x)

108. Wu FN, Mueller LA, Crouzillat D, Petiard V, TanksleySD. 2006 Combining bioinformatics andphylogenetics to identify large sets of single-copyorthologous genes (COSII) for comparative,evolutionary and systematic studies: a test case

in the euasterid plant clade. Genetics 174,1407 – 1420. (doi:10.1534/genetics.106.062455)

109. Rong J, Feltus FA, Liu L, Lin L, Paterson AH. 2010Gene copy number evolution during tetraploidcotton radiation. Heredity 105, 463 – 472. (doi:10.1038/hdy.2009.192)

110. Werth CR, Windham MD. 1991 A model fordivergent, allopatric speciation of polyploidpteridophytes resulting from silencing of duplicate-gene expression. Am. Nat. 137, 515 – 526. (doi:10.1086/285180)

111. Lynch M, Force AG. 2000 The origin of interspecificgenomic incompatibility via gene duplication. Am.Nat. 156, 590 – 605. (doi:10.1086/316992)

112. Maclean CJ, Greig D. 2011 Reciprocal gene lossfollowing experimental whole-genome duplicationcauses reproductive isolation in yeast. Evolution65, 932 – 945. (doi:10.1111/j.1558-5646.2010.01171.x)

113. Schnable JC, Freeling M, Lyons E. 2012 Genome-wide analysis of syntenic gene deletion in thegrasses. Genome Biol. Evol. 4, 265 – 277. (doi:10.1093/gbe/evs009)