Molecular Plant • Volume 2 • Number 4 • Pages 738–754 • July 2009 RESEARCH ARTICLE
Molecular Evolution of VEF-Domain-Containing PcG Genes in Plants
Ling-Jing Chen, Zhao-Yan Diao, Chelsea Specht and Z. Renee Sung1
Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720–3102, USA
ABSTRACT Arabidopsis VERNALIZATION2 (VRN2), EMBRYONIC FLOWER2 (EMF2), and FERTILIZATION-INDEPENDENT
SEED2 (FIS2) are involved in vernalization-mediated flowering, vegetative development, and seed development, respectively.
Together with Arabidopsis VEF-L36, they share a VEF domain that is conserved in plants and animals. To investigate
the evolution of VEF-domain-containing genes (VEF genes), we analyzed sequences related to VEF genes across land
plants. To date, 24 full-length sequences from 11 angiosperm families and 54 partial sequences from another nine families
were identified. The majority of the full-length sequences identified share greatest sequence similarity with and possess
the samemajor domain structure asArabidopsis EMF2. EMF2-like sequences are not onlywidespread among angiosperms,
but are also found in genomic sequences of gymnosperms, lycophyte, and moss. No FIS2- or VEF-L36-like sequences were
recovered from plants other than Arabidopsis, including from rice and poplar for which whole genomes have been se-
quenced. Phylogenetic analysis of the full-length sequences showed a high degree of amino acid sequence conservation in
EMF2 homologs of closely related taxa. VRN2 homologs are recovered as a clade nested within the larger EMF2 clade. FIS2 and VEF-L36 are recovered in the VRN2 clade. The VRN2 clade may have evolved from an EMF2 duplication event that occurred in the rosids prior to the divergence of the eurosid I and eurosid II lineages. We propose that dynamic changes in genome evolution contributed to the generation of the family of VEF-domain-containing genes. Phylogenetic analysis of the VEF domain alone showed that VEF sequences continued to evolve following EMF2/VRN2 divergence, in accordance with species relationships. The existence of EMF2-like sequences in animals and across land plants suggests that a prototype form of EMF2
was present prior to the divergence of the plant and animal lineages. A proposed sequence of events, based on domain
organization and occurrence of intermediate sequences throughout angiosperms, could explain VRN2 evolution from an
EMF2-like ancestral sequence, possibly following duplication of the ancestral EMF2. Available data further suggest that
VEF-L36 and FIS2 were derived from a VRN2-like ancestral sequence. Thus, the presence of VEF-L36 and FIS2 in a genome
may ultimately be dependent upon the presence of a VRN2-like sequence.
Key words: VEF; EMF2; FIS2; VRN2; VEF-L36; Arabidopsis; PcG; phylogeny; evolution.
INTRODUCTION
Identifying genes that act in developmental pathways and determining how they or their interactions are modified throughout organismal evolution is a major focus of the field of evolutionary developmental biology. Understanding how genes and gene networks function during the development of the model plant Arabidopsis thaliana provides a starting point for investigating how characterized developmental pathways may have played a role in the evolution of diverse plant body plans (Irish and Benfey, 2004).
The Polycomb Group (PcG) genes play a major role in the epigenetic regulation of gene expression. Originally characterized in Drosophila, they encode a conserved group of chromatin proteins found in animals and plants. Structurally different Drosophila PcG proteins form complexes that maintain the repression of target genes. A PcG protein complex composed of four core proteins, Suppressor of Zeste 12 (Su(z)12), Extra sex combs (Esc), P55, and Enhancer of zeste (E(z)) (Kuzmichev et al., 2002; Muller et al., 2002), can methylate histone H3 at lysine 27 through the E(z) SET domain, providing a methyl mark for subsequent transcriptional repression and gene silencing (Cao et al., 2002; Czermin et al., 2002; Muller et al., 2002). Arabidopsis genes structurally similar to Drosophila PcG genes have been reported and their mutants characterized: CURLY LEAF (CLF) (Goodrich et al., 1997), FERTILIZATION-INDEPENDENT SEED DEVELOPMENT1 (FIS1)/MEDEA (MEA) (Grossniklaus et al., 1998; Luo et al., 1999), SWINGER (SWN) (Chanvivattana et al., 2004), FIS3/FERTILIZATION-INDEPENDENT ENDOSPERM (FIE) (Ohad et al., 1999), FERTILIZATION-INDEPENDENT SEED2 (FIS2) (Luo et al., 1999), EMBRYONIC FLOWER2 (EMF2) (Yoshida et al., 2001), VERNALIZATION2 (VRN2) (Gendall et al., 2001), and MULTICOPY SUPPRESSOR OF IRA1 (MSI1) (Hennig et al., 2003). Evidence indicates that these genes encode proteins that form putative PcG complexes involved in maintaining the silencing of Arabidopsis MADS-box genes (Chanvivattana et al., 2004; Sung and Amasino, 2004; Wood et al., 2006). Some PcG genes can be grouped into families based on sequence homology, such as CLF, MEA, and SWN (Chanvivattana et al., 2004) and EMF2, VRN2, and FIS2 (Yoshida et al., 2001). These gene families may be the result of gene duplication and subsequent diversification from ancestral sequences that were present before the divergence of the lineages that ultimately led to plants and animals.
1 To whom correspondence should be addressed. E-mail zrsung@nature.berkeley.edu, fax (510) 642-4995, tel. (510) 642-6966.
© The Author 2009. Published by the Molecular Plant Shanghai Editorial Office in association with Oxford University Press on behalf of CSPP and IPPE, SIBS, CAS.
doi: 10.1093/mp/ssp032, Advance Access publication 19 June 2009
Received 10 March 2009; accepted 25 April 2009
Duplication and diversification of nucleotide sequences have been shown to lead to functional innovation across the tree of life (Kim et al., 2004). EMF2 is a core component of the putative PcG complex that represses flowering (Chanvivattana et al., 2004). Loss-of-function mutations in EMF2 eliminate vegetative growth in Arabidopsis (Yang et al., 1995), resulting in early flowering. EMF2 thus may have played a major role in plant survival and in the evolution of phenological variability. Protein interactions between EMF2 and three other proteins, CLF (Goodrich et al., 1997), FIE (Kinoshita et al., 2001), and MSI1 (Hennig et al., 2003), suggest that they function as a protein complex mediating floral repression. The putative EMF2/CLF or SWN/FIE/MSI1 complex represses the floral MADS-box genes AGAMOUS (AG), APETALA3 (AP3), and PISTILLATA (PI) during vegetative development (Moon et al., 2003; Calonje et al., 2008). CLF also represses flowering-time genes such as FLOWERING LOCUS T (FT) and AGAMOUS-LIKE 19 (AGL19) (Schonrock et al., 2006; Jiang et al., 2008). FIS2 is a core component of the putative PcG complex FIS2/MEA/FIE/MSI1, which regulates Arabidopsis seed development via repression of PHERES1 during gametophyte and endosperm development (Kohler et al., 2003). VRN2 is a core component of another putative PcG complex, VRN2/CLF or SWN/FIE/MSI1, which promotes flowering in response to vernalization via the regulation of FLOWERING LOCUS C (FLC) (Sung and Amasino, 2004; Wood et al., 2006). It appears that the two groups of plant PcG genes, CLF-MEA-SWN and EMF2-VRN2-FIS2, have co-evolved to form multi-protein complexes that target different gene regulatory networks (Calonje and Sung, 2006).
The molecular similarity of the VEF genes suggests that they are related and may be the result of a historical gene duplication event followed by diversification. To understand how the Arabidopsis VEF gene family evolved, we investigated homologs of this gene family in Arabidopsis and other land plants. In this paper, we identified 85 partial and full-length sequences from land plants, with a taxonomic focus on flowering plants. Our results suggest that EMF2 is the most plesiomorphic form of the gene and may have acted as a prototype in the generation of the VEF gene family. Intragenic sequence duplication, deletion/insertion, and intergenic exon shuffling could account for the structural and functional diversification of the VEF genes from an EMF2-like ancestor. We propose that VRN2 evolved from an EMF2-like ancestor, and that VEF-L36 and FIS2 were derived from a VRN2-like ancestral sequence in Arabidopsis and possibly in other angiosperms.
RESULTS
Domain Organization in Arabidopsis VEF Family Proteins
Using the deduced EMF2 amino acid sequence as a BLAST query against GenBank, we recovered four full-length Arabidopsis proteins, EMF2 (At5g51230), FIS2 (At2g35670), VRN2 (At4g16845), and VEF-L36 (At4g16810), with significant e-values (<2e–12). In addition to the common VEF domain that defines this gene family (Figure 1), EMF2, VRN2, and FIS2 share a C2H2 domain. EMF2 and VRN2 further share an N-terminal domain (N-ter) that is present in the Drosophila homolog, Su(z)12, but is absent in FIS2 and VEF-L36. However, VRN2 differs from EMF2 in lacking sequence corresponding to EMF2 exon 5 through exon 10 (E5–10), as well as a stretch of sequence at the N-terminus called the N-terminal cap (N-ter cap). VRN2 also has a 52-aa repeat in the C-terminus that is absent in EMF2. Despite these differences, VRN2 and EMF2 share a similar overall domain organization and 45% amino acid sequence identity.
Figure 1. Domain Organization of VEF-Domain-Containing Proteins of Arabidopsis.
Blue block: EMF2 N-terminal domain (N-ter), which is composed of two parts: an N-terminal cap (cap) and the remaining part (N-terΔcap) as seen in VRN2. Orange block: EMF2-specific E5–10 domain. Green block: C2H2 zinc finger domain. Red block: VEF domain, which is uniquely located at the N-terminus of VEF-L36. Pink block: EMF2/VRN2-specific E15–17 domain. Light-blue block: VEF-L36-specific repeat domain. Dark-green block: VEF-L36-specific L36 domain. Yellow block: FIS2-specific S-rich domain. Purple block: FIS2 C-terminal tail.
Chen et al. • Molecular Evolution of VEF-Domain-Containing PcG Genes | 739
First reported as EMF2-like 1 by Yoshida et al. (2001), VEF-L36 is a hypothetical protein, based on its predicted gene structure from TAIR (TAIR: www.Arabidopsis.org/servlets/TairObject?id=128616&type=locus). It shares only the VEF domain with the other VEF proteins (Figure 1). Unlike those of EMF2, VRN2, and FIS2, its VEF domain is located at the N-terminus, and its C-terminus comprises a sequence with low similarity to ribosomal protein L36. It also contains a stretch of repeat sequence in the middle region that is not found in any of the other VEF genes.
Widespread Distribution of EMF2/VRN2 Homologs among Land Plants
To investigate the distribution of VEF gene homologs in plants, we used VEF-containing proteins to perform BLAST searches against the databases listed in Methods. A BLAST search of GenBank with the Arabidopsis EMF2 amino acid sequence returned 10 full-length homologs: eight from grasses (Poaceae), one from Carica (Caricaceae), and one from Silene (Caryophyllaceae) (Table 1). The grass homologs included one from wheat (Triticum aestivum), three from barley (Hordeum vulgare), two from maize (Zea mays), and two from rice (Oryza sativa). The Silene homolog is from Silene latifolia (Caryophyllaceae), a member of the core eudicots. The Chromatin Database (www.chromdb.org/) contains three full-length sequences from poplar (Populus trichocarpa: VEF901, 902, and 904) and one partial sequence (VEF903). The full-length sequences are hereafter referred to as PtEMF2_1 (VEF901), PtEMF2_2 (VEF902), and PtEMF2_4 (VEF904) (see Table 1A).
We also sequenced six full-length cDNAs from species in five different angiosperm families representing early-diverging monocots (Acorus; Acorales), higher monocots (Asparagus, Yucca; Asparagales), basal eudicots (Eschscholzia; Papaveraceae), and the asterids (Solanum; Solanaceae) (see Methods). The Kazusa DNA Research Institute provided one full-length sequence from Lotus japonicus (Fabaceae). When the deduced amino acid sequences of these cDNAs were used as BLAST queries against GenBank, the same homologs were returned as with the Arabidopsis EMF2 sequence. BLAST searches of GenBank with full-length VRN2, VEF-L36, and FIS2 returned mostly the same sequences as described above, likely owing to sequence homology in the VEF domain.
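Screening BLAST hits by e-value, as in the searches described above, can be sketched as follows. This assumes BLAST+ tabular output (`-outfmt 6` with its default 12 columns); the sample lines and subject IDs are invented, and the 2e–12 threshold only mirrors the e-value quoted earlier. It is an illustration, not the authors' pipeline.

```python
from typing import Dict, Iterable, List, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def significant_hits(lines: Iterable[str], max_evalue: float) -> List[Tuple[str, float]]:
    """Return (subject ID, best e-value) for subjects with any HSP below
    max_evalue, sorted from most to least significant."""
    best: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        evalue = float(row["evalue"])
        if evalue < max_evalue:
            subject = row["sseqid"]
            # Keep only the lowest e-value seen for each subject sequence.
            best[subject] = min(evalue, best.get(subject, evalue))
    return sorted(best.items(), key=lambda item: item[1])


# Hypothetical sample lines; IDs and numbers are made up for illustration.
SAMPLE = [
    "EMF2\tsubjectA\t45.0\t600\t300\t10\t1\t600\t1\t590\t1e-80\t300",
    "EMF2\tsubjectA\t40.0\t100\t60\t1\t1\t100\t1\t100\t1e-20\t90",
    "EMF2\tsubjectB\t30.0\t100\t70\t2\t500\t600\t400\t500\t5e-05\t40",
]

print(significant_hits(SAMPLE, 2e-12))  # [('subjectA', 1e-80)]
```

Collapsing multiple HSPs to one best e-value per subject is a design choice here; it avoids counting the same database sequence twice when several local alignments are reported for it.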
Pair-wise identity scores of these full-length sequences indicate that non-Arabidopsis sequences display higher identity to Arabidopsis EMF2 and VRN2 than to FIS2 and VEF-L36 (Table 2). Among these homologs, VEF-L36 shows the lowest pair-wise identity to other members (average score: 8), followed by FIS2 (average score: 17). Both show higher identity to VRN2 than to other EMF2/VRN2 homologs.
Sequence alignment of the 24 full-length proteins was performed using MUSCLE (www.ebi.ac.uk). All non-Arabidopsis full-length sequences possess N-terminal (N-ter), C2H2, and VEF domains homologous to those of EMF2/VRN2 sequences (Figure 2), indicating highly conserved domain organization. These sequences are not likely to be orthologs of FIS2 or VEF-L36, because they have the EMF2/VRN2-characteristic N-ter domain and lack both the S-rich domain found in FIS2 and the L36 domain characteristic of VEF-L36 (Figure 1). Sixteen full-length, non-Arabidopsis sequences contain the complete N-ter, including the N-ter cap: ZmEMF2_1, ZmEMF2_2, HvEMF2_4, HvEMF2_5, LjEMF2, OsEMF2_4, AaEMF2, YfEMF2, AoEMF2, LeEMF2_1, SlEMF2, LjEMF2, PtEMF2_1, PtEMF2_2, TaEMF2_3, CpEMF2. Five sequences, EcEMF2_2, OsEMF2_9, HvEMF2_1, ZmEMF2_2, and PtEMF2_4, lack the N-ter cap. One sequence from barley, HvEMF2_1, lacks both the N-ter cap and the VEF domain. Together with Arabidopsis EMF2 and VRN2, these full-length EMF2/VRN2 sequences represent 14 species from 11 angiosperm families (Acoraceae, Asparagaceae, Agavaceae, Poaceae, Caryophyllaceae, Fabaceae, Brassicaceae, Solanaceae, Salicaceae, Caricaceae, and Papaveraceae). No discernible FIS2 or VEF-L36 orthologs were recovered from rice or poplar, despite the availability of their full genomic sequences.
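The orthology reasoning above (an N-ter domain present, the FIS2 S-rich and VEF-L36 L36 domains absent) can be written as a small rule set. This is a hypothetical sketch: the domain labels follow Figure 1, domain presence is assumed to have been called already (for example, from the alignment), and the extra VRN2-type rule reflects the later observation that some homologs lack both the N-ter cap and E5–10, as VRN2 does. It is not a tool the authors describe.

```python
from typing import AbstractSet

EMF2_VRN2 = "EMF2/VRN2-like"
VRN2_TYPE = "EMF2/VRN2-like, VRN2-type (no N-ter cap, no E5-10)"
FIS2_LIKE = "FIS2-like"
L36_LIKE = "VEF-L36-like"
UNCLASSIFIED = "unclassified"


def classify_vef_protein(domains: AbstractSet[str]) -> str:
    """Assign a coarse class from a set of domain labels (hypothetical rules)."""
    # Lineage-specific domains are checked first: they mark VEF-L36 or FIS2.
    if "L36" in domains:
        return L36_LIKE
    if "S-rich" in domains:
        return FIS2_LIKE
    # An N-ter domain without FIS2/VEF-L36 features points to the EMF2/VRN2 class.
    if "N-ter" in domains:
        if "N-ter cap" not in domains and "E5-10" not in domains:
            return VRN2_TYPE
        return EMF2_VRN2
    return UNCLASSIFIED


print(classify_vef_protein({"N-ter", "N-ter cap", "E5-10", "C2H2", "E15-17", "VEF"}))
```

The order of the checks matters: lineage-specific domains are tested before the shared N-ter domain, so a protein carrying an S-rich or L36 region is never assigned to the EMF2/VRN2 class.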
In addition to the full-length sequences, we found ~140 incomplete sequences showing significant homology to three EMF2 domains in various genomic databases (see Methods). After the elimination of identical sequences, 54 new sequences homologous to one or more EMF2 domains were identified (Table 1B): (1) 9 ESTs possess N-terminal domain sequences, (2) 16 possess C2H2 domain sequences, and (3) 36 possess VEF domain sequences. These come from nine additional angiosperm families (Malvaceae, Vitaceae, Liliaceae, Rubiaceae, Nymphaeaceae, Ranunculaceae, Asteraceae, Bromeliaceae, and Euphorbiaceae) (Table 1 and Supplemental Figure 1). Altogether, 78 sequences (24 full-length and 54 partial) were identified from 20 angiosperm families.
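The three domain counts (9 + 16 + 36 = 61) exceed the 54 new partial sequences because some ESTs span two domains and are listed under both in Table 1B. A quick check, with names transcribed from Table 1B and sequences treated as the same only when their names match (an assumption: a few accessions, such as CA098901, recur under different names):

```python
# Partial-sequence names transcribed from Table 1B, grouped by the domain they cover.
N_TER = ["CaEMF2p", "GmEMF2p_3", "GmEMF2p_4", "GrEMF2p", "LsEMF2p_1",
         "MtEMF2p", "SbEMF2p_2", "VvEMF2p_3", "ZmEMF2p_3"]
C2H2 = ["CcEMF2p_1", "CsEMF2p_1", "CtEMF2p", "EeEMF2p", "GmEMF2p_3", "GtrEMF2p",
        "LeEMF2p_2", "SbEMF2p_2", "ScEMF2p", "SoEMF2p_2", "SoEMF2p_3", "SoEMF2p_1",
        "TaEMF2p_2", "ToEMF2p_1", "VvEMF2p_1", "ZmEMF2p_4"]
VEF = ["AcEMF2p", "AfEMF2p", "AnanasEMF2p", "BnEMF2p_1", "BnEMF2p_2", "CcEMF2p",
       "CiEMF2p_1", "CiEMF2p_2", "CsEMF2p", "EeEMF2p", "GhEMF2p_1", "GhEMF2p_2",
       "GhEMF2p_3", "GmEMF2p_1", "HaEMF2p_1", "HpEMF2p_1", "LeEMF2p_3", "LsEMF2p",
       "LvEMF2p", "MeEMF2p", "MtEMF2p", "NaEMF2p", "NtEMF2p", "PsEMF2p",
       "SbEMF2p_1", "SbEMF2p_3", "LeEMF2p", "SoEMF2p_1", "SoEMF2p_2", "SoEMF2p_4",
       "StEMF2p_2", "StEMF2p_3", "TaEMF2p_1", "VvEMF2p_2", "VvEMF2p_4", "ZmEMF2p_4"]

lists = (N_TER, C2H2, VEF)
unique_names = set().union(*lists)
# ESTs listed under more than one domain, i.e. fragments spanning two domains.
shared = sorted(name for name in unique_names if sum(name in lst for lst in lists) > 1)

print(sum(len(lst) for lst in lists), len(unique_names))  # 61 54
print(shared)
```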
Outside of the angiosperms, we identified two gymnosperm ESTs from Pinaceae sharing homology with the EMF2 C-terminal domain (Supplemental Figure 2B and 2C), one each from Pinus taeda (pine) and Picea engelmannii (spruce), and two individual ESTs from the lycophyte Selaginella moellendorffii (Table 1C). One Selaginella partial sequence (SdEMF2p_1) contained both N-ter and C2H2 domains, showing 44% and 39% identity, respectively, to the corresponding domains of EMF2. The other Selaginella sequence (SdEMF2p) contained only the VEF domain, showing 58% identity to the EMF2 VEF domain over a 145-aa region of overlap (Table 1C and Supplemental Figure 2A).
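Identity figures such as 58% over a 145-aa overlap depend on which alignment columns are counted. A minimal sketch, assuming identity is computed over columns where neither sequence has a gap (one common convention; the exact definition used for these values is not stated in this section), applied to a toy alignment:

```python
from typing import Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> Tuple[float, int]:
    """Percent identity of two aligned sequences over their overlap.

    Only columns where neither sequence has a gap are counted, so the result
    is (identical columns / overlapping columns) * 100, plus the overlap length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    overlap = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns do not count toward the overlap
        overlap += 1
        identical += x == y
    if overlap == 0:
        return 0.0, 0
    return 100.0 * identical / overlap, overlap


pid, overlap = percent_identity("MKV-LLA", "MRVALLA")
print(f"{pid:.1f}% identity over {overlap} aligned residues")  # 83.3% identity over 6 aligned residues
```

Dividing by the full alignment length or by the shorter sequence instead would give lower values for the same pair, which is why the region of overlap is reported alongside the percentage.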
The Chromatin Database yielded three full-length sequences homologous to EMF2 from Physcomitrella patens (Bryophyta; moss): PpEMF2_1, _2, and _3 (Table 1C). Despite low sequence similarity to Arabidopsis EMF2 (~25%), the moss sequences possess N-ter, C2H2, and VEF domains. The findings that EMF2/VRN2 homologs exist in lycophytes and mosses and have a domain structure similar to that of modern
Table 1. Full-Length and Partial Sequences of VEF Gene Homologs.
(A) Full-length VEF gene homologs from Angiosperm.
Name Family Plant Accession #
AaEMF2 Acoraceae Acorus americanus, sweet flag GenBank: ABI99480
AoEMF2 Asparagaceae Asparagus officinalis, asparagus GenBank: ABD85301
AtEMF2 Brassicaceae Arabidopsis thaliana TAIR: AT5G51230
CpEMF2 Caricaceae Carica papaya CoGe: Chr Supercontig_1321118352–2159309
EcEMF2_1 Papaveraceae Eschscholzia californica, California poppy GenBank: ABD98790
EcEMF2_2 Papaveraceae Eschscholzia californica, California poppy GenBank: ABD98791
FIS2_692 Brassicaceae Arabidopsis thaliana TAIR: AT2G35670
HvEMF2_1 Poaceae Hordeum vulgare, barley GenBank: BAD99132
HvEMF2_4 Poaceae Hordeum vulgare, barley GenBank: BAD99131
HvEMF2_5 Poaceae Hordeum vulgare, barley GenBank: BAD99131
LeEMF2_1 Solanaceae Lycopersicon esculentum GenBank: ABI99480
LjEMF2 Fabaceae Lotus japonicus Legume database
OsEMF2_4 Poaceae Oryza sativa, rice TIGR: LOC_Os04g08034
OsEMF2_9 Poaceae Oryza sativa, rice TIGR: LOC_Os09g13630
PtEMF2_1 Salicaceae Populus trichocarpa, cottonwood ChromDB: VEF901
PtEMF2_2 Salicaceae Populus trichocarpa, cottonwood ChromDB: VEF902
PtEMF2_4 Salicaceae Populus trichocarpa, cottonwood ChromDB: VEF904
SlEMF2 Caryophyllaceae Silene latifolia, white campion GenBank: BAD93353
TaEMF2_3 Poaceae Triticum aestivum, wheat GenBank: AAX78232
VRN2_445 Brassicaceae Arabidopsis thaliana TAIR: AT4G16845
VEF_L36 Brassicaceae Arabidopsis thaliana TAIR: AT4G16810
YfEMF2 Yuccaceae Yucca filamentosa, Yucca GenBank: ABD85300
ZmEMF2_1 Poaceae Zea mays, maize ChromDB: VEF101
ZmEMF2_2 Poaceae Zea mays, maize ChromDB: VEF102
(B) EMF2/VRN2-related ESTs from angiosperms.
N-terminal domain (nine ESTs)
Name Family Plant Accession #
CaEMF2p Solanaceae Capsicum annuum, pepper GenBank: CA847455
GmEMF2p_3 Fabaceae Glycine max, soybean TIGR: TC221104
GmEMF2p_4 Fabaceae Glycine max, soybean TIGR: TC211671
GrEMF2p Malvaceae Gossypium barbadense, cotton TIGR: TC40052
LsEMF2p_1 Asteraceae Lactuca saligna, lettuce TIGR: TA10917_4236
MtEMF2p Fabaceae Medicago truncatula TIGR: TC108897
SbEMF2p_2 Poaceae Sorghum bicolor, sorghum TIGR: TA29013_4558
VvEMF2p_3 Vitaceae Vitis vinifera, grape GenBank: CF609577
ZmEMF2p_3 Poaceae Zea mays, maize TIGR: CD436196
C2H2 zinc finger (16 ESTs)
Name Family Plant Accession #
CcEMF2p_1 Rubiaceae Coffea canephora TIGR: TA7702_49390
CsEMF2p_1 Asteraceae Centaurea solstitialis TIGR: TA4722_347529
CtEMF2p Asteraceae Carthamus tinctorius TIGR: TA2823_4222
EeEMF2p Euphorbiaceae Euphorbia esula TIGR: TA17942_3993
GmEMF2p_3 Fabaceae Glycine max, soybean TIGR: TC221104
GtrEMF2p Asteraceae Gerbera hybrid cv. Terra Regina GenBank: AJ759904
LeEMF2p_2 Solanaceae Lycopersicon esculentum, tomato GenBank: AW038171
SbEMF2p_2 Poaceae Sorghum bicolor, sorghum TIGR: TA29013_4558
ScEMF2p Poaceae Secale cereale, cereal rye GenBank: BE587348
SoEMF2p_2 Poaceae Saccharum officinarum, sugarcane TIGR: TA38345_4547
SoEMF2p_3 Poaceae Saccharum officinarum, sugarcane TIGR: TC71329
SoEMF2p_1 Poaceae Saccharum officinarum, sugarcane GenBank: CA098901
TaEMF2p_2 Poaceae Triticum aestivum, wheat GenBank: BJ211655
ToEMF2p_1 Asteraceae Taraxacum officinale TIGR: TA5836_50225
VvEMF2p_1 Vitaceae Vitis vinifera, grape GenBank: CN006883
ZmEMF2p_4 Poaceae Zea mays, maize TIGR: TA193846_4577
VEF domain (36 ESTs)
Name Family Plant Accession #
AcEMF2p Liliaceae Allium cepa GenBank: CF443745
AfEMF2p Ranunculaceae Aquilegia formosa TIGR: TA14166_338618
AnanasEMF2p Bromeliaceae Ananas comosus GenBank: DT339533
BnEMF2p_1 Brassicaceae Brassica napus GenBank: CX194398
BnEMF2p_2 Brassicaceae Brassica napus GenBank: CX188412
CcEMF2p Rubiaceae Coffea canephora TIGR: TA7701_49390
CiEMF2p_1 Asteraceae Cichorium intybus GenBank: EH708467
CiEMF2p_2 Asteraceae Cichorium intybus TIGR: TA5136_13427
CsEMF2p Asteraceae Centaurea solstitialis GenBank: EH782846
EeEMF2p Euphorbiaceae Euphorbia esula TIGR: TA17942_3993
GhEMF2p_1 Malvaceae Gossypium hirsutum, cotton GenBank: DW229901
GhEMF2p_2 Malvaceae Gossypium hirsutum, cotton TIGR: TA37052_3635
GhEMF2p_3 Malvaceae Gossypium hirsutum, cotton TIGR: TA35411_3635
GmEMF2p_1 Fabaceae Glycine max, soybean TIGR: TA61896_3847
HaEMF2p_1 Asteraceae Helianthus annuus, sunflower GenBank: CD848472
HpEMF2p_1 Asteraceae Helianthus paradoxus, sunflower GenBank: EL487885
LeEMF2p_3 Solanaceae Lycopersicon esculentum, tomato GenBank: BI932726
LsEMF2p Asteraceae Lactuca saligna, lettuce TIGR: TA3490_75948
LvEMF2p Asteraceae Lactuca virosa, wild lettuce GenBank: DW160707
MeEMF2p Euphorbiaceae Manihot esculenta, cassava GenBank: DV449784
MtEMF2p Fabaceae Medicago truncatula TIGR: TC108897
NaEMF2p Nymphaeaceae Nuphar advena, yellow pondlily FGP: nad03-13ms2-e08
NtEMF2p Solanaceae Nicotiana tabacum, tobacco GenBank: EB678277
PsEMF2p Fabaceae Pisum sativum, pea GenBank: AAX47184
SbEMF2p_1 Poaceae Sorghum bicolor, sorghum TIGR: TA34517_4558
SbEMF2p_3 Poaceae Sorghum bicolor, sorghum TIGR: TA35158_4558
LeEMF2p Solanaceae Solanum lycopersicum, tomato GenBank: AW038171
SoEMF2p_1 Poaceae Saccharum officinarum, sugarcane GenBank: CA098901
SoEMF2p_2 Poaceae Saccharum officinarum, sugarcane TIGR: TA38345_4547
SoEMF2p_4 Poaceae Saccharum officinarum, sugarcane GenBank: CA098901
StEMF2p_2 Solanaceae Solanum tuberosum TIGR: TA35890_4113
StEMF2p_3 Solanaceae Solanum tuberosum GenBank: BQ505017
TaEMF2p_1 Poaceae Triticum aestivum, wheat TIGR: TA70383_4565
VvEMF2p_2 Vitaceae Vitis vinifera, grape TIGR: TA47215_29760
angiosperm EMF2 indicate that EMF2 was likely to have been
present in the genomes of early land plants (Supplemental
Figure 2D).
Sequence Comparison of EMF2/VRN2 Class Homologs
Predicted full-length and partial EMF2/VRN2 protein homologs were aligned using MUSCLE (see Methods).
N-terminal (N-ter) Domain
An N-terminal domain of Arabidopsis EMF2 was defined by Yoshida et al. (2001) as a fragment spanning amino acids 47 to 81 (Figure 2A). The domain is also found in VRN2 and in the Drosophila Su(z)12 protein. Our alignment of the full-length sequences from all identified EMF2/VRN2 class homologs shows that a larger region is conserved across land plants, extending from the first amino acid of EMF2 to the end of exon 4 (aa 81); this region is hereafter referred to as the N-ter domain (Figure 2A). Relative to EMF2, VRN2 has an abbreviated N-ter domain, starting translation from a methionine (M) corresponding to the second M of EMF2. The sequence between the two methionines of EMF2 is referred to as the N-ter cap.
The EMF2/VRN2 homologs of the monocots Acorus, Yucca, and Asparagus and of the basal eudicot Eschscholzia all contain the N-ter cap (Figure 2A), suggesting that the ancestral angiosperm sequence may have had both M sites, as Arabidopsis EMF2 does. Indeed, Selaginella SdEMF2p_1 and the Physcomitrella sequence PpEMF2_3 (VEF1503) have an N-ter cap (Supplemental Figure 2D), although their sequences and lengths diverge from those of the identified angiosperm sequences. In some N-ter caps, the second M is replaced with a different amino acid; for example, in SlEMF2 the second M is replaced by an S (Figure 2A).
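Whether a homolog carries the N-ter cap can be read off a multiple alignment by checking for residues aligned upstream of the reference's second methionine. A hypothetical sketch on toy rows: real EMF2 rows are far longer, the gapped rows are assumed to come from an aligner such as MUSCLE, and the function names are illustrative.

```python
def nth_residue_column(aligned: str, residue: str, n: int) -> int:
    """Alignment column of the n-th occurrence (1-based) of `residue` in a gapped row."""
    seen = 0
    for col, ch in enumerate(aligned):
        if ch == residue:
            seen += 1
            if seen == n:
                return col
    raise ValueError(f"fewer than {n} '{residue}' residues in row")


def has_n_ter_cap(reference: str, query: str, gap: str = "-") -> bool:
    """True if `query` has residues aligned before the reference's second Met.

    Assumes the reference is an EMF2-like row whose N-ter cap runs from the
    first Met up to (not including) the second Met.
    """
    if len(reference) != len(query):
        raise ValueError("rows must come from the same alignment")
    boundary = nth_residue_column(reference, "M", 2)
    return any(ch != gap for ch in query[:boundary])


REFERENCE = "MAQSMKEL"  # toy EMF2-like row: the cap is "MAQS"
print(has_n_ter_cap(REFERENCE, "----MKEL"))  # False: starts at the second Met, like VRN2
print(has_n_ter_cap(REFERENCE, "MPQ-SKEL"))  # True: cap present, second Met replaced by S
```

Defining the cap by alignment columns rather than by the query's own methionines is what lets a sequence such as SlEMF2, whose second M is substituted, still be scored as capped.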
In species with two or more EMF2-class homologous sequences found so far, at least one sequence has the cap, as in rice (OsEMF2_4 vs. OsEMF2_9), maize (ZmEMF2_1 vs. ZmEMF2_2), poplar (PtEMF2_1 and PtEMF2_2 vs. PtEMF2_4), and California poppy (EcEMF2_1 vs. EcEMF2_2) (Figure 2A and Supplemental Figure 1A). The early land plants also possess at least one sequence with the N-ter cap (Supplemental Figure 2A and 2D).
E5–10 Domain
VRN2 is missing most of the amino acid sequence corresponding to EMF2 exon 5 through exon 10 (E5–10). Comparison of
Table 1. Continued
VEF domain (36 ESTs), continued
Name Family Plant Accession #
VvEMF2p_4 Vitaceae Vitis vinifera, grape GenBank: AM447481
ZmEMF2p_4 Poaceae Zea mays, maize TIGR: TA193846_4577
(C) EMF2/VRN2 homologs from gymnosperms, spikemoss, and moss.
Gymnosperm
Name Family Plant Accession #
PeEMF2p Pinaceae Picea engelmannii, spruce TIGR: TA1969_373101
PlEMF2p Pinaceae Pinus taeda, pine GenBank: CO368996
Lycophyte
Name Family Plant Accession #
SdEMF2p Selaginellaceae Selaginella moellendorffii, spikemoss gnl|050718cr339|1588846_1
SdEMF2p_1 Selaginellaceae Selaginella moellendorffii, spikemoss gnl|050718cr339|1588846_2
Moss
Name Family Plant Accession #
PpEMF2_1 Funariaceae Physcomitrella patens, moss ChromDB: VEF1501
PpEMF2_2 Funariaceae Physcomitrella patens, moss ChromDB: VEF1502
PpEMF2_3 Funariaceae Physcomitrella patens, moss ChromDB: VEF1503
Note: ‘p’ in the sequence name stands for partial sequence. The letters in the sequence name stand for the following plants: Aa: Acorus americanus, Ac: Allium cepa, Af: Aquilegia formosa, Ao: Asparagus officinalis, At: Arabidopsis thaliana, Bn: Brassica napus, Ca: Capsicum annuum, Cc: Coffea canephora, Ci: Cichorium intybus, Cp: Carica papaya, Cs: Centaurea solstitialis, Ct: Carthamus tinctorius, Ec: Eschscholzia californica, Ee: Euphorbia esula, Gh: Gossypium hirsutum, Gm: Glycine max, Gr: Gossypium barbadense, Gtr: Gerbera, Ha: Helianthus annuus, Hp: Helianthus paradoxus, Hv: Hordeum vulgare, Le: Lycopersicon esculentum, Lj: Lotus japonicus, Ls: Lactuca saligna, Lv: Lactuca virosa, Me: Manihot esculenta, Mt: Medicago truncatula, Na: Nuphar, Nt: Nicotiana tabacum, Os: Oryza sativa, Pe: Picea engelmannii, Pl: Pinus taeda, Pp: Physcomitrella patens, Ps: Pisum sativum, Pt: Populus trichocarpa, Sb: Sorghum bicolor, Sc: Secale cereale, Sd: Selaginella moellendorffii (spikemoss), Sl: Silene latifolia, So: Saccharum officinarum, St: Solanum tuberosum, Ta: Triticum aestivum, To: Taraxacum officinale, Vv: Vitis vinifera, Yf: Yucca filamentosa, Zm: Zea mays.
EMF2 and VRN2 genomic DNA revealed no conserved corresponding sequence in VRN2 in this region, excluding the possibility that alternative mRNA splicing causes the difference. One full-length sequence, PtEMF2_4 from Populus trichocarpa (poplar) (Figure 2 and Supplemental Figure 1B), and three partial sequences, MtEMF2p from Medicago truncatula, GmEMF2p_3 from Glycine max, and CaEMF2p from Capsicum annuum, lack both the E5–10 domain and the N-ter cap, like VRN2. Within the E5–10 domain, amino acids encoded by EMF2 exons 5 (Figure 2A), 6, and 8 were highly conserved among the non-Arabidopsis EMF2 homologs, suggesting a conserved function for the E5–10 region. Alignment analysis suggested the presence of E5–10 in all three Physcomitrella sequences, though with divergent sequences (Supplemental Figure 2D).
C2H2 Zinc Finger Domain
Unlike most Arabidopsis C2H2-domain-containing genes, which have multiple C2H2 motifs in tandem (Englbrecht et al., 2004), the three VEF proteins VRN2, FIS2, and EMF2 each contain a single C2H2 motif, encoded by exons 12 and 13 in EMF2. Previous studies found a conserved 37-aa C2H2 domain sequence in EMF2 homologs from Drosophila, human, and Arabidopsis (Yoshida et al., 2001). Alignment of the full-length sequences shows a conserved region extending from EMF2 exon 11 through exon 14 in the EMF2 homologs, covering ~77 aa. In the EMF2/VRN2 class, the C2H2 domain of VRN2 is the most divergent from those of the other members (Figure 2B). Selaginella SdEMF2p_1 has a C2H2 region with 39% identity to that of EMF2 (Supplemental Figure 2A). Physcomitrella PpEMF2_3 has a region corresponding to the EMF2 C2H2; its two
Table 2. Pair-Wise Alignment Scores of Full-Length VEF Protein Homologs.
Sequence name¹ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
1. AaEMF2 -
2. AoEMF2 61 -
3. AtEMF2 54 55 -
4. CpEMF2 57 58 68 -
5. EcEMF2_1 53 55 53 54 -
6. EcEMF2_2 52 52 51 51 57 -
7. FIS2_692 18 19 16 18 19 18 -
8. HvEMF2_1 42 49 40 43 40 43 12 -
9. HvEMF2_4 52 60 49 52 50 49 17 78 -
10. HvEMF2_5 42 46 41 40 37 39 16 54 58 -
11. LeEMF2_1 58 59 58 62 53 52 18 41 50 42 -
12. LjEMF2 40 40 42 46 40 38 17 32 37 34 43 -
13. OsEMF2_4 45 50 42 45 45 43 17 53 61 54 43 32 -
14. OsEMF2_9 53 61 50 52 52 52 18 71 82 61 51 37 62 -
15. PpEMF2_1 25 25 27 26 25 25 13 17 27 23 27 21 22 24 -
16. PpEMF2_2 24 23 26 23 23 20 10 15 21 20 24 15 22 23 69 -
17. PpEMF2_3 26 28 29 28 29 26 15 20 29 22 28 22 28 29 55 53 -
18. PtEMF2_1 57 56 63 71 53 51 17 42 49 39 59 41 44 51 23 25 28 -
19. PtEMF2_2 56 58 63 70 53 50 19 40 50 42 60 42 42 51 27 21 32 84 -
20. PtEMF2_4 61 60 53 54 52 58 30 25 49 43 57 44 43 50 27 31 30 56 55 -
21. SlEMF2 53 53 57 62 50 47 18 35 45 39 58 44 39 47 23 19 27 57 58 49 -
22. TaEMF2_3 51 58 49 51 51 48 18 80 93 57 49 35 60 81 27 23 29 48 50 48 45 -
23. VEF_L36 8 8 8 8 8 7 7 5 8 7 8 6 8 8 8 9 7 8 8 12 8 8 -
24. VRN2_445 46 47 45 45 43 46 31 20 44 34 48 36 37 44 29 29 31 45 44 51 42 44 14 -
25. YfEMF2 61 82 56 58 57 52 18 50 58 47 59 40 49 60 25 24 28 57 59 60 53 57 9 47 -
26. ZmEMF2_1 51 57 48 51 48 47 16 68 75 58 50 36 60 77 23 23 27 47 48 48 46 78 8 43 55 -
27. ZmEMF2_2 55 59 51 51 51 52 17 71 80 60 53 39 62 81 24 22 29 51 51 48 49 80 8 44 59 80 -
Average score 46 49 46 48 44 43 17 42 51 41 47 35 43 51 26 25 28 47 47 46 43 51 8 40 49 49 51
Note: 1. Each number in the top line refers to the sequence with the same number in the first column. Calculation of pair-wise alignment scores is described in Methods. Average scores were calculated as the sum of the individual scores for one sequence divided by 26. Among these homologs, VEF-L36 showed the lowest identity to other members (average score: 8), followed by FIS2 (average score: 17). On the other hand, both showed higher identity to VRN2 than to other EMF2/VRN2 homologs (pair-wise alignment score between VEF-L36 and VRN2: 14; between FIS2 and VRN2: 31). The average pair-wise alignment score of the other EMF2/VRN2 members was ~44, calculated as the sum of the average scores (excluding 8 and 17) divided by 25.
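The averaging rules in the note can be checked directly against the table. A minimal sketch using the "Average score" row and the 26 VEF-L36 scores transcribed from Table 2: the per-sequence average divides by 26 and the class average by 25, as the note describes. The function names are illustrative.

```python
# Table 2 sequence order (columns 1-27) and its "Average score" row.
NAMES = ("AaEMF2 AoEMF2 AtEMF2 CpEMF2 EcEMF2_1 EcEMF2_2 FIS2_692 HvEMF2_1 "
         "HvEMF2_4 HvEMF2_5 LeEMF2_1 LjEMF2 OsEMF2_4 OsEMF2_9 PpEMF2_1 PpEMF2_2 "
         "PpEMF2_3 PtEMF2_1 PtEMF2_2 PtEMF2_4 SlEMF2 TaEMF2_3 VEF_L36 VRN2_445 "
         "YfEMF2 ZmEMF2_1 ZmEMF2_2").split()
AVERAGES = [46, 49, 46, 48, 44, 43, 17, 42, 51, 41, 47, 35, 43, 51, 26, 25,
            28, 47, 47, 46, 43, 51, 8, 40, 49, 49, 51]

# All 26 pair-wise scores involving VEF-L36 (its row plus its column in Table 2).
VEF_L36_SCORES = [8, 8, 8, 8, 8, 7, 7, 5, 8, 7, 8, 6, 8, 8, 8, 9, 7, 8, 8, 12,
                  8, 8, 14, 9, 8, 8]


def sequence_average(scores, n_sequences=27):
    """Average score of one sequence: sum of its scores / (n_sequences - 1)."""
    if len(scores) != n_sequences - 1:
        raise ValueError("expected one score per other sequence")
    return sum(scores) / (n_sequences - 1)


def class_average(averages, names, exclude):
    """Mean of per-sequence averages, leaving out the named outliers."""
    kept = [avg for name, avg in zip(names, averages) if name not in exclude]
    return sum(kept) / len(kept)


print(round(sequence_average(VEF_L36_SCORES), 2))                # 8.12
print(class_average(AVERAGES, NAMES, {"VEF_L36", "FIS2_692"}))  # 43.52
```

Because the class average is computed from already-rounded per-sequence averages, it is itself an approximation, consistent with the note reporting it as ~44.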
Figure 2. Alignment of Three Domains of Predicted Full-Length Plant VEF Proteins.
cysteines line up with those in EMF2, but the two histidines are absent (Supplemental Figure 2D).
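Checking whether a sequence retains both the zinc-finger cysteines and histidines, as discussed for PpEMF2_3, can be sketched with a regular expression. This assumes the simplified consensus C-x(2,4)-C-x(12)-H-x(3,5)-H for a single C2H2 finger; stricter curated patterns exist, the toy sequence is invented, and a real analysis would rely on the alignment rather than a pattern match alone.

```python
import re
from typing import Optional, Tuple

# Simplified classical C2H2 zinc-finger consensus: C-x(2,4)-C-x(12)-H-x(3,5)-H.
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2(sequence: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first C2H2-like motif, or None.

    Gaps are removed first, so positions refer to the ungapped sequence.
    """
    ungapped = sequence.replace("-", "").upper()
    match = C2H2_MOTIF.search(ungapped)
    return (match.start(), match.end()) if match else None


toy = "AAC" + "KT" + "C" + "G" * 12 + "H" + "SGE" + "H" + "KK"
print(find_c2h2(toy))                    # (2, 23)
print(find_c2h2(toy.replace("H", "Q")))  # None: both Cs kept, both Hs lost
```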
E15–17 Domain
E15–17 is a region encoded by EMF2 exons 15 to 17 that connects the C2H2 and VEF domains of EMF2, VRN2, and FIS2. Alignment of the EMF2 homologs shows that this region has the highest variability of all EMF2 domains, in both amino acid composition and total length, suggesting extensive diversification, including multiple insertion and/or deletion events, during the evolution of this region (Supplemental Figure 1D). All three Physcomitrella sequences, including PpEMF2_3, appear to possess this region.
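The length component of the variability described for E15–17 can be quantified by counting how many residues each sequence contributes to the region in the alignment. A minimal sketch on a toy alignment: the column boundaries of E15–17 in a real alignment are an input the user would supply, and the population standard deviation is just one possible summary.

```python
from statistics import pstdev
from typing import Dict, Tuple


def region_lengths(alignment: Dict[str, str], start: int, end: int) -> Dict[str, int]:
    """Number of residues (non-gap characters) each row has in columns [start, end)."""
    return {name: sum(ch != "-" for ch in row[start:end]) for name, row in alignment.items()}


def length_spread(alignment: Dict[str, str], start: int, end: int) -> Tuple[int, int, float]:
    """Minimum, maximum, and population standard deviation of region lengths."""
    lengths = list(region_lengths(alignment, start, end).values())
    return min(lengths), max(lengths), pstdev(lengths)


# Toy alignment: columns 1-4 stand in for a variable linker, columns 5-7 for a conserved block.
TOY = {"a": "MKV--LLQ", "b": "MKVAALLQ", "c": "M----LLQ"}
print(region_lengths(TOY, 1, 5))  # {'a': 2, 'b': 4, 'c': 0}
print(length_spread(TOY, 1, 5))
print(length_spread(TOY, 5, 8))   # (3, 3, 0.0)
```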
VEF (C-terminal) Domain
Alignment of C-terminal sequences of EMF2, VRN2, FIS2,
Su(z)12, and the human KIAA0160 led Yoshida et al. (2001)
to define an acidic-W/M domain, ;130 aa from exons 18–22
in Arabidopsis EMF2, which is characterized by an acidic cluster
and a sequence rich in tryptophan and methionine. A smaller
region was later called the VEF domain derived from the ini-
tials of VRN2, EMF2, and FIS2 (Birve et al., 2001), which did not
include sequences in exon 18, but extended beyond that of the
acidic-W/M domain (Figure 2). In this paper, we adopt
a broader sense of the VEF domain, encompassing both the
acidic-W/M, defined by Yoshida et al. (2001), and the VEF, by
Birve et al. (2001), domains (Supplemental Figure 1E–1G).
VRN2hasanadditional52-aaC’oftheVEFdomain(Supplemental
Figure 1G and Figures 1 and 2C) that is not shared with other
EMF2-class proteins, including VRN2-like sequences, full-length
or partial from plants other than Arabidopsis. Analysis using
RADAR (www.ebi.ac.uk/Radar/) suggests that this 52-aa region
is a duplication of a stretch of amino acids found within the
VEF domain (Supplemental Figure 1G).
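RADAR finds repeats by iterative self-alignment, so it can detect diverged copies. The sketch below is a much cruder stand-in that only illustrates what "internal duplication" means. It reports exact k-mers that occur twice within one sequence at positions at least `min_separation` residues apart. The function name, the default `k` and `min_separation` values, and the toy sequence are all hypothetical. This is not RADAR's algorithm.

```python
from collections import defaultdict


def internal_repeats(seq: str, k: int = 5,
                     min_separation: int = 10) -> list[tuple[str, int, int]]:
    """Return (k-mer, first_pos, later_pos) for exact k-mers found twice.

    Only pairs whose start positions are at least `min_separation` apart
    are reported, so overlapping tandem matches are ignored. Exact matching
    misses the diverged repeat copies that alignment-based tools detect.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    hits = []
    for kmer, starts in positions.items():
        for a_idx, a in enumerate(starts):
            for b in starts[a_idx + 1:]:
                if b - a >= min_separation:
                    hits.append((kmer, a, b))
    return sorted(hits, key=lambda h: (h[1], h[2]))


# Toy sequence: the 7-residue block ACDEFGH occurs twice, 14 residues apart.
toy = "ACDEFGHIKL" + "MNPQ" + "ACDEFGH"
print(internal_repeats(toy))
```

The overlapping hits at offsets 0, 1 and 2 (each separated by 14) together mark a single duplicated block. A real analysis would merge such runs and allow mismatches.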
Selaginella SdEMF2p corresponds to the VEF domain
(Supplemental Figure 2A). All three Physcomitrella sequences
and the two partial gymnosperm sequences possess the VEF
domain (Supplemental Figure 2B–2D). None of the VEF
domains found in Physcomitrella, Selaginella, pine, or spruce
possesses the VRN2-characteristic repeat sequence in their C-
termini, indicating that this repeat likely evolved in angio-
sperms after the divergence of the gymnosperm lineage.
Among the three moss sequences, PpEMF2_3 is the most sim-
ilar to EMF2 in that it possesses the N-ter cap, E5–10, C2H2-like,
and VEF regions.
Phylogenetic Analysis of Full-Length and VEF Sequences
Phylogenetic analysis of the full-length sequences using max-
imum likelihood and Bayesian methods recovered various lin-
eages reflecting organismal evolution (Figures 3 and 4). Using
human and Drosophila sequences as outgroups, phylogenetic
analyses of full-length sequences (Figure 3) and VEF domain
alone (Figure 4) both recovered a monophyletic angiosperm
lineage with monophyletic monocot and eudicot clades.
Within the monocots, the grasses (Poales) were also recov-
ered as monophyletic in both full-length and VEF-based gene
trees. For VEF domain analyses containing greater sampling
of land plant diversity, gymnosperms were found to be mono-
phyletic and sister to angiosperms, Selaginella sister to an an-
giosperm plus gymnosperm clade, and Physcomitrella
sequences sister to remaining land plants. As with full-length
sequences, monocots are recovered as monophyletic; how-
ever, Eschscholzia, unresolved in the full-length analysis,
groups with Aquilegia VEF domain (Figure 4), forming a basal
eudicot clade sister to monocots. This clade is unresolved with
respect to monocots and core eudicots. Within monophyletic
core eudicots, the asterids and rosids fall out roughly as
separate clades, with a few exceptions (e.g., Silene within the
rosid clade; two Gossypium sequences recovered as sister to the
rosid plus asterid clade; Lotus japonicus within an otherwise
monophyletic asterid clade; and one Helianthus sequence falling
within the rosids rather than the asterids).
In addition, several sequences from core eudicot species are
resolved in a clade containing VRN2, FIS2, and VEF-L36 (Figure
4). This clade is distant from AtEMF2, indicating a different
evolutionary history for the VEF domain of VRN2, FIS2, and
VEF-L36. In the full-length analyses, PtEMF2_4 or VEF904, a
proposed VRN2 ortholog from Populus, is strongly supported
within a VRN2 clade, reflecting potential homology (or full-length
sequence conversion) of the Populus sequence with
VRN2. In the VEF domain analyses, this Populus sequence
groups with other Populus sequences rather than with the
VRN2 clade, indicating that the VEF domain itself is not con-
verging on a VRN2-like VEF domain, despite full sequence
and domain-level similarity. Another potential VRN2 ortholog,
Medicago truncatula’s MtEMF2p, lacking the E5–10 domain
and the N-ter cap, is grouped in the VRN2 clade. It remains
to be tested whether the VEF sequences of other homologs with
VRN2-like domain structure converge on the AtVRN2 VEF amino
acid sequence.
Figure 2 (continued). The T-COFFEE (Version 4.85) program was used for the sequence alignment. Vertical lines on top of the sequence mark the boundaries of EMF2 exons, and the arrows and numbers prefixed with an E on top of the sequence indicate EMF2 exons. (A) N-ter domain. Light-blue bar on top of the sequence marks the N-ter domain. Colorless horizontal bar marks the N-ter cap. Dark-blue bar marks the N-terminal domain defined by Yoshida et al. (2001). (B) C2H2 domain. Green bar on top of the sequence marks the C2H2 domain defined by Yoshida et al. (2001). Numbers –1, +3, and +6 denote the position relative to the start site of the α-helix of the C2H2 domain. (C) VEF domain. Red and yellow horizontal bars on top of the sequence mark the C-terminal domain defined by Yoshida et al. (2001) and the VEF domain defined by Birve et al. (2001), respectively. Because VEF-L36 shares only the VEF domain with other homologs, its middle and C-terminal sequences were cut off.
Sequence Relationship between VEF-L36 and EMF2/VRN2
VEF-L36 cDNA was deduced from Arabidopsis genomic sequence
(TAIR: www.Arabidopsis.org/servlets/TairObject?id=128616&type=locus)
but has not been assayed for function.
The 1872-bp open reading frame encodes a predicted 623-aa
protein, with the 125-aa VEF located at the N-terminus and
a 113-aa C-terminus with only low sequence similarity L36.
The RADAR program detected three types of repeat sequence
in the middle region of VEF-L36 (Figure 1 and Supplemental
Figure 3A). Except for the VEF domain, VEF-L36 shares no other
domains with the other three Arabidopsis VEF proteins. Using
its 495-aa sequence without the VEF domain to BLAST search
against GenBank, we found three Arabidopsis fragments
and one rice homolog, as well as a few sequences in other
non-plant organisms, such as Drosophila, Dictyostelium, Danio,
and Trypanosoma, all lacking the VEF domain (Supplemental
Figure 3B). The rice homolog encodes a 410-aa protein with
low global homology to the non-VEF part of VEF-L36 (22%
identity and 37% similarity, Supplemental Figure 3C). To date,
VEF-L36 is the only gene found with both VEF and L36 domains.
The VEF domain of VEF-L36 is more closely related to that
of VRN2 than to EMF2, as indicated by phylogenetic analyses
of both the VEF domain alone and of full-length sequences
(Figures 3 and 4). Among the divergent amino acids between
EMF2 and VRN2, VEF-L36 shares nine with VRN2 and only
three with EMF2 (Table 3). Moreover, VRN2 (AT4G16845)
and VEF-L36 (AT4G16810) are closely linked on Arabidopsis
chromosome 4. Among the VEF-domain-containing proteins,
the VEF domain in VEF-L36 is the only one located at the
N-terminus of a protein. Together, these phenomena suggest
that the VEF domain of VEF-L36 may have been transferred from
VRN2 on a sister chromatid through an accidental intronic
recombination event during meiosis (Figure 5C). This would
imply that only plants with VRN2 could generate VEF-L36. So
far, VEF-L36 has been identified only in Arabidopsis.
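The counts summarized in Table 3 can be tallied roughly as below. Only alignment columns where EMF2 and VRN2 differ are considered, and the query (VEF-L36 or FIS2) is scored for matching one or the other. The sketch assumes the three sequences are already aligned to equal length with "-" for gaps. Skipping any column that contains a gap is an assumption, since the text does not say how gaps were treated. The toy sequences are hypothetical.

```python
def shared_divergent_sites(emf2: str, vrn2: str, query: str) -> tuple[int, int]:
    """Count divergent EMF2/VRN2 columns at which `query` matches each.

    Returns (matches_to_vrn2, matches_to_emf2). Columns with a gap in any
    of the three aligned sequences are skipped (an assumption).
    """
    if not (len(emf2) == len(vrn2) == len(query)):
        raise ValueError("sequences must be aligned to the same length")
    to_vrn2 = to_emf2 = 0
    for e, v, q in zip(emf2, vrn2, query):
        if "-" in (e, v, q) or e == v:
            continue  # gapped or non-divergent column
        if q == v:
            to_vrn2 += 1
        elif q == e:
            to_emf2 += 1
    return to_vrn2, to_emf2


# Hypothetical aligned fragments (not real EMF2/VRN2/VEF-L36 sequence).
emf2_aln = "MKWLDE-S"
vrn2_aln = "MRWIEEAT"
query_aln = "MRWLEEAT"
print(shared_divergent_sites(emf2_aln, vrn2_aln, query_aln))  # (3, 1)
```

In Table 3 these counts are reported over the total number of residues in the domain (e.g., 9/98 and 3/98 for the VEF-L36 VEF domain).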
Sequence Relationship between FIS2 and EMF2/VRN2
FIS2 is similar to EMF2/VRN2 in possessing a single C2H2 domain
and the VEF domain, which are connected by a 459-aa region
containing 70 serines, called the S-rich domain. In addition to
the two types of repeats identified previously (Luo et al., 1999),
RADAR identified a third type of repeat in the S-rich domain
(Supplemental Figure 4A). Sequences homologous to the S-rich
domain have been found in plants, fungi, bacteria, and animals,
but none share the C2H2 or VEF domains with FIS2. Despite the
abundance of S-rich homologous domains in nature, the presence
of the S-rich domain in only one member of the VEF-domain-containing
protein family suggests that FIS2 may represent a unique
evolutionary event within the Arabidopsis lineage.
The C2H2 domain of FIS2 has greater sequence similarity to
that of VRN2 than to that of EMF2 (Table 3). The VEF domain of FIS2
shows a closer phylogenetic relationship to the VEF domain of
VRN2 than to that of EMF2 (Figure 4), forming a clade with the
VRN2 sequence, which indicates common ancestry to the exclusion
of EMF2. Among the amino acids that diverged between EMF2 and
VRN2, FIS2 shares 20 identical amino acid residues with VRN2 and
only eight with EMF2 in the VEF domain (Table 3). Globally, FIS2
shared a higher pair-wise alignment score with VRN2 than with
EMF2 (29 vs. 18%; Table 2).
Figure 3. Phylogenetic Analysis of Full-Length VEF Protein Homologs.
Phylogeny of EMF2/VRN2 using Bayesian inference; average branch lengths are shown. Measures of support are given at the nodes: Bayesian posterior probability (PP)/maximum likelihood bootstrap support (BS). Support values less than 50 are shown as a hyphen (-) and support values of 100 are shown as "+".
Figure 4. Phylogenetic Analysis of VEF Domain Sequences.
Phylogeny of the VEF domain using maximum likelihood as implemented in RAxML. Measures of support are given at the nodes: Bayesian posterior probability (PP)/maximum likelihood bootstrap support (BS). Support values less than 50 are shown as hyphens (-). Taxonomic groups are indicated at right, with exceptions described in the text.
DISCUSSION
The VEF domain is found in chromatin proteins required for
gene silencing throughout eukaryotic organisms. In addition
to the universal VEF domain, the VEF proteins possess other
characteristic domains that distinguish them from one an-
other. Based on domain organization, four Arabidopsis VEF
proteins were grouped into three classes: EMF2/VRN2, FIS2,
and VEF-L36 (Figure 1). Our analysis of homologous sequences
throughout land plants indicates the existence of EMF2 in
early diverging lineages of land plants (bryophytes and lyco-
phytes) and suggests the presence of an ancestral EMF2-like
gene in early land plants. Phylogenetic results (Figures 3
and 4) are consistent with the hypothesis that VRN2 was likely
derived from an EMF2-like ancestor within the angiosperms,
and that FIS2 and VEF-L36 were secondarily derived from
a VRN2-like ancestral sequence in Arabidopsis. Current phylo-
genetic hypotheses are limited in taxon sampling and in char-
acter sampling, constrained by currently available sequences
that are not equally distributed across angiosperm evolution
and may not represent complete genomic data for all species
sampled. Such limitations reduce overall phylogenetic resolu-
tion and make it difficult to assign orthology and paralogy to
the available sequences in the face of multiple gene and ge-
nome duplication events spanning angiosperm evolution.
However, given current sampling, our phylogenetic results in-
dicate that EMF2-like genes in angiosperms demonstrate an
evolutionary history largely consistent with the taxonomic his-
tory of the plants in which they are found.
Proposed Evolution of VEF Genes
The EMF2/VRN2 class proteins show strong sequence similarity
despite modified domain structure. Sequences with the EMF2-
like domain structure are widespread, found in animals and
most vascular plants. Sequences with the VRN2-like domain
structure, that is, sequences that lack the N-ter cap and the
E5–10 domain, like VRN2, have only been identified in poplar
(PtEMF2_4), pepper (CaEMF2p), Medicago truncatula (MtEMF2p),
and soybean (GmEMF2_3) (Table 1). In Arabidopsis, EMF2 is an
essential gene, as evidenced
by the short-lived and sterile nature of the emf2 mutants.
VRN2 promotes vernalization-mediated flowering and vrn2
mutants flower late, but the loss of VRN2 is not lethal (Gendall
et al., 2001). Alternative vernalization mechanisms that do not
utilize a putative Arabidopsis VRN2 ortholog have evolved in
other species (Yan et al., 2004) and may be present in
Arabidopsis as well. While every plant sequenced thus far has at
least one copy of EMF2, VRN2 is found only infrequently. The
dispensable nature of VRN2 may result in its lower frequency
of occurrence throughout land plants. Based on our data, it
is likely that VRN2 can arise from a duplication of an EMF2-like
ancestor. Once an additional EMF2 copy is present, one of the
copies is no longer under strong selection and is able to diverge,
potentially resulting in a VRN2-like sequence. Under this sce-
nario, VRN2-like sequences could arise multiple times and inde-
pendently following any duplication event that included the
EMF2 gene. Similarity in domain structure and amino acid com-
position could then be the result of convergent evolution.

Table 3. Number of Amino Acids Shared between FIS2/VEF-L36 and VRN2 or EMF2*.
              Identical aa between:
Domain        FIS2 and VRN2   FIS2 and EMF2   VEF-L36 and VRN2   VEF-L36 and EMF2
C2H2 domain   20/131          8/131           na                 na
VEF domain    20/116          5/116           9/98               3/98
* Among the amino acids that differ between EMF2 and VRN2, the number shared with VRN2 or EMF2, out of the total number of aa in the domain. na, not applicable.

Figure 5. Model of VRN2, FIS2, and VEF-L36 Evolution.
(A) Proposed VRN2 evolution from EMF2. (B) FIS2 evolution from VRN2. (C) VEF-L36 evolution from VRN2.
Genes possessing all domains found in EMF2 exist in insects
and mammals (Yoshida et al., 2001; Schuettengruber et al.,
2007). It can be argued, based on the presence of EMF2-like
genes in animals, lycophytes, bryophytes, gymnosperms, and
angiosperms, that early land plants shared an ancestral se-
quence having the domain structure found in modern copies
of EMF2. As the gene or genome duplicated, VRN2 may have
arisen from a duplication of the ancestral EMF2 (Figure 5A), fol-
lowed by subsequent loss of the N-ter cap and the E5–10 do-
main, and the acquisition of the 52-aa C-terminal repeat. The
presence of intermediary forms with partial domain structure
suggests a potential step-wise evolution of VRN2 from
an EMF2-like sequence. Among the full-length and partial
sequences from 20 angiosperm families used in this analysis,
20 sequences contain the complete N-ter domain (Figure 2A and
Supplemental Figure 1A), nine lack the N-ter cap only (Interme-
diary molecule #1 in Figure 5A) and four lack both the N-ter cap
and the E5–10 domain (Intermediary #2 in Figure 5A; Figure 2
and Supplemental Figure 1B) but do not contain a VEF repeat.
So far, no sequence that lacks E5–10 but contains the N-ter cap
has been found, suggesting that the N-ter cap may need to be
lost first in order for the E5–10 domain to be lost. Finally, only
one VRN2-like sequence, Arabidopsis VRN2, possesses the C-
terminal repeat (Supplemental Figure 1G).
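The tally above amounts to sorting each sequence by the presence or absence of three features: the N-ter cap, the E5–10 domain and the C-terminal VEF repeat. A minimal sketch follows. The record type, the function name and the sequence names are hypothetical. The category labels follow Figure 5A, and the combination reported as never observed is flagged separately.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    name: str
    n_ter_cap: bool
    e5_10: bool
    vef_repeat: bool


def classify(a: Architecture) -> str:
    """Assign a sequence to the step-wise categories of Figure 5A."""
    if a.n_ter_cap and a.e5_10:
        return "EMF2-like"
    if not a.n_ter_cap and a.e5_10:
        return "Intermediary #1"
    if not a.n_ter_cap and not a.e5_10:
        return "VRN2-like with repeat" if a.vef_repeat else "Intermediary #2"
    # N-ter cap retained but E5-10 lost: not observed among sampled sequences.
    return "unobserved (cap retained, E5-10 lost)"


# Hypothetical records, one per category.
toy_seqs = [
    Architecture("seqA", n_ter_cap=True, e5_10=True, vef_repeat=False),
    Architecture("seqB", n_ter_cap=False, e5_10=True, vef_repeat=False),
    Architecture("seqC", n_ter_cap=False, e5_10=False, vef_repeat=False),
    Architecture("seqD", n_ter_cap=False, e5_10=False, vef_repeat=True),
]
counts = Counter(classify(s) for s in toy_seqs)
print(counts)
```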
Based on the frequency of the intermediary forms and
results from phylogenetic analyses, we propose a three-step
hypothesis in the evolution of VRN2 from a parental EMF2 fol-
lowing gene duplication (Figure 5A). In the first step, EMF2
loses the N-ter cap, resulting in Intermediary molecule #1. This
could be achieved by mutation of the first ATG, rendering the
second ATG as a translation-starting site. In the second step,
Intermediary #1 loses the E5–10 domain, resulting in Inter-
mediary molecule #2. This could be achieved by mutation of
the splice sites within exons 5–10, resulting in exon skipping
(Hayashi et al., 1991). In the third step, Intermediary #2 gains
a C-terminal repeat, resulting in the backbone of VRN2. So far,
this third step has been observed only in Arabidopsis.
The importance of the 52-aa VEF repeat to VRN2 function
remains to be tested, but the intermediary sequences may
represent forms in the process of evolving VRN2 function.
Comparison of structure and function between these sequences
and VRN2 will be required to better understand the
relationships of these genes.
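Steps 1 and 2 rest on two simple molecular points, illustrated below on toy data. First, if the first ATG is lost, translation may start at the next in-frame ATG, giving an N-terminally truncated protein. This sketch considers only in-frame ATGs and ignores sequence context and out-of-frame ATGs. Second, skipping a block of exons preserves the downstream reading frame only if the skipped exons together total a multiple of 3 nt. This frame check is an added illustration and is not stated in the source. The function names, the toy CDS and the exon lengths are hypothetical and are not EMF2's.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def next_in_frame_start(cds: str) -> Optional[int]:
    """Position of the next in-frame ATG after the original start codon.

    Assumes `cds` begins at the original ATG. Returns None if an in-frame
    stop codon is reached first. Real start-site choice also depends on
    sequence context and out-of-frame ATGs, which are ignored here.
    """
    for i in range(3, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon == "ATG":
            return i
        if codon in STOP_CODONS:
            return None
    return None


def residues_lost(cds: str) -> Optional[int]:
    """Number of N-terminal residues missing if translation uses that ATG."""
    start = next_in_frame_start(cds)
    return None if start is None else start // 3


def skip_preserves_frame(exon_lengths: list[int], skipped: range) -> bool:
    """True if deleting the exons at the `skipped` indices keeps the frame.

    Ignores any stop codon that might be created at the new exon junction.
    """
    return sum(exon_lengths[i] for i in skipped) % 3 == 0


# Toy CDS: ATG, four Ala codons, a second in-frame ATG, Lys, stop.
toy_cds = "ATG" + "GCT" * 4 + "ATG" + "AAA" + "TAA"
print(next_in_frame_start(toy_cds), residues_lost(toy_cds))  # 15 5

# Hypothetical exon lengths (nt); exons 5-10 are indices 4-9.
toy_exons = [120, 90, 75, 60, 44, 31, 81, 102, 66, 54, 99]
print(skip_preserves_frame(toy_exons, range(4, 10)))  # True (378 nt)
print(skip_preserves_frame(toy_exons, range(4, 5)))   # False (44 nt)
```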
The proposed process could happen sequentially, resulting
in independent derivations of a VRN2-like sequence from an
EMF2-like ancestor multiple times throughout plant evolution.
Convergence of the VEF domain among the VRN2-like sequen-
ces may occur concurrently with the losses of domains during
steps 1 and 2, or may occur following these structural changes
due to selection on the resulting gene sequence. This latter case
assumes that independently evolved VRN2 sequences would
converge upon a particular function, with selection then act-
ing in a similar manner on the individual VEF domains. Studies
demonstrating the function of VRN2-like sequences in plants
in which they are found would be required to understand the
selection events leading to convergence of sequence data.
More complete genomic and taxonomic sampling focused
on VRN2-like sequences will enable us to test for possible
differences in selection on the VRN2 clade in comparison with
various recovered EMF2 clades.
The presence of the VEF repeat only in Arabidopsis VRN2
indicates that it may be a lineage-specific event. In this case,
the ancestral VRN2 in the most recent common ancestor of
Arabidopsis and Populus would not have had the VEF repeat,
and the repeat was subsequently gained in the lineage leading
to Arabidopsis after its divergence from the eudicot lineage
leading to Populus. Phylogenetic analysis showed that the
full-length Populus and Arabidopsis VRN2-like sequences are
in the same clade, despite the lack of the VEF repeat in
PtEMF2_4. However, in the analysis of the VEF domain alone,
the VEF of PtEMF2_4 remained in the same clade as that of
PtEMF2_1 and PtEMF2_2, suggesting stabilizing selection on
the VEF domain in Populus since the duplication event leading
to the Populus EMF2/VRN2-like divergence. This indicates that
overall domain architecture of the EMF2 gene is evolving in-
dependently from within-domain protein structure, at least
for the VEF domain. Studies investigating evidence for direc-
tional selection on the VEF domain following duplication of
EMF2 will be helpful to assess the likelihood of VRN2 evolution
following gene or genome duplication.
Phylogenetic analysis and sequence similarity comparison
clearly demonstrate that the VEF domain of VEF-L36 is more
closely related to that of VRN2 than to EMF2 (Table 3 and Fig-
ures 3 and 4). Similarly, both the C2H2 and VEF domains of FIS2
are more closely related to those of VRN2 than EMF2 (Table 3
and Figures 3 and 4). These findings support the derivation of
FIS2 and VEF-L36 from VRN2; only plants that have evolved
VRN2 could generate sequences like Arabidopsis FIS2 and
VEF-L36. FIS2 is an essential gene in Arabidopsis, but has
not yet been identified in other plants, including plants with
full genome sequences. FIS2 is specifically expressed in the ga-
metophyte of Arabidopsis and prevents endosperm develop-
ment prior to fertilization (Luo et al., 1999, 2000). A search
against cDNA libraries constructed from various angiosperm
flowers did not result in any FIS2-like homologs. In plants that
did not evolve VRN2, EMF2-like or alternative sequences may
have evolved to prevent endosperm development without
fertilization. Alternatively, functionally equivalent genes
lacking sequence conservation (Calonje et al., 2008) may have evolved
to take the place of FIS2. The presence of FIS2 and VEF-L36
should be investigated across Brassicaceae and its sister family,
Capparaceae (Hall et al., 2002), in order to localize the poten-
tial duplication events leading to the evolution of these
sequences from a hypothetical VRN2-like ancestral sequence.
FIS2 may have diverged from a duplicated VRN2, while VEF-L36
may have evolved via a translocation of a VEF domain donated
by VRN2 (Figure 5B and 5C).
PRC2 components play important roles in animal develop-
ment, notably in insects and mammals (Schuettengruber
et al., 2007). Some animal VEF protein sequences in the data-
base possess all domains found in Su(z)12; others possess only
the VEF and C2H2 domains, or only the VEF domain. Indeed, a
nematode sequence shares the C2H2 and VEF domains with Su(z)12
(see GenBank’s protein databases). Protein sequence align-
ment based on identity/similarity did not identify any animal
protein with the VEF domain linked to FIS2’s S-rich or VEF-L36’s
L36 domain, despite the abundance of S-rich and L36 in nature.
A comprehensive evolutionary analysis of animal VEF-contain-
ing proteins is beyond the scope of the present study. How-
ever, gene duplication, domain deletion/insertion/
rearrangement apparently occurred during the evolution
of animal VEF proteins as well. For example, mouse has
one, chimpanzee has three, and zebrafish has two VEF protein
homologs. Some animal homologs possess the N-terminal
sequence, while others do not, and some domains specific
to certain animals can be identified (data not shown). Gene
fusion involving the human VEF homolog can lead to
neoplastic tumor growth (Li et al., 2008). Future investigation of
domain architectures in animal VEF proteins would provide
insights into the evolutionary trends of VEF proteins in plants
versus those in animals.
Dynamic Changes during VEF Gene Evolution
The evolution of the VEF genes in plants is characterized by
the mobility of the VEF domain, duplication, and functional
divergence of homologous sequences. In addition to occupying
diverse locations in the genome, a VEF domain can be located
at either the N- or the C-terminus of the encoded protein. A
VEF-domain-containing gene may even lose the VEF domain, as in
the case of HvEMF2_1. These phenomena indicate that the
VEF domain functions like a mobile functional module that
plays a major role in protein evolution, facilitated by intronic
recombination or exon shuffling (Patthy, 1996; Kolkman and
Stemmer, 2001).
The dynamic genetic changes that occurred during the evo-
lution of this small gene family caused varying degrees of di-
vergence in sequences located between the conserved
domains. For example, a region encoded by EMF2 exon 15
through exon 17 (E15–17), flanked by the highly conserved
C2H2 zinc finger and VEF domains, is a region with the lowest
conservation among the EMF2/VRN2 class homologs (Supplemental
Figure 1D). While the ends consist of identical or similar
amino acids and show almost no length variation, the central
part of this region, encoded by exon 16 and the 5′ end of exon
17, requires indels of 20–70 aa for multiple sequence alignment.
The gradient in the degree of similarity, from highly divergent
at the center to highly conserved at the 5′ and 3′ ends, may be
informative in plant phylogenetics. Finally, we note that the VEF
gene tree reflected our best understanding of the organismal
tree for included taxa (Grass Phylogeny Working Group,
2001). Regions with high levels of variability combined with
low copy number may render EMF2, particularly the E15–17
domain, a useful phylogenetic tool for evaluating the evolu-
tionary relationships of plants across both deep and shallow
nodes.
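The conservation gradient described above, from variable centre to conserved ends, could be quantified along these lines. Each alignment column is scored as the fraction of sequences carrying that column's most common residue, so gap-rich (indel) columns score low. The scores are then smoothed with a sliding window. This scoring scheme is one simple choice and is not the authors' method. The function names, the window size and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences carrying each column's most common residue.

    The denominator is the number of sequences, so gaps ('-') count
    against conservation and indel-rich columns score low.
    """
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / n)
    return scores


def sliding_mean(values: list[float], window: int) -> list[float]:
    """Mean of each full window of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Toy alignment: conserved ends, variable and gapped centre.
aln = [
    "MKWD--ESL",
    "MKWEGAQSL",
    "MKWN-TPSL",
]
scores = column_conservation(aln)
print([round(s, 2) for s in scores])
print([round(s, 2) for s in sliding_mean(scores, 3)])
```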
METHODS
Identification of Sequences and Domains of VEF Genes
across Land Plants
Full-length EMF2 putative protein sequence was used to BLAST
(Basic Local Alignment Search Tool) search against the following
databases: GenBank (www.ncbi.nih.gov/), TIGR/JCVI (www.tigr.org/),
the Floral Genome Project (http://fgp.bio.psu.edu/fgp/), Plant
Genome DataBase (www.plantgdb.org/), the moss genome
(www.cosmoss.org/, http://genome.jgi-psf.org/Phypa1_1/Phypa1_1.home.html),
the papaya genome (http://tinyurl.com/3ua95v), the pine EST
database (http://fungen.botany.uga.edu/), the Plant Genome
Network (http://pgn.cornell.edu/cgi-bin/blast/blast_search.pl),
Brassica (http://ukcrop.net/), SOL Genomics Network
(www.sgn.cornell.edu/), the poplar genome (http://genome.jgi-psf.org),
the Chromatin database (www.chromdb.org/), and the Selaginella
genome (http://selaginella.genomics.purdue.edu). Sequences with an e-value
greater than 0.001 (non-significant homology) were elimi-
nated, thereby eliminating all non-plant sequences. Plant
sequences containing intact EMF2-like N-terminal, C2H2, and/
or C-terminal domains were selected for further analysis. For
identification of homologs of FIS2’s S-rich domain and VEF-
L36’s L36 domain, S-rich domain and L36 domain amino acid
sequences were used to BLAST search against the same data-
bases listed above with an e-value cut-off of 0.001.
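A sketch of the e-value filter follows. It assumes the hits were saved in BLAST+ tabular format (`-outfmt 6`), whose 12 default columns end with `evalue` and `bitscore`. The text does not say which output format the study used, so the format is an assumption. Hits with an e-value above 0.001 are discarded, matching the cut-off above. The function name and the hit identifiers are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text: str, cutoff: float = 1e-3) -> list[dict]:
    """Keep hits whose e-value does not exceed `cutoff`."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    return [row for row in reader if float(row["evalue"]) <= cutoff]


# Two hypothetical hits: one significant, one above the cut-off.
blast_out = (
    "EMF2\thitA\t45.2\t300\t150\t5\t1\t300\t10\t310\t2e-40\t160\n"
    "EMF2\thitB\t28.0\t80\t55\t2\t400\t480\t5\t85\t0.05\t35\n"
)
hits = significant_hits(blast_out)
print([h["sseqid"] for h in hits])  # ['hitA']
```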
Sequencing EMF2 Homolog cDNA
Plasmid cDNAs were extracted from bacterial cultures according
to the manufacturer’s protocols (QIAGEN Inc., Valencia, CA
91355, USA). M13 rev (5′-GGAAACAGCTATGACCATG-3′) and
M13 (–20) (5′-GTAAAACGACGGCCAG-3′) primers were used
for sequencing, with the following internal primers used
as necessary to obtain full sequences: Acorus:
5′-CTCAGTAGAGCATGTCTGCTG-3′, 5′-CCCATGCAATCGTGAGAATGC-3′,
5′-TGACACGCTGAAAGATGATG-3′, 5′-CATTAACTGCCTGATACTCTTC-3′;
Asparagus: 5′-CAATACGGAATCCATCATTTCTGC-3′,
5′-CTTGCTCCAATGCCATTGGC-3′; Nuphar: 5′-GATGAGGTCGATGATGATATTGC-3′,
5′-CTGCCAAAACCCGCTGTTTC-3′; Yucca: 5′-GTCAATCGGGCATGTATACTG-3′,
5′-CTTGCTCCAACGCCATTGGC-3′; Eschscholzia 8.1:
5′-GCTGATTACAAGGAACAGACTG-3′, 5′-CACGGAACATGACCATCTGC-3′;
Eschscholzia 8.2: 5′-GAGGAATGACAGGGTGGAAGC-3′,
5′-GTTCCAGAGATGCATAATCCTTG-3′; Tomato: 5′-GCTTTGCCGAACTTGCCAG-3′,
5′-CCCTATGAGAATGAAAGAATTGCC-3′.
Sequence Alignment
T-coffee (www.ebi.ac.uk/t-coffee/) was used to produce
a global amino acid alignment using the default values for pro-
tein alignment. RADAR (www.ebi.ac.uk/Radar/) was used to
detect de novo repeat regions in EMF2 homologous sequences.
Classification of VEF subgroups was performed based
on domain organization in the aligned sequences.
The full-length VEF homologs were aligned using T-coffee
and pair-wise distance scores were calculated with ClustalW
(version 1.83, http://www.ebi.ac.uk/Tools/Radar/) as the num-
ber of identities in the best alignment divided by the number
of residues compared (gap positions excluded). Scores were
initially calculated as percent identity scores and were converted
to distances by dividing by 100 and subtracting from 1.0, giving
the proportion of compared sites that differ. No correction
for multiple substitutions was performed.
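The identity-to-distance conversion described above can be sketched for one pair of aligned sequences. Positions with a gap in either sequence are excluded, as stated in the text. ClustalW's internal implementation may differ in detail, and the toy sequences are hypothetical. Because no correction is applied, the result is the uncorrected p-distance. A multiple-hit correction such as the Poisson d = −ln(1 − p) would give larger values at high divergence.

```python
def percent_identity(a: str, b: str) -> float:
    """Identities divided by residues compared, ignoring gapped positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    identities = sum(x == y for x, y in compared)
    return 100.0 * identities / len(compared)


def p_distance(pid: float) -> float:
    """Uncorrected distance: proportion of compared sites that differ."""
    return 1.0 - pid / 100.0


# Toy pair: 8 ungapped positions compared, 6 identical.
seq1 = "MKW-LDEST"
seq2 = "MRWALEEST"
pid = percent_identity(seq1, seq2)
print(pid, p_distance(pid))  # 75.0 0.25
```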
Phylogenetic Analysis
EMF2/VRN2 Full-Length Sequence Data
The T-coffee alignment was used for phylogenetic analysis
based on its superior prediction of primary homology state-
ments as compared with prior knowledge of functional domain
architecture; for example, the N-terminus-located C2H2 do-
main of FIS2 aligned with the EMF2 N-terminal domain when
using MegAlign or ClustalW, while, in T-coffee, the annotated
C2H2 domains aligned with one another across all sequences.
Bayesian phylogenetic analyses on aligned full-length
sequences were performed with MrBayes v. 3.1.2 (Huelsenbeck
and Ronquist, 2001; Ronquist and Huelsenbeck, 2003). The
model of protein evolution that best fit the protein sequence
data was selected using the AIC as implemented in ProtTest 2.0
(Abascal et al., 2005). The best-scoring
model for the EMF2/VRN2 full-length alignment was the
Jones-Taylor-Thornton (JTT) probability model (Jones et al.,
1992), with rate variation among sites calculated as a gamma
distribution (+G), and global rearrangements were sampled
with a random order of input sequences. Posterior probabili-
ties of the generated trees were approximated using an MCMC
algorithm with four incrementally heated chains (T = 0.2) for
5 000 000 generations and sampling trees every 100 genera-
tions. Two independent runs were conducted for each dataset
simultaneously, the default setting in MrBayes v. 3.1.2. Follow-
ing completion, the sampled trees from each analysis were
plotted against their log-likelihood score to identify the point
at which log-likelihood scores reached a maximum value. All
trees prior to this point were discarded as the burn-in phase,
all post-burn-in trees from each run were pooled, and a 50%
majority-rule consensus tree was calculated to obtain a topol-
ogy with average branch lengths as well as posterior probabil-
ities as indicators of support for all resolved nodes.
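The burn-in, pooling and consensus steps can be sketched with each sampled tree reduced to its set of clades, where each clade is a frozenset of taxon names. The burn-in rule here takes the first sample whose log-likelihood comes within `tol` of the trace maximum. That rule is a hypothetical, automated stand-in for the visual inspection described above. A clade's posterior probability is the fraction of pooled post-burn-in trees that contain it, and clades above 0.5 form the majority-rule consensus. The traces, trees and `tol` value are hypothetical. MrBayes itself performs these summaries with its own commands.

```python
from collections import Counter


def burnin_index(lnl_trace: list[float], tol: float) -> int:
    """Index of the first sample whose log-likelihood is within `tol` of the maximum."""
    best = max(lnl_trace)
    return next(i for i, value in enumerate(lnl_trace) if value >= best - tol)


def majority_rule(runs: list[tuple[list[float], list[set]]], tol: float) -> dict:
    """Pool post-burn-in trees from all runs; keep clades found in > 50% of them.

    `runs` holds (lnl_trace, trees) pairs, where trees[i] is the set of clades
    of the tree sampled alongside lnl_trace[i]. The returned frequency of each
    clade estimates its posterior probability.
    """
    pooled = []
    for lnl_trace, trees in runs:
        pooled.extend(trees[burnin_index(lnl_trace, tol):])
    counts = Counter(clade for tree in pooled for clade in tree)
    return {clade: n / len(pooled)
            for clade, n in counts.items() if n / len(pooled) > 0.5}


AB, AC, BD, CD = (frozenset(pair) for pair in ("AB", "AC", "BD", "CD"))
run1 = ([-5000.0, -3000.0, -2010.0, -2005.0, -2008.0],
        [{CD}, {CD}, {AB}, {AB}, {AC}])
run2 = ([-4500.0, -2007.0, -2006.0], [{BD}, {AB}, {AB}])
consensus = majority_rule([run1, run2], tol=20.0)
print(consensus)  # clade {A, B} with posterior probability 0.8
```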
VEF Domain Sequence Data
The VEF domain, a region held in common by EMF2, VRN2,
FIS2, and VEF-L36, was used to estimate the phylogenetic rela-
tionships among VEF gene sequences across land plants. Pro-
tein alignment of the VEF domain was performed with
MUSCLE, resulting in a multiple sequence alignment of about
130 aa. ProtTest 2.0 was also used to determine the model of
evolution that best fits the VEF domain alignment. The best-
scoring model for the VEF alignment was also JTT +G, and
global rearrangements were sampled with a random order
of input sequences. Bayesian and maximum likelihood meth-
ods of phylogenetic inference were conducted on the VEF
domain alignment using MrBayes (tree not shown) and RAxML-
VI-HPC (Stamatakis, 2006), respectively. The analyses were per-
formed on the computer cluster of the Cyber-Infrastructure
for Phylogenetic Research project (CIPRES, www.phylo.org)
at the San Diego Supercomputer Center. Clade support, which
was assessed with nonparametric bootstrapping (Felsenstein,
1985) as implemented in RAxML-VI-HPC, was based on 100 rep-
licates. The tree with the highest log-likelihood score from the
RAxML analysis was chosen for representation here.
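The nonparametric bootstrap (Felsenstein, 1985) builds each pseudo-replicate alignment by sampling alignment columns with replacement, keeping the original number of columns. A tree is then inferred from each replicate, and a clade's bootstrap support is the percentage of replicate trees that contain it. The sketch covers only the resampling step and leaves tree inference to software such as RAxML. The function names, the seed and the toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One pseudo-replicate: same number of columns, drawn with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_replicates(alignment: list[str], n: int = 100,
                         seed: int = 1) -> list[list[str]]:
    """`n` pseudo-replicates from a seeded generator (reproducible)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]


toy_aln = ["MKWLDES", "MRWIEEA", "MRWLEEA"]
reps = bootstrap_replicates(toy_aln, n=100)
print(len(reps), reps[0])
```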
Accession Numbers
Novel full-length protein sequences generated for this study
were deposited in GenBank with the following accession numbers:
Yucca filamentosa EMF2 (YfEMF2, GenBank accession number
(acc. #) ABD85300); Asparagus officinalis EMF2 (AoEMF2, acc.
# ABD85301); Eschscholzia californica EMF2 (EcEMF2_2, acc. #
ABD98791); Eschscholzia californica EMF2 (EcEMF2_1, acc. #
ABD98790); Tomato EMF2 (LeEMF2_1, acc. # ABI99480); Acorus
americanus EMF2 (AaEMF2, acc. # ABI99481).
SUPPLEMENTARY DATA
Supplementary Data are available at Molecular Plant Online.
FUNDING
This work is supported by NSF grant #IBN 0236399 and USDA grant
#03–35301–13244 to Z.R.S.
ACKNOWLEDGMENTS
The authors thank Dr Hong Ma (Pennsylvania State University), the
Floral Genome Project, the and the SOL Genomics Network
(www.sgn.cornell.edu/) for providing EMF2 homologous cDNA
clones, Kazusa DNA Research Institute for providing Lotus japonicus
EMF2 sequence to Dr Rieko Nishimura, Dr Jo Ann Banks (National
Science Foundation/Purdue University) for providing Selaginella
EMF2 homologous EST sequences, Dr Ralph Quatrano (Washington
University) for providing access to the Physcomitrella website, Drs
Hong Ma and Damon R. Lisch (UC Berkeley) for comments on the
manuscript, Steve Ruzin and Denise Schichnes (Bioimaging Facility,
CNR, UC Berkeley) for image processing, and our laboratory
members Myriam Calonje, Tiffany Tirtadinata, Robert Luan, Heather
Driscoll, and Rosario Sanchez for help and support in preparation of
this work. No conflict of interest declared.
REFERENCES
Abascal, F., Zardoya, R., and Posada, D. (2005). ProtTest: selection of
best-fit models of protein evolution. Bioinformatics. 21,
2104–2105.
Birve, A., Sengupta, A.K., Beuchle, D., Larsson, J., Kennison, J.A.,
Rasmuson-Lestander, A., and Muller, J. (2001). Su(z)12, a novel
Drosophila Polycomb group gene that is conserved in verte-
brates and plants. Development. 128, 3371–3379.
Calonje, M., and Sung, Z.R. (2006). Complexity beneath the silence.
Curr. Opin. Plant Biol. 9, 530–537.
Calonje, M., Sanchez, R., Chen, L., and Sung, Z.R. (2008). EMBRY-
ONIC FLOWER1 participates in Polycomb group-mediated AG
gene silencing in Arabidopsis. Plant Cell. 20, 277–291.
Cao, R., Wang, L., Wang, H., Xia, L., Erdjument-Bromage, H.,
Tempst, P., Jones, R.S., and Zhang, Y. (2002). Role of histone
H3 lysine 27 methylation in Polycomb-group silencing. Science.
298, 1039–1043.
Chanvivattana, Y., Bishopp, A., Schubert, D., Stock, C., Moon, Y.H.,
Sung, Z.R., and Goodrich, J. (2004). Interaction of Polycomb-
group proteins controlling flowering in Arabidopsis. Develop-
ment. 131, 5263–5276.
Czermin, B., Melfi, R., McCabe, D., Seitz, V., Imhof, A., and
Pirrotta, V. (2002). Drosophila enhancer of Zeste/ESC complexes
have a histone H3 methyltransferase activity that marks chromo-
somal Polycomb sites. Cell. 111, 185–196.
Englbrecht, C.C., Schoof, H., and Bohm, S. (2004). Conservation, di-
versification and expansion of C2H2 zinc finger proteins in the
Arabidopsis thaliana genome. BMC Genomics. 5, 39.
Felsenstein, J. (1985). Confidence limits on phylogenies: an ap-
proach using the bootstrap. Evolution. 39, 783–791.
Gendall, A.R., Levy, Y.Y., Wilson, A., and Dean, C. (2001). The VER-
NALIZATION 2 gene mediates the epigenetic regulation of ver-
nalization in Arabidopsis. Cell. 107, 525–535.
Goodrich, J., Puangsomlee, P., Martin, M., Long, D., Meyerowitz, E.M., and Coupland, G. (1997). A Polycomb-group gene regulates homeotic gene expression in Arabidopsis. Nature. 386, 44–51.
Grass Phylogeny Working Group (Nigel P. Barker, Lynn G. Clark, Jerrold I. Davis, Melvin R. Duvall, Gerald F. Guala, Catherine Hsiao, Elizabeth A. Kellogg, and H. Peter Linder) (2001). Phylogeny and subfamilial classification of the grasses (Poaceae). Annals of the Missouri Botanical Garden. 88, 373–457.
Grossniklaus, U., Vielle-Calzada, J.P., Hoeppner, M.A., and Gagliano, W.B. (1998). Maternal control of embryogenesis by MEDEA, a polycomb group gene in Arabidopsis. Science. 280, 446–450.
Hall, J.C., Sytsma, K.J., and Iltis, H.H. (2002). Phylogeny of Capparaceae and Brassicaceae based on chloroplast sequence data. Amer. J. Bot. 89, 1826–1842.
Hayashi, S.I., Kunisada, T., Ogawa, M., Yamaguchi, K., and Nishikawa, S.I. (1991). Exon skipping by mutation of an authentic splice site of c-kit gene in W/W mouse. Nucleic Acids Res. 19, 1267–1271.
Hennig, L., Taranto, P., Walser, M., Schonrock, N., and Gruissem, W. (2003). Arabidopsis MSI1 is required for epigenetic maintenance of reproductive development. Development. 130, 2555–2565.
Huelsenbeck, J.P., and Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 17, 754–755.
Irish, V.F., and Benfey, P.N. (2004). Beyond Arabidopsis: translational biology meets evolutionary developmental biology. Plant Physiol. 135, 611–614.
Jiang, D., Wang, Y., Wang, Y., and He, Y. (2008). Repression of Flowering Locus C and Flowering Locus T by the Arabidopsis Polycomb Repressive Complex 2 components. PLoS One. 3, e3404.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992). The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282.
Kim, S., Yoo, M., Albert, V., Farris, J., Soltis, P.S., and Soltis, D.E. (2004). Phylogeny and diversification of B-function MADS-box genes in angiosperms: evolutionary and functional implications of a 260-million-year-old duplication. Amer. J. Bot. 91, 2102–2118.
Kinoshita, T., Harada, J.J., Goldberg, R.B., and Fischer, R.L. (2001). Polycomb repression of flowering during early plant development. Proc. Natl Acad. Sci. U S A. 98, 14156–14161.
Kohler, C., Hennig, L., Spillane, C., Pien, S., Gruissem, W., and Grossniklaus, U. (2003). The Polycomb-group protein MEDEA regulates seed development by controlling expression of the MADS-box gene PHERES1. Genes Dev. 17, 1540–1553.
Kolkman, J.A., and Stemmer, W.P. (2001). Directed evolution of proteins by exon shuffling. Nat. Biotechnol. 19, 423–428.
Kuzmichev, A., Nishioka, K., Erdjument-Bromage, H., Tempst, P., and Reinberg, D. (2002). Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev. 16, 2893–2905.
Li, J., Wang, J., Mor, G., and Sklar, J. (2008). A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science. 321, 1357–1361.
Luo, M., Bilodeau, P., Koltunow, A., Dennis, E.S., Peacock, W.J., and Chaudhury, A.M. (1999). Genes controlling fertilization-independent seed development in Arabidopsis thaliana. Proc. Natl Acad. Sci. U S A. 96, 296–301.
Luo, M., Bilodeau, P., Dennis, E.S., Peacock, W.J., and Chaudhury, A. (2000). Expression and parent-of-origin effects for FIS2, MEA, and FIE in the endosperm and embryo of developing Arabidopsis seeds. Proc. Natl Acad. Sci. U S A. 97, 10637–10642.
Moon, Y.H., Chen, L., Pan, R.L., Chang, H.S., Zhu, T., Maffeo, D.M., and Sung, Z.R. (2003). EMF genes maintain vegetative development by repressing the flower program in Arabidopsis. Plant Cell. 15, 681–693.
Muller, J., Hart, C.M., Francis, N.J., Vargas, M.L., Sengupta, A., Wild, B., Miller, E.L., O’Connor, M.B., Kingston, R.E., and Simon, J.A. (2002). Histone methyltransferase activity of a Drosophila Polycomb group repressor complex. Cell. 111, 197–208.
Ohad, N., Yadegari, R., Margossian, L., Hannon, M., Michaeli, D., Harada, J.J., Goldberg, R.B., and Fischer, R.L. (1999). Mutations in FIE, a WD Polycomb group gene, allow endosperm development without fertilization. Plant Cell. 11, 407–416.
Patthy, L. (1996). Exon shuffling and other ways of module exchange. Matrix Biol. 15, 301–310; discussion 311–312.
Ronquist, F., and Huelsenbeck, J.P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19, 1572–1574.
Schonrock, N., Bouveret, R., Leroy, O., Borghi, L., Kohler, C., Gruissem, W., and Hennig, L. (2006). Polycomb-group proteins repress the floral activator AGL19 in the FLC-independent vernalization pathway. Genes Dev. 20, 1667–1678.
Schuettengruber, B., Chourrout, D., Vervoort, M., Leblanc, B., and Cavalli, G. (2007). Genome regulation by Polycomb and Trithorax proteins. Cell. 128, 735–745.
Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22, 2688–2690.
Sung, S., and Amasino, R.M. (2004). Vernalization and epigenetics:
how plants remember winter. Curr. Opin. Plant Biol. 7, 4–10.
Wood, C.C., Robertson, M., Tanner, G., Peacock, W.J., Dennis, E.S., and Helliwell, C.A. (2006). The Arabidopsis thaliana vernalization response requires a Polycomb-like protein complex that also includes VERNALIZATION INSENSITIVE 3. Proc. Natl Acad. Sci. U S A. 103, 14631–14636.
Yan, L., Loukoianov, A., Blech, A., Tranquilli, G., Ramakrishna, W., SanMiguel, P., Bennetzen, J., Echenique, V., and Dubcovsky, J. (2004). The wheat VRN2 gene is a flowering repressor down-regulated by vernalization. Science. 303, 1640–1644.
Yang, C.H., Chen, L.J., and Sung, Z.R. (1995). Genetic regulation of shoot development in Arabidopsis: role of the EMF genes. Dev. Biol. 169, 421–435.
Yoshida, N., Yanai, Y., Chen, L., Kato, Y., Hiratsuka, J., Miwa, T., Sung, Z.R., and Takahashi, S. (2001). EMBRYONIC FLOWER2, a novel Polycomb group protein homolog, mediates shoot development and flowering in Arabidopsis. Plant Cell. 13, 2471–2481.