The Plant Journal (1998) 14(3), 273–284

More than 80 R2R3-MYB regulatory genes in the genome of Arabidopsis thaliana

I. Romero1, A. Fuertes1, M. J. Benito1, J. M. Malpica2, A. Leyva1 and J. Paz-Ares1,*

1 Centro Nacional de Biotecnología-CSIC, Campus de Cantoblanco, 28049-Madrid, Spain, and

2 Instituto Nacional de Investigaciones Agrarias, ctra. de La Coruña, Km. 7,5, 28040-Madrid, Spain

Summary

Transcription factors belonging to the R2R3-MYB family

contain the related helix-turn-helix repeats R2 and R3. The

authors isolated partial cDNA and/or genomic clones of

78 R2R3-MYB genes from Arabidopsis thaliana and found

accessions corresponding to 31 Arabidopsis genes of

this class in databanks, seven of which were not

represented in the authors’ collection. Therefore, there

are at least 85, and probably more than 100, R2R3-MYB

genes present in the Arabidopsis thaliana genome,

representing the largest regulatory gene family currently

known in plants. In contrast, no more than three R2R3-

MYB genes have been reported in any organism from

other phyla. DNA-binding studies showed that there

are differences but also frequent overlaps in binding

specificity among plant R2R3-MYB proteins, in line with

the distinct but often related functions that are beginning

to be recognized for these proteins. This large

gene family may contribute to the regulatory flexibility

underlying the developmental and metabolic plasticity

displayed by plants.

Introduction

Transcription factors play a central role in the regulation

of developmental and metabolic programs. Despite the

large differences in these programs among

organisms from different eukaryotic phyla, their transcription

factors are quite conserved and most of them can

be grouped into a few families according to the structural

features of the DNA-binding domain they contain. One

of these families is that of the R2R3-MYB proteins, whose

complexity in plants is addressed in this study.

Received 18 August 1997; revised 26 January 1998; accepted 28 January 1998.

*For correspondence (fax 133 41585 4506; e-mail [email protected]).

© 1998 Blackwell Science Ltd

The prototype of this family is the product of the

animal c-MYB proto-oncogene, whose DNA-binding

domain consists of three related helix-turn-helix motifs of

about 50 amino acid residues, the so-called R1, R2 and

R3 repeats. The repeat most proximal to the N-terminus

(R1) does not affect DNA-binding specificity and is

missing in oncogenic variants of c-MYB, such as v-MYB,

and in the known plant R2R3-MYB proteins (Graf, 1992;

Lipsick, 1996; Lüscher and Eisenman, 1990; Martin and

Paz-Ares, 1997; Thompson and Ramsay, 1995). R2R3-

MYB proteins belong to the MYB superfamily, which

also includes proteins with two or three more distantly

related repeats (e.g. of the R1/2 type, the progenitor of

the R1 and R2 repeats), and proteins with one repeat,

either of the R1/2 type (Feldbrügge et al., 1997) or of the

R3 type (Bilaud et al., 1996; Kirik and Bäumlein, 1996).

Genes of the MYB superfamily have been found in all

eukaryotic organisms in which their presence has been

investigated. However, the R2R3-type is not present in

Saccharomyces cerevisiae and only 1–3 copies of R2R3-

MYB genes per haploid genome have been described in

organisms from protists and animals (Graf, 1992; Lipsick,

1996; Lüscher and Eisenman, 1990; Thompson and Ramsay,

1995). In contrast, preliminary evidence suggests that

plants contain a much larger number of these genes (Avila

et al., 1993; Jackson et al., 1991; Marocco et al., 1989;

Oppenheimer et al., 1991).

Little is known about the function of most plant R2R3-

MYB genes although, in those few cases in which functions

are known, these are different from those of their animal

counterparts, which are mostly associated with the control

of cell proliferation, prevention of apoptosis, and commitment

to development (Graf, 1992; Lipsick, 1996; Lüscher

and Eisenman, 1990; Martin and Paz-Ares, 1997; Taylor

et al., 1996; Thompson and Ramsay, 1995; Toscani et al.,

1997). Thus, most members of the plant R2R3-MYB family

with known functions have been implicated in the regulation

of the synthesis of different phenylpropanoids (Cone

et al., 1993; Franken et al., 1994; Grotewold et al., 1994;

Moyano et al., 1996; Paz-Ares et al., 1987; Quattrocchio

et al., 1993; Quattrocchio, 1994; Sablowski et al., 1994;

Solano et al., 1995a). Phenylpropanoids are a large class

of chemically different metabolites originating from

phenylalanine, which includes flavonoids, coumarins and

cinnamyl alcohols among others (Hahlbrock and Scheel,

1989). Despite their chemical diversity, these compounds

are biosynthetically related, as their syntheses share

common enzymatic steps. Other functions associated with

members of the plant R2R3-MYB gene family include

the control of cell differentiation (Noda et al., 1994;

Oppenheimer et al., 1991) and the mediation of responses

to signalling molecules such as salicylic acid and the

phytohormones abscisic acid (ABA) and gibberellic acid

(GA) (Gubler et al., 1995; Urao et al., 1993; Yang and

Klessig, 1996).

Sequence-specific DNA binding has been demonstrated

for several R2R3-MYB proteins, in agreement with their

role in transcriptional control (Biedenkapp et al., 1988;

Grotewold et al., 1994; Gubler et al., 1995; Howe and

Watson, 1991; Li and Parish, 1995; Moyano et al., 1996;

Sablowski et al., 1994; Sainz et al., 1997; Solano et al.,

1995a; Solano et al., 1997; Stober-Grasser et al., 1992; Urao

et al., 1993; Watson et al., 1993; Yang and Klessig, 1996).

The information available indicates that these proteins

bind to one or more of the following types of site: I,

CNGTTR; II, GKTWGTTR; and IIG, GKTWGGTR (where N

indicates A, G, C or T; K, G or T; R, A or G; W, A or T).

For instance, animal R2R3-MYB proteins recognize type I

sequences (Biedenkapp et al., 1988; Howe and Watson,

1991; Stober-Grasser et al., 1992; Watson et al., 1993), the

ZmMYBP (also known as P) proteins bind to type IIG

sequences, the ZmMYBC1 (also known as C1) and

AmMYB305 proteins bind to both type II and type IIG, and

the PhMYB3 protein can bind to types I and II (Grotewold

et al., 1994; Sablowski et al., 1994; Sainz et al., 1997; Solano

et al., 1995a; Solano et al., 1997). Recent studies with

protein PhMYB3 from Petunia, including molecular

modelling based on the solved structure of the mouse c-

MYB protein (MmMYB), have highlighted the importance of

residues Lys67, Leu71, Lys121 and Asn122 in determining

recognition specificity (Ogata et al., 1994; Solano et al.,

1997). These residues are fully conserved in all known

plant R2R3-MYB proteins. In contrast, protein AtMYBCDC5,

which has two R1/2-type repeats and does not

conserve these residues, has a completely different specificity

(CTCAGCG; Hirayama and Shinozaki, 1996).
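The three degenerate site definitions given above can be made concrete with a short script. The following is a minimal sketch, not part of the original study: it expands the degenerate codes defined in the text (N, K, R, W) into regular expressions and reports which site types occur on either strand of a sequence. The function names are illustrative assumptions.

```python
import re

# Degenerate codes exactly as defined in the text, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # A, G, C or T
    "K": "[GT]",    # G or T
    "R": "[AG]",    # A or G
    "W": "[AT]",    # A or T
}

# Site types quoted in the paragraph above.
SITES = {"I": "CNGTTR", "II": "GKTWGTTR", "IIG": "GKTWGGTR"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_regex(pattern):
    """Compile a degenerate site definition into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern))


def site_types(seq):
    """Return the sorted site types found on either strand of seq."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return sorted(
        name
        for name, pattern in SITES.items()
        if any(site_regex(pattern).search(strand) for strand in strands)
    )


# The AtMYBGl1-bound sequence quoted in the Discussion contains only a type II site.
print(site_types("AAAGTTAGTTA"))
```

Both strands are scanned because a binding site may be written on either strand; whether a given assay reports sites in a fixed orientation is not addressed by this sketch.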

To evaluate the number of R2R3-MYB genes in plants,

and as a first step towards determining the full range of

functions associated with these genes using a reverse

genetic approach, we have carried out a PCR-based

systematic search for R2R3-MYB genes in the model

species Arabidopsis thaliana. We estimate that it contains

at least 85, and probably more than 100 R2R3-MYB genes,

representing the largest family of regulatory genes

described thus far in any plant species. In addition, we have

investigated the DNA-binding specificity of representative

R2R3-MYB proteins and have shown that there may be

differences but also considerable similarities in binding

specificity between R2R3-MYB proteins, particularly

among members of the same phylogenetic group, which is

in agreement with the recognizable functional relationships

between the members of the R2R3-MYB family.

Results

Isolation of R2R3-MYB clones

Figure 1. Consensus amino acid sequence of the two repeats comprising the DNA-binding domain of plant R2R3-MYB proteins as described by Avila et al. (1993), and oligonucleotide mixtures used in the isolation of the R2R3-MYB genes (N1–6 and C1–3).

Upper case indicates residues fully conserved in all proteins used to derive the consensus. Lower case indicates residues identical in at least 80% of the proteins. Other symbols are: +, basic amino acid; –, acidic amino acid; #, hydrophobic amino acid. New sequences (published since this alignment, see Figure 2) have not altered this consensus sequence in the regions from which the oligonucleotide sequences were derived, with the exception of PhMYBAn2, which has a D/A substitution in the region corresponding to oligonucleotide mixtures C1–C3, although they have increased the variability of residues in variable positions. This variability was taken into account in the design of the oligonucleotide mixtures (R = A + G, Y = C + T, S = G + C, D = A + G + T, N = A + G + C + T) and so the oligonucleotide mixtures should have recognized all the more recent additions to the R2R3-MYB gene family.

All known plant R2R3-MYB proteins contain highly conserved

stretches of amino acid residues within the

recognition helices of the R2 and R3 repeats from which

R2R3-MYB-specific mixtures of oligonucleotides can be

derived (Avila et al., 1993; Figure 1). These oligonucleotide

mixtures do not recognize the AtMYBCDC5 gene encoding

a MYB protein with two highly divergent repeats of the

R1/2-type (Hirayama and Shinozaki, 1996; Lipsick, 1996).

To search for R2R3-MYB genes, we first prepared cDNA

and genomic DNA libraries (of 1000 and 3000 clones,

respectively) enriched in these genes using PCR with

R2R3-MYB-specific oligonucleotides. Sequencing of all the

different clones present in each of these libraries (for

details, see Experimental procedures), revealed that 36 and

74 different R2R3-MYB genes were represented in the

cDNA and genomic DNA libraries, respectively, and that

32 were represented in both libraries. A total of 78 different

R2R3-MYB genes were therefore represented in our collection.

A computer search revealed that there were 31 R2R3-

MYB genes from Arabidopsis described in databanks, of

which seven were not represented in the set of 78 isolated

in this study. There are, therefore, at least 85 (78 + 7), and

probably more than 100 (78 × 31/24, see Experimental

procedures) R2R3-MYB genes in the Arabidopsis thaliana

genome.
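The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below reproduces the figures given in the text. The expression 78 × 31/24 has the form of a two-sample capture-recapture (Lincoln-Petersen) estimate, with the databank entries acting as the second sample; reading it that way is an assumption here, since the Experimental procedures are not reproduced in this section. Function names are illustrative.

```python
def union_size(n_a, n_b, n_both):
    """Number of distinct genes found in either of two libraries."""
    return n_a + n_b - n_both


def lower_bound(isolated, databank_only):
    """Genes isolated in the study plus databank genes absent from that set."""
    return isolated + databank_only


def recapture_estimate(isolated, databank_total, overlap):
    """Family size estimate of the form isolated * databank_total / overlap."""
    return isolated * databank_total / overlap


# Figures quoted in the text.
cdna, genomic, both = 36, 74, 32
databank_total, databank_only = 31, 7

isolated = union_size(cdna, genomic, both)      # 36 + 74 - 32 = 78
overlap = databank_total - databank_only        # 31 - 7 = 24

print(isolated)                                 # 78
print(lower_bound(isolated, databank_only))     # 85
print(recapture_estimate(isolated, databank_total, overlap))  # 100.75
```

The estimate of about 101 genes is consistent with the statement that the genome probably contains more than 100 R2R3-MYB genes.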

More than half of the R2R3-MYB genes identified in this

study were characterized only at the genomic DNA level,

raising the possibility that many of these R2R3-MYB

genomic sequences might represent pseudogenes rather

than active genes. However, in no case was the reading

frame of the exonic sequences (represented in the genomic

clones) prematurely terminated. In addition, the number

of fully conserved residues in plant R2R3-MYB proteins is

the same independently of whether those protein

sequences from the R2R3-MYB genes characterized only

at the genomic DNA level are considered in the estimation.

On the other hand, pseudogenes usually show higher

rates of non-synonymous substitutions (Kns) relative to

synonymous substitutions (Ks) than active genes (Satta,

1993). We calculated the Kns/Ks ratio for all possible pairs

of R2R3-MYB genes in this population and these ratios

were compared to those in the population of R2R3-MYB

genes known to be expressed (i.e. those for which a cDNA

clone was available), using the method of Nei and Gojobori

(1986). The Kns/Ks values in the two populations (Kns/Ks

in genomic DNA population: 0.393 ± 0.016; Kns/Ks in cDNA

population: 0.392 ± 0.115) were not significantly different

in a t-test (P = 0.83 > 0.10). Collectively, these data are in

agreement with the conclusion that most, if not all, plant

R2R3-MYB sequences represent active genes.

Phylogenetic analysis of R2R3-MYB proteins

A phylogram of R2R3-MYB proteins was constructed with

the neighbor-joining method (Saitou and Nei, 1987) using

the sequences of the proteins in Figure 2 (except HvMYB33,

LeMYB1, AtMYB67, AtMYB41 and AtMYB45; Figure 3).

Three major groups were distinguished in the phylogram,

A, B and C (Figure 3). The bootstrap support for the node

corresponding to group C was not very high (30%), perhaps

due to the short size of the sequences used. However,

when the analysis was made using the whole R2R3-MYB

domain from the proteins for which this sequence was

available, the bootstrap support of this node was more

than 75% (see Figure 3). In addition, the existence of the

three groups was also supported by the tree constructed

using parsimony (Eck and Dayhoff, 1966) (not shown) and

by the different intron/exon structure of the genes encoding

the proteins of each group, with the exception of AtMYB67

(see Figure 3). Group A (accounting for about 10% of the

A. thaliana proteins), which also includes the animal and

protist R2R3-MYB proteins, represents genes with no intron

in the region sequenced, with the exception of AtMYB1

which has an intron at position 1. Group B (5% of the A.

thaliana proteins) represents proteins encoded by genes

with an intron at position 3. Finally, group C (85% of A.

thaliana proteins) contains genes with an intron at position

2. As shown below (see Discussion), this classification is

also in agreement with the data on DNA-binding specificity

of R2R3-MYB proteins, as similarities in this property were

usually higher between proteins belonging to the same

group than between proteins belonging to different groups.

Each group, particularly group C, can be further subdivided

into subgroups of more closely related members. Many of

these subgroups contain R2R3-MYB proteins from other

plant species (although the search for this type of MYB

genes in these species has not been exhaustive), consistent

with the high functional similarity of regulatory systems

among plants (Benfey and Chua, 1989).
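For readers unfamiliar with the neighbor-joining method cited above (Saitou and Nei, 1987), the following minimal sketch shows its first joining step: compute the Q criterion for every pair of taxa, pick the pair with the smallest value, and assign branch lengths to the new node. The distance matrix is a hypothetical five-taxon example chosen for illustration, not data from this study, and the function name is an assumption.

```python
def nj_first_join(d):
    """Return the first pair joined by neighbor joining and its branch lengths.

    d is a symmetric distance matrix (list of lists) for at least three taxa.
    """
    n = len(d)
    totals = [sum(row) for row in d]  # summed distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: pairwise distance corrected for overall divergence.
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from each joined taxon to the new internal node.
    length_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return (i, j), (length_i, length_j)


# Hypothetical distances between five taxa (illustration only).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, lengths = nj_first_join(D)
print(pair, lengths)  # taxa 0 and 1 are joined, with branch lengths 2.0 and 3.0
```

A full tree is obtained by replacing the joined pair with the new node, recomputing distances, and repeating until three nodes remain; bootstrap support, as reported in the text, comes from repeating the whole procedure on resampled alignments.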

DNA-binding specificity of representative R2R3-MYB

proteins

To evaluate the degree of similarity in DNA binding

specificity between different Arabidopsis R2R3-MYB

proteins, we isolated cDNA clones containing the entire

coding region of four representative R2R3-MYB proteins,

AtMYB15, AtMYB77, AtMYB84 and AtMYBGl1 (see

Methods). Full length and deletion derivatives of these

proteins were produced by in vitro transcription and

translation. To determine their DNA-binding specificity,

an EMSA (electrophoretic mobility shift assay)-based

random-site selection procedure was used (Blackwell and

Weintraub, 1990; Solano et al., 1995a). Selection experiments

were performed with two oligonucleotide mixtures,

OI and OII, which had a partially random core sequence

representing the three types of sites defined for R2R3-MYB

proteins: OI, type I; OII, types II and IIG (Biedenkapp et al.,

1988; Grotewold et al., 1994; Gubler et al., 1995; Howe and

Watson, 1991; Li and Parish, 1995; Moyano et al., 1996;

Sablowski et al., 1994; Sainz et al., 1997; Solano et al.,

1995a; Solano et al., 1997; Stober-Grasser et al., 1992; Urao

et al., 1993; Watson et al., 1993; Yang and Klessig, 1996;

Figure 4; see Introduction). In fact, the nucleotides (or their

counterparts in the complementary strand) present in the

non-randomized positions (–2, +1 and +3) are contacted

by residues fully conserved in all plant R2R3-MYB proteins

(Leu71, Lys121 and Asn122, respectively, in PhMYB3; the

G in the complementary strand of position –2 in type I

targets is contacted by another fully conserved residue,

Lys67) (Solano et al., 1997).

AtMYB15 and AtMYB84 bound the partially randomized

oligonucleotide mixture OII and, to a lesser extent, the OI

oligonucleotide mixture, and the reciprocal was true with

a carboxy-terminal deletion derivative of AtMYB77

(AtMYB77∆C1) which bound better to OI (data not shown).

AtMYB77∆C1 was used because the full size protein had

lower binding affinity, as is the case with other R2R3-MYB

proteins (PhMYB3 and MmMYB) (Ramsay et al., 1992;

Solano et al., 1995a). In contrast, neither AtMYBGl1 nor its

carboxy-terminal deletion derivatives showed detectable

binding to either of these oligonucleotide mixtures (not

shown). A similar result was obtained with an increased

amount of probe and/or a decreased amount of non-

specific competitor DNA, independently of the type of

probe used, the partially randomized oligonucleotide

mixtures OI and OII, or a fully randomized mixture (O, data

not shown). Protein phosphatase treatments, which have

been shown to increase binding affinity of one R2R3-MYB

protein (Moyano et al., 1996), were also ineffective (not

shown). Collectively, these data suggest limited in vitro

DNA-binding affinity for this protein. It is possible that low

DNA-binding affinity is an intrinsic property of AtMYBGl1

and that it might be increased in vivo after interaction(s)

with other protein(s). For example, there is evidence that

maize C1 protein (ZmMYBC1), which also shows low binding

affinity in vitro (Sainz et al., 1997), requires an interaction

with a second protein (the MYC protein R; Goff

et al., 1992) to activate flavonoid biosynthetic genes. A

similar interaction is possibly necessary for the activity of

AtMYBGl1 in vivo (Lloyd et al., 1992).

After four cycles of enrichment, oligonucleotides selected

by the R2R3-MYB proteins were cloned and sequenced. In

all instances, despite using two target oligonucleotide

mixtures, only one type of sequence was recovered for

each protein, indicating strong preference for one of the

types of sequences (Figure 4). For instance, in the case of

protein AtMYB77∆C1, which preferred type I sequences,

the sequences selected from oligonucleotide OII were also

of type I (generated in variable positions of OII, not shown)

and the reciprocal was true for proteins AtMYB15 and

AtMYB84 (not shown). These results argue against a bias

in the binding site selection experiments due to the use of

partially degenerate oligonucleotide mixtures, although

this possibility cannot be fully excluded.

Next, we used oligonucleotides representing the defined

optimal target sites and mutants of these sites in binding

experiments with each of the above Arabidopsis proteins

and with carboxy-terminal deletion derivatives of PhMYB3

(PhMYB3∆C1), AmMYB305 (AmMYB305∆C1) and MmMYB

(MmMYB∆C2R1; Solano et al., 1997) as controls (Figure 5a).

The results of these experiments agreed with those from

site selection experiments, but revealed that AtMYB77∆C1

also recognized certain type II sequences, although with

reduced affinity compared to that for type I sequences. In

addition, they also showed specific DNA binding affinity

for AtMYBGl1, as it could weakly bind to oligonucleotide

II-1. In an apparent discrepancy with binding site selection

experiments, protein AtMYB77∆C1 bound better to the

oligonucleotide containing one of the optimal binding sites

of PhMYB3 (MBSI, oligonucleotide I-1; Solano et al., 1995a)

than to that containing its deduced optimal binding

sequence (oligonucleotide I-2). Discrepancies between a

sequence derived from binding site selection and the optimal

binding site have also been reported for MADS box proteins

(Riechmann et al., 1996). A difference between the two

oligonucleotides (I-1 and I-2) is that I-1 is flanked by three

extra As, which would increase its ability to bend, a

property known to greatly influence binding by DNA-

distorting/bending proteins, such as R2R3-MYB proteins

and MADS proteins (Parvin et al., 1995; Riechmann et al.,

1996; Solano et al., 1995b; Thanos and Maniatis, 1992). To

test whether this difference could be the cause of the

preference of AtMYB77∆C1 for oligonucleotide I-1 versus

I-2, DNA binding experiments were conducted with new

oligonucleotides in which the three extra As of oligonucleotide

I-1 had been removed. The binding by AtMYB77∆C1

to this deletion version of I-1 (I-1∆) was similar to that

obtained for the oligonucleotide derived from binding site

selection experiments (Figure 5b). This result underscored

the importance of DNA conformational properties in binding

by transcription factors.

Figure 2. Deduced amino acid sequences of Arabidopsis R2R3-MYB proteins.

For comparison, the sequences of R2R3-MYB proteins from other plant species and from representative organisms of other phyla are also given. The region shown is that flanked by the sequences used to derive the oligonucleotide mixtures shown in Figure 1. The clones corresponding to AtMYB41 and to AtMYB45 did not encode the carboxy-terminal part of their sequence due to mispriming events. For protein (and gene) names, a standardized nomenclature has been used (Martin and Paz-Ares, 1997) whereby the name of each protein includes a two-letter prefix as species identifier, the term MYB, and then a term describing the particular family member. The codes for the species identifier are: Am, Antirrhinum majus; At, Arabidopsis thaliana; Cp, Craterostigma plantagineum; Dd, Dictyostelium discoideum; Dm, Drosophila melanogaster; Gh, Gossypium hirsutum; Hv, Hordeum vulgare; Le, Lycopersicon esculentum; Mm, Mus musculus; Nt, Nicotiana tabacum; Os, Oryza sativa; Ph, Petunia hybrida; Pm, Picea mariana; Pp, Physcomitrella patens; Ps, Pisum sativum; Xl, Xenopus laevis; Zm, Zea mays. As family member identifier we have always used a number except where the previously given name was based on functional information, such as the phenotype of mutants (e.g. the Gl1 (Glabrous1) protein from Arabidopsis is named AtMYBGl1). Thus, all the genes identified in this study have been given a standardized number independent of whether a different non-standardized name has been given by other authors. This has occurred in the following cases: AtMYB13, also named AtMYBlfgn (accession number Z50869); AtMYB15, also named Y19 (X90384); AtMYB16, also named AtMIXTA (X99809); AtMYB23, also named AtMYBrtf (Z68158); AtMYB31, also named Y13 (X90387); AtMYB44, also named AtMYBR1 (Z54136); AtMYB77, also named AtMYBR2 (Z54137). In addition, the following R2R3-MYB genes, which were not identified in this study, were renamed (with the agreement of the authors who first described them): AtMYB101 (M1); AtMYB102 (M4). AtMYB90 is described in the EMBL databank as an anonymous EST (H76020). The column on the right of the amino acid sequence gives the accession number from which the sequences were derived. The accession numbers of the cDNAs encoding the full-size proteins AtMYB15, AtMYB77 and AtMYB84 are Y14207, Y14208 and Y14209, respectively. In the case of PhMYBAn2, the sequence was copied directly from Quattrocchio (1994). The second column shows the position of the intron interrupting that part of the coding sequence represented in the figure: –, unknown; 0, no intron; the localization of introns 1, 2 and 3 is shown relative to the consensus sequence. The third column shows the type of clone isolated in this study: a, cDNA clone; b, genomic clone. Other letters in this column indicate that the sequence shown in the figure was previously described in databanks or published (c) or that only part of the sequence shown was previously described (d). Two additional sequences (accession numbers H36793 and T42245), each corresponding to a novel Arabidopsis R2R3-MYB gene, were found in the EST databank, but are not represented in the figure because they were incomplete. These sequences were, however, used for the estimation of the size of the R2R3-MYB gene family. Asterisks indicate proteins for which the sequence of the whole R2R3-MYB domain is known. Symbols in the consensus sequence are as in Figure 1.
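The standardized nomenclature described in the Figure 2 legend (two-letter species prefix, the term MYB, then a family member identifier) lends itself to mechanical parsing. The sketch below is an illustration only: the species codes are those listed in the legend, while the function name and the choice to accept an empty member identifier (as in MmMYB) are assumptions.

```python
import re

# Species codes as listed in the Figure 2 legend.
SPECIES = {
    "Am": "Antirrhinum majus", "At": "Arabidopsis thaliana",
    "Cp": "Craterostigma plantagineum", "Dd": "Dictyostelium discoideum",
    "Dm": "Drosophila melanogaster", "Gh": "Gossypium hirsutum",
    "Hv": "Hordeum vulgare", "Le": "Lycopersicon esculentum",
    "Mm": "Mus musculus", "Nt": "Nicotiana tabacum", "Os": "Oryza sativa",
    "Ph": "Petunia hybrida", "Pm": "Picea mariana",
    "Pp": "Physcomitrella patens", "Ps": "Pisum sativum",
    "Xl": "Xenopus laevis", "Zm": "Zea mays",
}

# Two-letter species prefix, the literal term MYB, then the member identifier.
NAME_RE = re.compile(r"^([A-Z][a-z])MYB(\w*)$")


def parse_myb_name(name):
    """Split a standardized MYB protein name into (species, member identifier)."""
    match = NAME_RE.match(name)
    if match is None or match.group(1) not in SPECIES:
        raise ValueError(f"not a standardized MYB name: {name!r}")
    return SPECIES[match.group(1)], match.group(2)


print(parse_myb_name("AtMYBGl1"))  # ('Arabidopsis thaliana', 'Gl1')
print(parse_myb_name("PhMYB3"))    # ('Petunia hybrida', '3')
```

Names of deletion derivatives such as AtMYB77∆C1 carry an extra suffix that this simple pattern does not interpret separately.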

Discussion

Genes of the R2R3-MYB family are quite widespread in

eukaryotes, with the exception of yeast, and in plants the

number of these genes is especially high. Whereas no

more than three R2R3-MYB genes have been described in

any organism from other eukaryotic phyla, here we isolated

partial cDNA and/or genomic clones corresponding

to 78 different R2R3-MYB genes from Arabidopsis and

estimated that there are probably more than 100 R2R3-

MYB genes in this species. The different size of regulatory

gene families in different groups of eukaryotes, a situation

which is not exclusive for R2R3-MYB genes (for instance,

see the case of MADS box proteins; Theissen et al., 1996),

might reflect major differences in developmental and metabolic

programs generated during evolution of these groups,

which largely involved a different use of pre-existing regulatory

systems rather than the generation of new systems

(Martin and Paz-Ares, 1997).

According to recent estimates on the number of genes

in Arabidopsis (16 000–43 000; Gibson and Somerville,

1993), members of the R2R3-MYB family would

represent at least 0.2–0.6% of the total Arabidopsis genes,

the largest proportion of genes thus far assigned to a

single regulatory gene family (and even to a gene family

encoding any type of protein) in plants. In other types of

eukaryotes there are families of equal, or even larger, size;

for instance, it is estimated that genes encoding zinc-finger

proteins represent about 1% of the human genes (Hoovers

et al., 1992) and, in Caenorhabditis elegans, about 0.4% of

its genes contain homeoboxes (Burglin, 1995). However,

in these families overall sequence conservation is very low

and variability in DNA-binding specificity is high (Klug

and Schwabe, 1995; Treisman et al., 1992). In contrast,

members of the plant R2R3-MYB family share higher amino

acid sequence similarity, particularly in their recognition

helices (Figure 1), and display considerable DNA-recognition

similarities (Figures 3 and 5).
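The share of the genome attributed to the R2R3-MYB family earlier in this paragraph follows from simple division. The sketch below tabulates the percentage for the family sizes and gene-number estimates quoted in the text; which combination the authors used for each end of the 0.2–0.6% range is not stated, so pairing them this way is an assumption.

```python
def percent_of_genome(family_size, total_genes):
    """Percentage of all genes that belong to a family of the given size."""
    return 100.0 * family_size / total_genes


# Family sizes (lower bound and estimate) and genome estimates quoted in the text.
for family_size in (85, 100):
    for total_genes in (16000, 43000):
        share = percent_of_genome(family_size, total_genes)
        print(f"{family_size} genes of {total_genes}: {share:.2f}%")
```

The smallest value (85 of 43 000 genes, about 0.20%) and the largest (100 of 16 000 genes, about 0.63%) bracket the range given in the text.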

These similarities in recognition specificity are par-

ticularly noticeable between members of the same

phylogenetic group, although in some cases overlaps in

binding specificity between members belonging to differ-

ent groups have been observed (Figures 3 and 5). Thus,

in the cases studied here or elsewhere (Biedenkapp et al.,

1988; Grotewold et al., 1994; Gubler et al., 1995; Howe

and Watson, 1991; Li and Parish, 1995; Moyano et al.,

1996; Sablowski et al., 1994; Sainz et al., 1997; Solano

et al., 1995a, 1997; Stober-Grasser et al., 1992; Urao et al.,

1993; Watson et al., 1993; Yang and Klessig, 1996)

members from group A (including both those from plants

and from organisms from other phyla) prefer (or bind

to) a type I sequence, members of group B bind equally

well to both type I and type II, and most members of

group C prefer (or bind to) a type IIG sequence. Possible exceptions

are the group C proteins AtMYB2, reported to bind

type I sequences (Urao et al., 1993), and GLABROUS1

(AtMYBGl1), which only bound to a type II sequence

(Figure 5) although, in the first case, binding to IIG

sequences was not studied and, in the second case,

binding site selection experiments failed to provide

information on its optimal binding site (see Results).

However, it is striking that the only sequence bound by

AtMYBGl1 (AAAGTTAGTTA) perfectly conforms to the

sequence of gibberellic acid responsive elements, and

gibberellic acid is known to affect the AtMYBGl1-

controlled trait, trichome formation (Oppenheimer et al.,

1991; Telfer et al., 1997).
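The three consensus types used in this comparison (type I, CNGTTR; type II, GTTWGTTR; type IIG, GKTWGGTR; see the Figure 3 legend) can be checked against a candidate sequence mechanically. The following Python sketch is illustrative only: the function names are hypothetical, only the forward strand is scanned, and a match to a consensus says nothing about actual binding affinity.

```python
import re

# IUPAC codes used in the Figure 3 legend: N = A/C/G/T, K = G/T, R = A/G, W = A/T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "K": "[GT]", "R": "[AG]", "W": "[AT]",
}

# Consensus sequences of the three binding-site types.
SITE_TYPES = {"I": "CNGTTR", "II": "GTTWGTTR", "IIG": "GKTWGGTR"}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Translate an IUPAC consensus into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def site_types(sequence: str) -> list:
    """Return the names of all site types whose consensus occurs in the sequence."""
    seq = sequence.upper()
    return [name for name, consensus in SITE_TYPES.items()
            if consensus_to_regex(consensus).search(seq)]


print(site_types("AAAGTTAGTTA"))  # the sequence bound by AtMYBGl1
```

Under these assumptions, the sequence bound by AtMYBGl1 matches only the type II consensus, in line with the EMSA result described in the text.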

In line with these similarities in binding specificity, and

despite the fact that target selectivity is usually also

influenced by interactions with other factors, most of the

R2R3-MYB proteins studied so far, which are scattered

throughout groups B and C, have been implicated in the

control of phenylpropanoid biosynthetic genes (Cone

et al., 1993; Franken et al., 1994; Grotewold et al., 1994;

Moyano et al., 1996; Paz-Ares et al., 1987; Quattrocchio

et al., 1993, 1994; Sablowski et al., 1994; Solano et al.,

1995a; Figure 3). Nevertheless, there are some R2R3-MYB

proteins that have been implicated in other functions,

including the control of cell differentiation and the

mediation of plant responses to several signal molecules

(Gubler et al., 1995; Noda et al., 1994; Oppenheimer et al.,

1991; Urao et al., 1993; Yang and Klessig, 1996). Target

Figure 3. Phylogenetic tree of the R2R3-MYB family using the neighbor-joining method (Saitou and Nei, 1987).

The phylogram shown was constructed with the sequences given in Figure 2, except HvMYB33, LeMYB1, AtMYB67, AtMYB41 and AtMYB45. The first

two were excluded because they were the only ones out of the 57 known complete-MYB-domain sequences which grouped differently (with bootstrap

support > 50%) depending on whether the complete MYB domains or the portion characterized in this study was used in the calculations. Protein

AtMYB67 was the only one which was not grouped with the other proteins encoded by genes with the same intron/exon structure. Proteins AtMYB41

and AtMYB45 were not used because only partial sequence data were available, although their probable position in the phylogram, inferred from a

tree constructed also using their incomplete sequences (not shown), is indicated in the tree with dashed lines. Exclusion of these five proteins increased

the bootstrap support of the major nodes (not shown). Names of R2R3-MYB proteins from non-plant species are shown in red. The three major nodes,

A, B and C, are denoted. Numbers (0, 1, 2 or 3) in some branches indicate the type of intron in the cloned portion of the genes encoding proteins

originating from the respective branch, as far as the genes for which this information is available are concerned (Figure 2). Nodes with high bootstrap

support are indicated (empty symbols, bootstraps > 50%; filled symbols, bootstraps > 75%). Circles refer to bootstrap data corresponding to the

represented tree. Squares refer to bootstrap data corresponding to the tree constructed with the sequence of the whole MYB domain of the proteins

for which this information was available (Figure 2). The known functions associated with some plant R2R3-MYB proteins are indicated: Ph, regulation

of phenylpropanoid biosynthetic genes (proteins ZmMYBC1, ZmMYBPl, ZmMYBP, ZmMYB38, ZmMYB1, AmMYB305, AmMYB340, PhMYBAn2, PhMYB3;

Cone et al., 1993; Franken et al., 1994; Grotewold et al., 1994; Moyano et al., 1996; Paz-Ares et al., 1987; Quattrocchio et al., 1993; Quattrocchio, 1994;

Sablowski et al., 1994; Solano et al., 1995a); CD, control of cell differentiation (proteins AtMYBGl1 and AmMYBMx, Noda et al., 1994; Oppenheimer

et al., 1991); SA, GA and ABA, involvement in the signal transduction pathways of, respectively, salicylic acid (gene NtMYB1; Yang and Klessig, 1996), gibberellic

acid (protein HvMYBGa; Gubler et al., 1995) and abscisic acid (proteins AtMYB2 and ZmMYBC1; Hattori et al., 1992; Urao et al., 1993). Capital letters

are used when the functions associated are based on genetic evidence (i.e. analysis of mutants). Also indicated is the available information on the DNA-

binding specificity of some of the R2R3-MYB proteins (arrowheads indicate the proteins examined in this study): I, CNGTTR (proteins MmMYB,

MmMYBA, MmMYBB, DdMYB, AtMYB1, AtMYB2, AtMYB77, PhMYB3, HvMYBGa, NtMYB1; Biedenkapp et al., 1988; Howe and Watson, 1991; Solano

et al., 1995a; Stober-Grasser et al., 1992; Urao et al., 1993; Watson et al., 1993); II, GTTWGTTR (proteins PhMYB3, HvMYBGa, AmMYB305, ZmMYBC1,

AtMYBGl1; Gubler et al., 1995; Sainz et al., 1997; Solano et al., 1995a; Solano et al., 1997); IIG, GKTWGGTR (proteins AmMYB305, AmMYB340, ZmMYBP,

ZmMYBC1, AtMYB6, AtMYB7, AtMYB15, AtMYB84, NtMYB1; Grotewold et al., 1994; Li and Parish, 1995; Moyano et al., 1996; Sablowski et al., 1994;

Sainz et al., 1997; Solano et al., 1995a; Yang and Klessig, 1996) (where N indicates A or G or C or T; K, G or T; R, A or G; W, A or T). Capital letters

are used in those cases in which the sequences are known to be the optimal binding site. When a given protein is able to bind to more than one

type of site, the size of the letter reflects the relative binding affinity for these sites.

genes of these latter R2R3-MYB genes are mostly

unknown, thus precluding definite conclusions about

whether they are functionally related between themselves

or indeed with the R2R3-MYB genes regulating phenyl-

propanoid biosynthetic genes. However, the signal

molecules salicylic acid, ABA and GA influence, among

others, the expression of phenylpropanoid biosynthetic

genes, in several instances through cis-acting elements

resembling R2R3-MYB binding sites (Dixon and Paiva,

1995; Hahlbrock and Scheel, 1989; Hattori et al., 1992;

Sablowski et al., 1994; Shirasu et al., 1997; Weiss et al.,

1990, 1992). In addition, GA also affects trichome forma-

tion, another trait under the control of an R2R3-MYB

gene, AtMYBGl1 (Telfer et al., 1997). Moreover, the MIXTA

gene (AmMYBMx) controls the specialized shape of inner

epidermal petal cells of Antirrhinum flowers, and these

changes in cell shape correlate with changes in the cell

wall, a structure containing phenylpropanoid derivatives

(Noda et al., 1994).

The number of R2R3-MYB genes with distinct but

related functions might therefore be extraordinarily high,

particularly with regard to the regulation of different

phenylpropanoid biosynthetic genes, although some of

these genes could also (or alternatively) act on other

types of targets (e.g. the barley gibberellic acid-induced

α-amy gene is a likely target of HvMYBGa; Gubler et al.,

1995). In any case, the broad (phylogenetic) distribution

of the R2R3-MYB genes for which there is evidence of

their involvement in the regulation of phenylpropanoid

metabolism, suggests that a very early plant-specific

R2R3-MYB ancestor already had this function, and that

Figure 4. DNA-binding specificity of the Arabidopsis proteins AtMYB77,

AtMYB15 and AtMYB84, obtained using binding site selection

experiments.

(a) Sequence of the partially random core of the oligonucleotide mixtures

(O-I and O-II) used in the binding site selection experiments.

(b) Summary of the nucleotide sequences of the oligonucleotides selected

by the different R2R3-MYB proteins. The base constitution around the

consensus is indicated in percentage. Asterisks indicate the positions at

which the nucleotide sequence was fixed in the original oligonucleotide

mixture. The type of binding site (I, II or IIG; see Figure 3) of each

protein is indicated. The part of the sequence determining the type of

binding site is underlined.

this was probably the ancestor of at least the genes

belonging to groups B and C (information on the function

of genes belonging to group A is currently lacking).

The existence of functional relationships between

members of the same family of transcription factors

has been documented for virtually all of these families.

For instance, HOX proteins control related developmental

pathways and in some cases they share some target

genes (Botas, 1993). However, the complete extent of

the relationship is difficult to define in most cases,

due to the limited information on the genes involved in

the pathways they regulate. In contrast, the R2R3-

Figure 5. DNA-binding specificity of different R2R3-MYB proteins as

shown by EMSA.

The proteins used in the experiment were the full-size AtMYB84, AtMYB15

and AtMYBGl1, and the deletion derivatives PhMYB3∆C1 (amino acid

residues 1–180 of PhMYB3), MmMYB∆C2R1 (amino acid residues 89–236

of MmMYB), AmMYB305∆C1 (amino acid residues 1–159 of AmMYB305)

and AtMYB77∆C1 (amino acid residues 1–200 of AtMYB77, see Methods).

The core sequences of the oligonucleotides used in the assay (a) are

shown on top of each lane. New sequences used in (b), corresponding

to the deletion derivatives of I-1 and II-1 lacking three As, are I-1∆ (AAACGGTTA) and II-1∆ (AGTTAGTTA). All reactions contained an

equimolar amount of protein as well as DNA. The autoradiograph

corresponding to the protein AtMYBGl1 was threefold over-exposed.

MYB gene family regulated phenylpropanoid biosynthetic

pathway is biochemically well characterized and thus can

be used as a reporter to evaluate functional relationships

(as well as functional diversity) between several, and

potentially many, members of this gene family (because

the synthesis of each phenylpropanoid involves common

as well as specific enzymatic steps). The clones isolated

in this work should allow the use of reverse genetic

approaches to carry out such studies.

Plants, as sessile organisms, have evolved a great

plasticity in their developmental and metabolic programs

to cope with changing environmental conditions (Steeves

and Sussex, 1990). This requires very flexible regulatory

mechanisms whereby patterns of gene expression can be

continuously adjusted in response to any environmental

change. The presence of large-sized regulatory gene

families such as the R2R3-MYB family, whose members

often share some target genes on which each regulatory

gene may exert a different effect, could have contributed

to these flexible control mechanisms.

Experimental procedures

Plant material

Arabidopsis thaliana, Landsberg ecotype, was used in this study.

Standard molecular procedures

All methods, including screening of cDNA libraries, RNA and

genomic DNA isolation, labelling of DNA and oligonucleo-

tides, etc., were performed as described previously (Avila et al.,

1993; Sambrook et al., 1989), except where indicated. The

vectors for cloning were pUC19 (Yanisch-Perron et al., 1985)

and pBluescriptII (Alting-Mees and Short, 1989).

The search for R2R3-MYB genes was performed in two

sequential stages. In the first stage, we prepared a cDNA library

enriched in R2R3-MYB genes using PCR with the R2R3-MYB-

specific oligonucleotides described in Figure 1, in all possible

pairwise combinations that included one oligonucleotide mixture

corresponding to the R2 repeat and one corresponding to the

R3 repeat. Since the size of the amplified R2R3-MYB cDNA

fragments was predictable (about 180 bp), the PCR-amplified

cDNA was size selected prior to cloning. To discard genes

already sequenced, an iterative procedure was used consisting

of hybridization to the library at high stringency using as a

probe the inserts of 20 previously sequenced clones. In a

second stage, the same PCR procedure was applied to genomic

DNA to reduce biases due to differential expression of different

R2R3-MYB genes. The amplified genomic DNA was cloned

directly, since the presence of introns in the amplified region

precluded size selection-based enrichment. Alternatively, enrich-

ment in R2R3-MYB clones was carried out by low stringency

hybridization using a mixture of the previously isolated R2R3-

MYB cDNAs as a probe. The MYB-enriched genomic DNA

library was screened following the same iterative procedure

adopted for the cDNA library. The cDNA used in the PCR

reactions was derived from poly(A)+ RNA prepared from a

mixture of plants grown in soil or in MS (0.5×) medium

(Murashige and Skoog, 1962), collected at different develop-

mental stages (from seedling to flowering stages), and also

included plants treated with ABA and GA. The hormonal

treatments were performed on plants germinated and grown

without hormone in liquid MS (0.5×) medium for 7 days,

after which the corresponding hormone was added (final

concentrations: ABA, 100 µM; GA, 100 µM) and kept for 8 h

before the plants were collected. The poly(A)+ RNA (20 ng µl–1)

was reverse transcribed with AMV reverse transcriptase

(0.7 U µl–1) in the presence of the ribonuclease inhibitor RNasin

(0.5 U µl–1), using oligo(dT)15 (25 ng µl–1) as a primer; the

reaction mixture was incubated at 42°C for 1.5 h. PCR amplification

of R2R3-MYB genes was performed as follows: the DNA

(cDNA or genomic DNA, 20–200 pg µl–1) was amplified for 30

cycles using polymerase (0.025 U µl–1). Each cycle of amplification

consisted of: 1 min at 94°C, 90 sec at 55°C and 2 min at

72°C, except the first two cycles in which the annealing

temperature was 40–42°C instead of 55°C. cDNA clones encoding

the full-size proteins AtMYB15, AtMYB77 and AtMYB84 were

isolated by screening a whole-plant Arabidopsis cDNA library in

vector λNM1149 (10⁵ pfu, M. Sanchez, unpublished observations),

using a mixture of the available R2R3-MYB cDNA fragments as

a probe. That of AtMYBGl1 was obtained by reverse transcription

and PCR amplification with oligonucleotides Ngl1

(GAATGAGAATAAGGAGAAGAG) and Cgl1 (CTAAAGGCAGTACTCAATATC),

designed on the basis of the previously reported

sequence of this gene (Oppenheimer et al., 1991). The conditions

of PCR were the same as above, except that the annealing

temperature was always 55°C.
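The cycling conditions described above can be written down as a small data structure, which makes the two low-stringency starting cycles explicit. This is a minimal Python sketch under stated assumptions: the class and function names are hypothetical, temperatures are stored as (low, high) ranges in °C, and ramp times and any final extension are not modelled because the text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: tuple  # (low, high) temperature range in °C
    seconds: int


def build_program(n_cycles: int = 30, relaxed_cycles: int = 2) -> list:
    """Build the cycle list: the first cycles anneal at 40-42 °C, the rest at 55 °C."""
    program = []
    for cycle in range(n_cycles):
        anneal = (40, 42) if cycle < relaxed_cycles else (55, 55)
        program.append([
            Step("denature", (94, 94), 60),
            Step("anneal", anneal, 90),
            Step("extend", (72, 72), 120),
        ])
    return program


program = build_program()
# Total programmed hold time, excluding ramping between temperatures.
total_seconds = sum(step.seconds for cycle in program for step in cycle)
print(f"{len(program)} cycles, {total_seconds / 60:.0f} min of hold time")
```

For the AtMYBGl1 amplification, where annealing was always at 55°C, the same sketch would be called with `relaxed_cycles=0`.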

Plasmid constructs and in vitro synthesis of proteins

Constructs coding for deletion derivatives of PhMYB3

(PhMYB3∆C1), MmMYB (MmMYB∆C2R1) and AmMYB305

(AmMYB305∆C1) were previously reported (Solano et al.,

1997). Constructs coding for AtMYB15, AtMYB77, AtMYB84 and

AtMYBGl1 were prepared by cloning the cDNA corresponding

to these proteins into vector pBluescriptII and transcribing with

T3 or T7 RNA polymerase. Transcripts coding for deletion

derivatives of these proteins were obtained by digestion with

restriction enzymes within the coding region of the proteins

before in vitro transcription (e.g. AtMYB77∆C1 which contains

amino acid residues 1–200 of the wild-type protein was

obtained by predigestion with BamHI). In vitro translation and

standardization of protein amount was as described previously

(Solano et al., 1997).

DNA-binding assays

Binding site selection experiments were performed as

described previously (Solano et al., 1995a), except that rabbit

reticulocyte extract (2 µl) containing the in vitro synthesized

protein was substituted for the bacterial extracts. In addition,

two oligonucleotide mixtures with a partially degenerate core

were also used (O-I: 5′-ACCGCTCGAGTCGACN6CNGNTN2CGGATCCTGCAGAATTCGCG-3′;

O-II: 5′-ACCGCTCGAGTCGACN6TNGNTN2CGGATCCTGCAGAATTCGCG-3′; Figure 4). DNA binding

assays with selected oligonucleotides were carried out as in

Solano et al. (1997). Oligonucleotides I-1 and II-1 (MBSI and

MBSII in Solano et al., 1997) represent the optimal binding sites

defined for PhMYB3. Oligonucleotides I-2 (5′-CGCGAATTCTGCAGGATCCGTGACAGTTACGTCGACTCGAGCGGT-3′)

and II-2 (5′-CGCGAATTCTGCAGGATCCGCGGTAGGTGGGTCGACTCGAGCGGT-3′)

represent the optimal binding sites of AtMYB77,

and of AtMYB15 and AtMYB84, respectively. The core sequences

of other oligonucleotides representing variants of I-2 and of II-2

are shown in Figure 5.
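To give a sense of the sequence space sampled by the partially random oligonucleotide mixtures, the number of distinct cores can be counted from the degenerate notation. The Python sketch below assumes that a digit after a base means that many repeats (so N6 is six fully random positions) and that each N is an equimolar mix of the four bases; the function names are hypothetical.

```python
import re


def expand_core(core: str) -> str:
    """Expand run-length notation, e.g. 'N6CNGNTN2' -> 'NNNNNNCNGNTNN'."""
    return re.sub(r"([ACGTN])(\d+)",
                  lambda m: m.group(1) * int(m.group(2)),
                  core)


def complexity(core: str) -> int:
    """Number of distinct sequences encoded by a core with N = any of four bases."""
    return 4 ** expand_core(core).count("N")


for name, core in [("O-I", "N6CNGNTN2"), ("O-II", "N6TNGNTN2")]:
    print(name, expand_core(core), complexity(core))
```

Both mixtures have ten random positions under this reading, so each represents the same number of possible core sequences.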

Estimation of the size of the R2R3-MYB family in

Arabidopsis

To estimate the total number of R2R3-MYB genes in Arabidopsis,

we reasoned as follows. If two samples of $n_1$ and $n_2$ individuals

are extracted randomly from a population of $N$ individuals, the

probability that a particular individual will be present in both

samples is $P = n_1 \times n_2 / N^2$. On the other hand, if $n_3$ is the

number of individuals in such a class, the above probability

can also be estimated as $P = n_3 / N$. From these two equations,

it follows that $N = n_1 \times n_2 / n_3$. Should there be any common bias

in the two samples, the calculated N would be an underestimate

(as n3 would be higher than that expected for random samples).

The two samples of R2R3-MYB genes used in this study are

likely to share some bias. The R2R3-MYB sequences in databanks

are enriched in abundantly/moderately expressed genes, as

most correspond to EST/cDNA sequences. In the case of our

collection, although the bias towards the most highly expressed

genes has been alleviated by using an R2R3-MYB enriched

library from PCR-amplified genomic DNA, there is still some of

this type of bias since all genes identified had to cross-hybridize

to R2R3-MYB genes isolated from cDNA libraries.
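The estimator described in this section, N = n1 × n2 / n3, can be applied directly to the counts given in the Summary: 78 genes in the authors' collection, 31 in databanks, and 7 databank genes absent from the collection (so 24 shared). The Python sketch below is illustrative; the function and variable names are hypothetical, and it assumes two random, independent samples, which the text notes is not strictly the case here.

```python
def estimate_population(n1: int, n2: int, n3: int) -> float:
    """Two-sample estimate of population size: N = n1 * n2 / n3.

    n1, n2: sizes of the two samples; n3: individuals found in both.
    """
    if n3 <= 0:
        raise ValueError("the two samples must share at least one individual")
    return n1 * n2 / n3


own_collection = 78            # genes isolated in this study
databank = 31                  # genes with databank accessions
shared = databank - 7          # 7 databank genes were not in the collection

estimate = estimate_population(own_collection, databank, shared)
minimum = own_collection + 7   # genes actually observed in either sample

print(f"observed: {minimum}, estimated total: {estimate:.2f}")
```

Because any bias shared by the two samples inflates n3, the estimate obtained this way is best read as a lower bound, which is consistent with the statement that there are probably more than 100 genes.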

Computer programs for protein and nucleic acid

analysis

Alignments, tree construction by the neighbour-joining method

and its bootstrapping (1000 samples) were performed with

CLUSTALW (Thompson et al., 1994). Using the matrices BLOSUM

(Henikoff and Henikoff, 1992) or PAM 250 (Dayhoff et al., 1978)

did not make any difference to the results. In the case of the

parsimony method (Eck and Dayhoff, 1966), the PHYLIP package

(Felsenstein, 1989) was used. Multiple most parsimonious trees

were found and the consensus tree was built with the

CONSENSUS program of PHYLIP. Rates of synonymous and of

non-synonymous substitutions were calculated according to Nei

and Gojobori (1986) using the Ina program (Ina, 1995).

Acknowledgements

We are very grateful to the other members of the European

MYB function search consortium (the groups led by Michael

Bevan, Cathie Martin, Sjef Smeekens, Chiara Tonelli and Bernd

Weisshaar) for ongoing interest and stimulating discussions.

We thank Cathie Martin and Roger Watson for providing us

with the AmMYB305 and MmMYB progenitor constructs. We

also thank Francisco García Olmedo, Cathie Martin, Miguel

Ángel Peñalva, Santiago Rodríguez de Córdoba, and Bernd

Weisshaar for critical reading of the manuscript. This work was

financed by grants from the EU (BIO2-CT93–0101; BIO4-CT95–

0129) and from the Spanish CICYT (BIO96–1115).

References

Alting-Mees, M.A. and Short, J.M. (1989) pBluescript II: gene

mapping vectors. Nucl. Acids Res. 17.

Avila, J., Nieto, C., Canas, L., Benito, M.J. and Paz-Ares, J.

(1993) Petunia hybrida genes related to the maize regulatory

C1 gene and to animal myb proto-oncogenes. Plant J. 3,

553–562.

Benfey, P.N. and Chua, N.H. (1989) Regulated genes in transgenic

plants. Science, 244, 174–181.

Biedenkapp, H., Borgmeyer, U., Sippel, A.E. and Klempnauer,

K.H. (1988) Viral myb oncogene encodes a sequence-specific

DNA-binding activity. Nature, 335, 835–837.

Bilaud, T., Koering, C.E., Binet-Brasselet, E., Ancelin, K., Pollice,

A., Gasser, S.M. and Gilson, E. (1996) The telobox, a Myb-

related telomeric DNA binding motif found in proteins from

yeast, plants and human. Nucl. Acids Res. 24, 1294–1303.

Blackwell, T.K. and Weintraub, H. (1990) Differences and

similarities in DNA-binding preferences of MyoD and E2A

protein complexes revealed by binding site selection. Science,

250, 1104–1110.

Botas, J. (1993) Control of morphogenesis and differentiation

by HOM/Hox genes. Curr. Opin. Cell. Biol. 5, 1015–1022.

Burglin, T.R. (1995) The evolution of Homeobox genes. In

Biodiversity and Evolution (Arai, R., Kato, M. and Doi, Y.,

eds). Tokyo: The National Science Museum Foundation,

pp. 291–336.

Cone, K.C., Cocciolone, S.M., Burr, F.A. and Burr, B. (1993)

Maize anthocyanin regulatory gene pl is a duplicate of c1

that functions in the plant. Plant Cell, 5, 1795–1805.

Dayhoff, M.O., Schwartz, R.M. and Orcutt, B.C. (1978) A model

of evolutionary change in proteins. In Atlas of Protein Sequence

and Structure, Volume 5 (Dayhoff, M.O., ed.). Silver Spring,

Maryland: National Biomedical Research Foundation, pp.

345–352.

Dixon, R.A. and Paiva, N.L. (1995) Stress-induced

phenylpropanoid metabolism. Plant Cell, 7, 1085–1097.

Eck, R.V. and Dayhoff, M.O. (1966) Atlas of Protein Sequences

and Structure. Silver Spring, Maryland: National Biomedical

Research Foundation.

Feldbrugge, M., Sprenger, M., Hahlbrock, K. and Weisshaar, B.

(1997) PcMYB1, a novel plant protein containing a DNA-

binding domain with one MYB repeat, interacts in vivo with

a light-regulatory promoter unit. Plant J. 11, 1079–1093.

Felsenstein, J. (1989) PHYLIP: phylogeny inference package.

Cladistics, 5, 164–166.

Franken, P., Schrell, S., Peterson, P.A., Saedler, H. and Wienand,

U. (1994) Molecular analysis of protein domain function

encoded by the myb-homologous maize genes C1, Zm 1 and

Zm 38. Plant J. 6, 21–30.

Gibson, S. and Somerville, C. (1993) Isolating plant genes.

TIBTECH, 11, 306–313.

Goff, S.A., Cone, K.C. and Chandler, V.L. (1992) Functional

analysis of the transcriptional activator encoded by the maize

B gene: evidence for a direct functional interaction between

two classes of regulatory proteins. Genes Dev. 6, 864–875.

Graf, T. (1992) Myb: a transcriptional activator linking proliferation

and differentiation in hematopoietic cells. Curr. Opin. Genet.

Dev. 2, 249–255.

Grotewold, E., Drummond, B.J., Bowen, B. and Peterson, T.

(1994) The myb-homologous P gene controls phlobaphene

pigmentation in maize floral organs by directly activating a

flavonoid biosynthetic gene subset. Cell, 76, 543–553.

Gubler, F., Kalla, R., Roberts, J.K. and Jacobsen, J.V. (1995)

Gibberellin-regulated expression of a myb gene in barley

aleurone cells: Evidence for Myb transactivation of a high-pI

alpha-amylase gene promoter. Plant Cell, 7, 1879–1891.

Hahlbrock, K. and Scheel, D. (1989) Physiology and molecular

biology of phenylpropanoid metabolism. Annu. Rev. Plant.

Physiol. Plant Mol. Biol. 40, 347–369.

Hattori, T., Vasil, V., Rosenkrans, L., Hannah, L.C., McCarty, D.R.

and Vasil, I.K. (1992) The Viviparous-1 gene and Abscisic acid

activate the C1 regulatory gene for anthocyanin biosynthesis

during seed maturation in maize. Genes Dev. 6, 609–618.

Henikoff, S. and Henikoff, J.G. (1992) Amino acid substitution

matrices from protein blocks. Proc. Natl Acad. Sci. USA, 89,

10915–10919.

Hirayama, T. and Shinozaki, K. (1996) A cdc5+ homolog of a

higher plant, Arabidopsis thaliana. Proc. Natl Acad. Sci. USA,

93, 13371–13376.

Hoovers, J.M., Mannens, M., John, R., Bliek, J., van Heyningen,

V., Porteous, D.J., Leschot, N.J., Westerveld, A. and Little,

P.F. (1992) High-resolution localization of 69 potential human

zinc finger protein genes: a number are clustered. Genomics,

12, 254–263.

Howe, K.M. and Watson, R.J. (1991) Nucleotide preferences in

sequence-specific recognition of DNA by c-myb protein. Nucl.

Acids Res. 19, 3913–3919.

Ina, Y. (1995) New methods for estimating the numbers of

synonymous and nonsynonymous substitutions. J. Mol. Evol.

40, 190–226.

Jackson, D., Culianez, M.F., Prescott, A.G., Roberts, K. and

Martin, C. (1991) Expression patterns of myb genes from

Antirrhinum flowers. Plant Cell, 3, 115–125.

Kirik, V. and Baumlein, H. (1996) A novel leaf-specific myb-

related protein with a single binding repeat. Gene, 183,

109–113.

Klug, A. and Schwabe, J.W. (1995) Protein motifs 5. Zinc fingers.

FASEB J. 9, 597–604.

Li, S.F. and Parish, R.W. (1995) Isolation of two novel myb-like

genes from Arabidopsis and studies on the DNA-binding

properties of their products. Plant J. 8, 963–972.

Lipsick, J.S. (1996) One billion years of Myb. Oncogene, 13,

223–235.

Lloyd, A.M., Walbot, V. and Davis, R.W. (1992) Arabidopsis

and Nicotiana anthocyanin production activated by maize

regulators R and C1. Science, 258, 1773–1775.

Luscher, B. and Eisenman, R.N. (1990) New light on Myc and

Myb. Part II. Myb. Genes Dev. 4, 2235–2241.

Marocco, A., Wissenbach, M., Becker, D., Paz-Ares, J., Saedler,

H., Salamini, F. and Rohde, W. (1989) Multiple genes are

transcribed in Hordeum vulgare and Zea mays that carry the

DNA-binding domain of the MYB oncoproteins. Mol. Gen.

Genet. 216, 183–187.

Martin, C. and Paz-Ares, J. (1997) MYB transcription factors in

plants. Trends Genet. 13, 67–73.

Moyano, E., Martínez, G.J. and Martin, C. (1996) Apparent

redundancy in myb gene function provides gearing for the

control of flavonoid biosynthesis in Antirrhinum flowers. Plant

Cell, 8, 1519–1532.

Murashige, T. and Skoog, F. (1962) A revised medium for rapid

growth and bioassays with tobacco tissue cultures. Physiol.

Plant. 15, 473–497.

Nei, M. and Gojobori, T. (1986) Estimating synonymous and

nonsynonymous substitution rates. Mol. Biol. Evol. 3, 105–114.

Noda, K., Glover, B.J., Linstead, P. and Martin, C. (1994) Flower

colour intensity depends on specialized cell shape controlled

by a Myb-related transcription factor. Nature, 369, 661–664.

Ogata, K., Morikawa, S., Nakamura, H., Sekikawa, A., Inoue,

T., Kanai, H., Sarai, A., Ishii, S. and Nishimura, Y. (1994)

Solution structure of a specific DNA complex of the Myb

DNA-binding domain with cooperative recognition helices.

Cell, 79, 639–648.

Oppenheimer, D.G., Herman, P.L., Sivakumaran, S., Esch, J. and

Marks, M.D. (1991) A myb gene required for leaf trichome

differentiation in Arabidopsis is expressed in stipules. Cell,

67, 483–493.

Parvin, J.D., McCormick, R.J., Sharp, P.A. and Fisher, D.F. (1995)

Prebending of a promoter sequence enhances affinity for the

TATA-binding factor. Nature, 373, 724–727.

Paz-Ares, J., Ghosal, D., Wienand, U., Peterson, P.A. and Saedler,

H. (1987) The regulatory c1 locus of Zea mays encodes a

protein with homology to myb proto-oncogene products and

with structural similarities to transcriptional activators. EMBO

J. 6, 3553–3558.

Quattrocchio, F. (1994) Regulatory genes controlling flower

pigmentation in Petunia hybrida. PhD thesis. Amsterdam:

Vrije Universiteit te Amsterdam.

Quattrocchio, F., Wing, J.F., Leppen, H.T.C., Mol, J.N.M. and

Koes, R.E. (1993) Regulatory genes controlling anthocyanin

pigmentation are functionally conserved among plant species

and have distinct sets of target genes. Plant Cell, 5, 1497–1512.

Ramsay, R.G., Ishii, S. and Gonda, T.J. (1992) Interaction of the

Myb protein with specific DNA binding sites. J. Biol. Chem.

267, 5656–5662.

Riechmann, J.L., Wang, M. and Meyerowitz, E.M. (1996) DNA-

binding properties of Arabidopsis MADS domain homeotic

proteins APETALA1, APETALA3, PISTILLATA and AGAMOUS.

Nucl. Acids Res. 24, 3134–3141.

Sablowski, R.W.M., Moyano, E., Culianez-Macia, F.A., Schuch,

W., Martin, C. and Bevan, M. (1994) A flower-specific Myb

protein activates transcription of phenylpropanoid biosynthetic

genes. EMBO J. 13, 128–137.

Sainz, M.B., Grotewold, E. and Chandler, V.L. (1997) Evidence

for direct activation of an anthocyanin promoter by the maize

C1 protein and comparison of DNA binding by related myb

domain proteins. Plant Cell, 9, 611–625.

Saitou, N. and Nei, M. (1987) The neighbor-joining method: a

new method for reconstructing phylogenetic trees. Mol. Biol.

Evol. 4, 406–425.

Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular

Cloning: A Laboratory Manual, 2nd edn. Cold Spring Harbor,

NY: Cold Spring Harbor Laboratory Press.

Satta, Y. (1993) How the ratio of nonsynonymous to synonymous

pseudogene substitutions can be less than one.

Immunogenetics, 38, 450–454.

Shirasu, K., Nakajima, H., Rajasekhar, V.K., Dixon, R.A. and

Lamb, C. (1997) Salicylic acid potentiates an agonist-dependent

gain control that amplifies pathogen signals in the activation

of defense mechanisms. Plant Cell, 9, 261–270.

Solano, R., Fuertes, A., Sanchez, L., Valencia, A. and Paz-Ares,

J. (1997) A single residue substitution causes a switch from

the dual DNA binding specificity of plant transcription factor

MYB.Ph3 to the animal c-MYB specificity. J. Biol. Chem. 272,

2889–2895.

Solano, R., Nieto, C., Avila, J., Canas, L., Díaz, I. and Paz-Ares,

J. (1995a) Dual DNA-binding specificity of petal epidermis

specific MYB transcription factor (MYB.Ph3) from Petunia

hybrida. EMBO J. 14, 1773–1784.

Solano, R., Nieto, C. and Paz-Ares, J. (1995b) MYB.Ph3

transcription factor from Petunia hybrida induces similar DNA-

bending/distortions on its two types of binding site. Plant J.

8, 673–682.

Steeves, T.A. and Sussex, I.M. (1990) Patterns in Plant

Development, 2nd edn. Cambridge: Cambridge University

Press.

Stober-Grasser, U., Brydolf, B., Bin, X., Grasser, F., Firtel, R.A.

and Lipsick, J.S. (1992) The Myb DNA-binding domain is

highly conserved in Dictyostelium discoideum. Oncogene, 7,

589–596.

Taylor, D., Badiani, P. and Weston, K.A. (1996) A dominant

interfering Myb mutant causes apoptosis in T cells. Genes

Dev. 10, 2732–2744.

Telfer, A., Bollman, K.M. and Poethig, R.S. (1997) Phase change

and the regulation of trichome distribution in Arabidopsis

thaliana. Development, 124, 645–654.

Thanos, D. and Maniatis, T. (1992) The high mobility group

protein HMG I(Y) is required for NF-κB-dependent virus

induction of the human IFN-β gene. Cell, 71, 777–789.

Theissen, G., Kim, J.T. and Saedler, H. (1996) Classification and

phylogeny of the MADS-box multigene family suggest defined

roles of MADS-box gene subfamilies in the morphological

evolution of eukaryotes. J. Mol. Evol. 43, 484–516.

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL

W: improving the sensitivity of progressive multiple sequence

alignment through sequence weighting, position specific gap

penalties and weight matrix choice. Nucl. Acids Res. 22,

4673–4680.

Thompson, M.A. and Ramsay, R.G. (1995) Myb: An old

oncoprotein with new roles. Bioessays, 17, 341–350.

Toscani, A., Mettus, R.V., Coupland, R., Simpkins, H., Litvin, J.,

Orth, J., Hatton, K.S. and Reddy, E.P. (1997) Arrest of

spermatogenesis and defective breast development in mice

lacking A-myb. Nature, 386, 713–717.

Treisman, J., Harris, E., Wilson, D. and Desplan, C. (1992) The

homeodomain: a new face for the helix-turn-helix? Bioessays,

14, 145–150.

Urao, T., Yamaguchi, S.K., Urao, S. and Shinozaki, K. (1993) An

Arabidopsis myb homolog is induced by dehydration stress

and its gene product binds to the conserved MYB recognition

sequence. Plant Cell, 5, 1529–1539.

Watson, R.J., Robinson, C. and Lam, E.W. (1993) Transcription

regulation by murine B-myb is distinct from that by c-myb.

Nucl. Acids Res. 21, 267–272.

Weiss, D., van Blockland, R., Kooter, J.M., Mol, J.N.M. and

van Tunen, A.J. (1992) Gibberellic acid regulates chalcone

synthase gene expression in the corolla of Petunia hybrida.

Plant Physiol. 98, 191–197.

Weiss, D., van Tunen, A.J., Halevy, A.H., Mol, J.N.M. and Gerats,

A.G.M. (1990) Stamens and gibberellic acid in the regulation

of flavonoid gene expression in the corolla of Petunia hybrida.

Plant Physiol. 94, 511–515.

Yang, Y. and Klessig, D.F. (1996) Isolation and characterization

of a tobacco mosaic virus-inducible myb oncogene homolog

from tobacco. Proc. Natl Acad. Sci. USA, 93, 14972–14977.

Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Improved

M13 phage cloning vectors and host strains: nucleotide

sequences of the M13mp18 and pUC19 vectors. Gene, 33,

103–119.