
ORIGINAL PAPER

Stefano Donadio · Margherita Sosio · Evi Stegmann · Tilmann Weber · Wolfgang Wohlleben

Comparative analysis and insights into the evolution of gene clusters for glycopeptide antibiotic biosynthesis

Received: 26 January 2005 / Accepted: 22 April 2005 / Published online: 9 July 2005
© Springer-Verlag 2005

Abstract The bal, cep, dbv, sta and tcp gene clusters specify the biosynthesis of the glycopeptide antibiotics balhimycin, chloroeremomycin, A40926, A47934 and teicoplanin, respectively. These structurally related compounds share a similar mechanism of action in their inhibition of bacterial cell wall formation. Comparative sequence analysis was performed on the five gene clusters. Extensive conserved synteny was observed between the bal and cep clusters, which direct the synthesis of very similar compounds but originate from two different species of the genus Amycolatopsis. All other cluster pairs show a limited degree of conserved synteny, involving biosynthetically functional gene cassettes: these include those involved in the synthesis of the carbon backbone of two non-proteinogenic amino acids; in the linkage of amino acids 1–3 and 4–7 in the heptapeptide; and in the formation of the aromatic cross-links. Furthermore, these segments of conserved synteny are often preceded by conserved intergenic regions. Phylogenetic analysis of protein families shows several instances in which relatedness in the chemical structure of the glycopeptides is not reflected in the extent of the relationship of the corresponding polypeptides. Coherent branchings are observed for all polypeptides encoded by the syntenous gene cassettes. These results suggest that the acquisition of distinct, functional genetic elements has played a significant role in the evolution of glycopeptide gene clusters, giving them a mosaic structure. In addition, the synthesis of the structurally similar compounds A40926 and teicoplanin appears to be the result of convergent evolution.

Keywords Actinomycete · Evolution · Gene cluster · Glycopeptide antibiotics · Streptomyces

Introduction

The last decade has witnessed the molecular characterization of a large number of gene clusters devoted to the synthesis of antibiotics and other bioactive secondary metabolites. Particularly in the case of Streptomyces and related actinomycete genera, the most prolific producers of secondary metabolites in the bacterial world, the number of known clusters probably now exceeds 100. These gene sequences have contributed substantially to the definition of the rules for the biosynthesis of microbial metabolites (Walsh 2003), so that biosynthetic pathways can often be predicted from the sequences of clusters that direct the synthesis of known compounds. In some cases, the structure of an unknown metabolite can be partially or completely inferred from the sequence of the gene cluster responsible for its synthesis (see, for example, Challis and Ravel 2000; Challis and Hopwood 2003).

Gene clusters that specify the synthesis of structurally related compounds can provide important information on common biosynthetic steps and on features unique to each pathway. For example, synthesis of the glycopeptide antibiotics balhimycin, chloroeremomycin, A40926, A47934 and teicoplanin is governed by the bal (Recktenwald et al. 2002), cep (van Wageningen et al. 1998), dbv (Sosio et al. 2003), sta (Pootoolal et al. 2002) and tcp (Li et al. 2004; Sosio et al. 2004) clusters, respectively. All these antibiotics consist of a heptapeptide backbone with extensive aromatic cross-links (Fig. 1), and all inhibit bacterial growth by binding to the D-Ala-D-Ala termini of the growing peptidoglycan. The anti-complement molecule complestatin resembles the antibacterial glycopeptides in consisting of a linear heptapeptide with two aromatic cross-links. The corresponding com gene cluster has also been characterized (Chiu et al. 2001).

Communicated by W. Goebel

S. Donadio (✉) · M. Sosio
Vicuron Pharmaceuticals, 21040 Gerenzano, Italy
E-mail: [email protected]

E. Stegmann · T. Weber · W. Wohlleben
Department of Microbiology/Biotechnology, Eberhard-Karls Universität Tübingen, 72076 Tübingen, Germany

Present address: S. Donadio
Ktedogen, 21046 Malnate, Italy

Mol Gen Genomics (2005) 274: 40–50
DOI 10.1007/s00438-005-1156-3

These clusters originate from distantly related genera of actinomycetes: bal and cep derive from two distinct species of the genus Amycolatopsis [A. balhimycina (Wink et al. 2003) and A. orientalis, respectively], belonging to the family Pseudonocardiaceae; dbv from the genus Nonomuraea (family Streptosporangiaceae); sta (and com) from the genus Streptomyces (family Streptomycetaceae); and tcp from the genus Actinoplanes (family Micromonosporaceae).

Balhimycin and chloroeremomycin are extremely similar representatives of the vancomycin class of antibiotics, characterized by a heptapeptide backbone with Leu and Asn residues at positions 1 and 3, respectively (Fig. 1). A40926, A47934 and teicoplanin share the same heptapeptide skeleton, which differs from the vancomycin type in that p-hydroxyphenylglycine (Hpg) and dihydroxyphenylglycine (Dpg) replace Leu and Asn, respectively, and in the lack of a β-hydroxyl group on Tyr-2. The aromatic moieties carried by the Hpg-1 and Dpg-3 residues are joined via an additional cross-link in these glycopeptides (Fig. 1). Both A40926 and teicoplanin have an N-acyl-sugar residue on Hpg-4 and a mannosyl residue on Dpg-7. The A47934 molecule, in contrast, lacks sugar moieties altogether, but has a sulfate group at Hpg-1 and an additional chlorine atom at Hpg-5. The complestatin heptapeptide shares with the antibacterial glycopeptides the sequence Hpg-Hpg-Tyr as residues 4–6 (not shown).

Here, we present an analysis of the bal, cep, dbv, sta and tcp gene clusters, and make relevant comparisons with the com cluster. This work highlights common motifs and unexpected divergences. These data allow us to present a hypothesis for the evolution of glycopeptide gene clusters.

Materials and methods

Sequence analyses

The DNA sequences of the bal (GenBank Accession No. Y16952), cep (AJ223998, AJ223999 and AL078635), com (AF386507), dbv (AJ561198), sta (U82965) and tcp (AJ605139 and AJ632270) clusters were used as annotated. Sequence analyses were performed with the programs in the Wisconsin Package, Version 10 (Accelrys). For the non-coding regions, all intergenic sequences (IGSs) longer than 30 bp (including those flanking each cluster's distalmost genes) were extracted from the bal, cep, dbv, sta and tcp clusters, resulting in 22, 19, 26, 18 and 27 individual IGSs, respectively. Next, each IGS was searched against all others using FASTA. The IGSs showing significant matches were grouped and analyzed with the program PILEUP, followed by manual adjustment if necessary. Finally, conserved motifs identified in the multiple alignment were searched against the entire clusters with the program FINDPATTERNS.

Fig. 1 Structures of glycopeptides
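The IGS extraction and motif search steps can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the Wisconsin Package procedure: genes are supplied as hypothetical 0-based, half-open `(start, end)` coordinates, "longer than 30 bp" is read as strictly greater than 30, and the motif search is a plain IUPAC-to-regular-expression scan of both strands (no mismatches allowed) standing in for FINDPATTERNS.

```python
import re
from typing import List, Tuple

# Assumed gene model: (start, end) as 0-based, half-open coordinates.
Interval = Tuple[int, int]


def extract_igs(seq_len: int, genes: List[Interval], min_len: int = 30) -> List[Interval]:
    """Intergenic segments longer than min_len, including the regions
    flanking the first and last genes of the cluster."""
    igs: List[Interval] = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - prev_end > min_len:
            igs.append((prev_end, start))
        prev_end = max(prev_end, end)  # tolerate overlapping ORFs
    if seq_len - prev_end > min_len:
        igs.append((prev_end, seq_len))
    return igs


# IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Exact matches to a degenerate motif on both strands,
    reported as (forward start coordinate, strand)."""
    seq = seq.upper()
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile(f"(?=({body}))")  # lookahead keeps overlapping hits
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # A hit at position p of the reverse complement starts at n - p - k on the forward strand.
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)


print(extract_igs(200, [(40, 100), (100, 150)]))  # [(0, 40), (150, 200)]
print(find_motif("CCATGCCC", "GCAT"))             # [(2, '-')]
```

Reporting reverse-strand hits in forward coordinates keeps them directly comparable with gene positions, which matters here because several conserved IGSs occur on the complementary strand.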

For the coding regions, protein sequences were aligned with PILEUP, followed by manual adjustments if necessary. Based on the final alignments, phylogenetic distances were calculated using the program PAUP, selecting the longest segment of significant similarity present in all aligned sequences (i.e. excluding N-terminal and C-terminal tails, when present), and using distance (minimum evolution) as the optimality criterion for trees. Bootstrap analyses (100 replicates) were performed on heuristic tree searches. In each tree, additional sequence(s) were introduced as outgroup(s) and, if necessary, for further comparisons.
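The trimming and resampling steps can be sketched in Python. The sketch assumes a gapped protein alignment given as equal-length strings, interprets "excluding N- and C-terminal tails" as dropping leading and trailing columns in which any sequence has a gap, and uses uncorrected p-distances. The minimum-evolution tree search performed in PAUP is not reproduced; each bootstrap pseudo-alignment would still have to be passed to a tree-building program.

```python
import random
from typing import Iterator, List


def trim_to_shared_core(aln: List[str]) -> List[str]:
    """Drop leading/trailing columns in which any sequence has a gap ('-')."""
    ncol = len(aln[0])
    full = [all(s[c] != "-" for s in aln) for c in range(ncol)]
    if not any(full):
        return ["" for _ in aln]
    first = full.index(True)
    last = ncol - 1 - full[::-1].index(True)
    return [s[first:last + 1] for s in aln]


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence is gapped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aln: List[str]) -> List[List[float]]:
    return [[p_distance(a, b) for b in aln] for a in aln]


def bootstrap_replicates(aln: List[str], n: int = 100, seed: int = 1) -> Iterator[List[str]]:
    """Yield n pseudo-alignments built by resampling columns with replacement."""
    rng = random.Random(seed)
    ncol = len(aln[0])
    for _ in range(n):
        cols = [rng.randrange(ncol) for _ in range(ncol)]
        yield ["".join(s[c] for c in cols) for s in aln]


core = trim_to_shared_core(["--MKV-", "AMKVLA", "-MRVL-"])
print(core)  # ['MKV', 'KVL', 'RVL']
```

Support for a branch is then the fraction of replicate trees that recover it, which is what the bootstrap values in the tree figures report.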

Results and discussion

Gene composition and conserved synteny

The functions of many of the genes present in the glycopeptide clusters, and of their products, have been described in recent reviews (Hubbard and Walsh 2003; Süssmuth and Wohlleben 2003) and in subsequent publications (Sosio et al. 2003, 2004; Li et al. 2004). However, since different authors have used different nomenclatures, we will adopt the names listed in Table 1. In the absence of complete functional studies, we assume that each cluster is represented by the longest DNA segment delimited by ORFs that find a homolog in at least one of the other four clusters. Based on this criterion, the bal, cep, dbv, sta and tcp clusters include 35, 34, 37, 34 and 43 presumably functional genes, respectively, and they are depicted schematically in Fig. 2. Note that, since no additional sequence is available beyond its left end, the bal cluster might contain more than 35 genes. When the genes present in the five clusters were grouped into homologs, 46 distinct families could be identified; 15 genes were unique (Table 1). Only 19 homologs are present in all five clusters, and these will be referred to as the "core" glycopeptide genes. Fourteen of them are also found in the com cluster (Table 1).
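The cluster-boundary criterion and the division into core, shared and unique genes can be expressed compactly. The Python sketch below assumes that homology calls are already available (one boolean per ORF in cluster order, and a set of clusters per homolog group); the group names in the example are hypothetical and are not taken from Table 1.

```python
from typing import Dict, List, Optional, Set, Tuple


def cluster_bounds(has_homolog: List[bool]) -> Optional[Tuple[int, int]]:
    """Indices of the outermost ORFs (in cluster order) that have a homolog
    in at least one other cluster; every ORF in between is retained."""
    idx = [i for i, h in enumerate(has_homolog) if h]
    return (idx[0], idx[-1]) if idx else None


def classify(members: Dict[str, Set[str]], clusters: Set[str]):
    """Split homolog groups into core (present in all clusters),
    shared (at least two clusters) and unique (a single cluster)."""
    core = sorted(f for f, cs in members.items() if cs >= clusters)
    shared = sorted(f for f, cs in members.items() if len(cs) >= 2)
    unique = sorted(f for f, cs in members.items() if len(cs) == 1)
    return core, shared, unique


ALL = {"bal", "cep", "dbv", "sta", "tcp"}
groups = {"famA": set(ALL), "famB": {"bal", "cep"}, "orfX": {"dbv"}}  # hypothetical
print(classify(groups, ALL))  # (['famA'], ['famA', 'famB'], ['orfX'])
print(cluster_bounds([False, True, False, True, False]))  # (1, 3)
```

Under this criterion an ORF without homologs elsewhere is still counted if it lies between two ORFs that do have homologs, which is how unique genes end up inside the cluster boundaries.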

As shown in Table 1, the bal and cep clusters share the largest number of homologs (33 genes). All other cluster pairs share between 19 (cep and sta) and 29 (dbv and tcp) homologs. With respect to gene order and orientation, only the bal and cep clusters show conserved synteny over a considerable distance, about 63 kb (Fig. 3a). The extent of conserved synteny is limited in the other cases, and is about equivalent for dbv and tcp (Fig. 3b), bal and dbv (Fig. 3c), sta and tcp (Fig. 3d), as well as for the bal-sta, bal-tcp and dbv-sta comparisons (not shown).
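The synteny plots were generated with COMPARE using a window of 50 and a stringency of 35 (see the Fig. 3 legend). A comparable window/stringency dot plot can be sketched in Python. This is an assumed stand-in, not the GCG implementation: it compares two DNA strings on one strand, scores only exact base identities, and reports the start coordinates of every window pair with at least `stringency` matches. Running sums along each diagonal keep the cost proportional to the product of the sequence lengths, which is still slow in pure Python for whole clusters.

```python
from typing import List, Tuple


def compare_dots(a: str, b: str, window: int = 50, stringency: int = 35) -> List[Tuple[int, int]]:
    """Return (i, j) for each pair of windows a[i:i+window], b[j:j+window]
    sharing at least `stringency` identical positions."""
    n, m = len(a), len(b)
    if n < window or m < window:
        return []
    hits: List[Tuple[int, int]] = []
    # Walk each diagonal d = j - i and slide the window along it.
    for d in range(-(n - window), m - window + 1):
        i0, j0 = max(0, -d), max(0, d)
        length = min(n - i0, m - j0)
        match = [a[i0 + k] == b[j0 + k] for k in range(length)]
        score = sum(match[:window])
        for k in range(length - window + 1):
            if k > 0:  # slide: add the entering position, drop the leaving one
                score += match[k + window - 1] - match[k - 1]
            if score >= stringency:
                hits.append((i0 + k, j0 + k))
    return sorted(hits)


print(len(compare_dots("AAAA", "AAAA", window=2, stringency=2)))  # 9
```

Long runs of hits along a single diagonal correspond to the collinear segments that define conserved synteny; inverted segments would additionally require comparing against the reverse complement.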

A closer inspection of Fig. 3 reveals that all cluster pairs show the same syntenous segments, although the relative number and order of genes varies (see Fig. 2).

These segments include the following core glycopeptide genes: the dpgABCD genes responsible for Dpg synthesis; the hmaS and hmo gene pair required for Hpg formation; the gene encoding the ABC transporter for export of the final product, followed by the gene(s) encoding NRPS (non-ribosomal peptide synthetase) modules 1–3; the three genes encoding NRPS modules 4–7 and the MbtH-like polypeptide; and the oxy and halogenase genes, which direct the cross-linking and chlorination steps, respectively. The last three syntenous segments are also present in the com cluster. The apparent exception represented by the tcp cluster, in which the gene encoding the ABC transporter is located downstream of the NRPS-encoding genes, is probably due to the duplication event discussed below. Only four of the core glycopeptide genes (those specifying the aminotransferase HpgT, the putative ion antiporter, prephenate dehydrogenase and an StrR-like regulator) do not belong to segments of conserved synteny.

All clusters appear to be located in different genetic contexts, as judged by the different genes lying outside the cluster borders. This also holds for the bal and cep clusters, which actually diverge at their very ends (Fig. 3a).

Phylogenetic analyses

The above data point to a mosaic structure of the clusters, which are made up of a small number of syntenous gene cassettes. Phylogenetic analysis of homolog families may help to establish the origin of individual genes and, possibly, of gene cassettes. In particular, we were interested in understanding whether similar genes were involved in the synthesis of structurally similar compounds. In some cases, the corresponding polypeptides are expected to play the same role in the five glycopeptide pathways (e.g. the enzymes involved in Dpg and Hpg formation). In other cases, the different heptapeptide skeletons would be expected to imply the existence of two sets of NRPS modules 1–3 (bal-cep versus dbv-sta-tcp) and perhaps of other enzymes (e.g. the Oxy proteins). In any case, the bal and cep sequences were always expected to constitute the most closely related pair, based on both function (the most similar glycopeptide pair) and origin (high extent of conserved synteny).

Binary similarity scores indicate that the most closely related pair is indeed almost always represented by the bal-cep sequences. DpgD, the prephenate dehydrogenase, the MbtH-like polypeptide and the membrane ion antiporter represent the few exceptions to this generalization (Table 1). For the first two polypeptides, the binary similarity scores for the cep-dbv pairs (91.0 and 83.1% for DpgD and prephenate dehydrogenase, respectively) are similar to those for the bal-cep pairs (88.4 and 81.8%, respectively). For the MbtH-like polypeptide, the highest score is given by the two tcp copies, as described below. However, the bal membrane ion antiporter is considerably more similar to the dbv polypeptide (69.9%) than to the cep sequence (44.5%). In addition, phylogenetic analyses (Fig. 4a) indicate that the bal, dbv and tcp proteins lie on the same branch, together with a homolog from the C-1027 cluster (Liu et al. 2002). In contrast, the com, sta and especially cep sequences appear relatively divergent. Thus, the limited relatedness and the differences in relative location within the clusters (Fig. 2) suggest different origins for the gene encoding the membrane ion antiporter in the bal and cep clusters.
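A binary similarity score of the kind reported above and in Table 1 can be sketched as follows. The scores in this study were obtained with the Wisconsin Package; the residue groups treated as "similar" here are an assumption chosen for illustration, and only gap-free alignment columns are counted. The last function selects the lowest- and highest-scoring cluster pairs, mirroring the two rightmost columns of Table 1; the example reuses the two DpgD values quoted in the text.

```python
from typing import Dict, Tuple

# Assumed groups of conservative substitutions (illustrative only).
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def is_similar(x: str, y: str) -> bool:
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def binary_scores(a: str, b: str) -> Tuple[float, float]:
    """Percent identity and percent similarity over gap-free columns
    of a pairwise alignment given as two equal-length strings."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no gap-free columns")
    identity = 100 * sum(x == y for x, y in cols) / len(cols)
    similarity = 100 * sum(is_similar(x, y) for x, y in cols) / len(cols)
    return identity, similarity


Pair = Tuple[str, str]


def extreme_pairs(scores: Dict[Pair, float]) -> Tuple[Pair, Pair]:
    """Return the (lowest, highest) scoring cluster pairs."""
    return min(scores, key=scores.get), max(scores, key=scores.get)


print(binary_scores("MKVL-A", "MRIL-S"))  # (40.0, 80.0)
print(extreme_pairs({("bal", "cep"): 88.4, ("cep", "dbv"): 91.0}))
```

With the DpgD values from the text, the highest-scoring pair is cep-dbv rather than bal-cep, which is exactly the kind of exception discussed above.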

Table 1 Established or deduced functions of the polypeptides encoded by the glycopeptide clusters

Biosynthetic role | Functionᵃ | Clusterᵇ (bal cep dbv sta tcp) | Lowest similarity (%)ᶜ | Highest similarity (%)ᶜ
Dpg formation | DpgA | 1 1 1 1 1 | 80 (dt) | 96 (bc)
 | DpgB | 1 1 1 1 1 | 59 (st) | 84 (bc)
 | DpgC | 1 1 1 1 1 | 71 (ct) | 87 (bc)
 | DpgD | 1 1 1 1 1 | 78 (ct) | 91 (cd)
Hpg formation | HmaS | 1 1 1 1 1 | 54 (dt) | 84 (bc)
 | Hmo | 1 1 1 1 1 | 60 (bt) | 88 (bc)
 | Aminotransferase HpgT | 1 1 1 1 1 | 52 (ct) | 93 (bc)
 | Prephenate dehydrogenase | 1 1 1 1 1 | 45 (ct) | 83 (cd)
β-Hydroxy-Tyr formation (bal and cep only) | Hydrolase | 1 1 | – | –
 | BpsD | 1 1 | – | –
 | OxyD | 1 1 | – | –
 | A-Hydroxylase | 1 1 2ᵈ | 60 (dt₂) | 89 (dt₁)
 | Aldolase | 1 1ᵉ | – | –
Peptide synthesis | NRPS modules 1–3 | 1 1 2 2 2ᶠ | 65 (bs) | 88 (bc)
 | NRPS modules 4–6 | 1 1 1 1 1 | 78 (bs) | 89 (bc)
 | NRPS module 7 | 1 1 1 1 1 | 79 (bs) | 92 (bc)
 | Single-chain thioesterase | 1 1 | – | –
 | MbtH-likeᵍ | 1 1 1 1 2ᵈ | 81 (bt₁) | 93 (tt)
Cross-linking | OxyA | 1 1 1 1 1 | 77 (cs) | 89 (bc)
 | OxyB | 1 1 1 1 1 | 79 (cs) | 89 (bc)
 | OxyC | 1 1 1 1 1 | 76 (bs) | 94 (bc)
 | OxyEᵍ | 1 1 1 | – | –
Halogenation | Halogenase | 1 1 1 2ᵈ 1 | 72 (bs₂) | 97 (bc)
Sugar addition | GtfA | 1 1 1 | – | –
 | GtfB | 1 1 1 1 | 73 (bd) | 87 (bc)
 | GtfC | 1 1 | – | –
 | Mannosyltransferase | 1 1 | – | –
Peptide methylation | Methyltransferase | 1 1 1 | – | –
Sugar formation | EvaC | 1 1 | – | –
 | EvaA | 1 1 | – | –
 | EvaE | –ʰ 1 | – | –
 | EvaB | 1 1 | – | –
 | EvaD | 1 1 | – | –
Sugar modification | Acyltransferase | 1 1 | – | –
Regulation | StrR-like regulator | 1 1 1 1 1 | 52 (bt) | 88 (bc)
 | HygR-like regulator | 1 1 | – | –
 | Sensor kinase | 1ⁱ 1 1 1 | 37 (dt) | 82 (st)
 | Response regulator | 1 1 1 1 | 59 (bd) | 93 (st)
Resistance | MurF | 1 1 | – | –
 | VanH | 1 1 | – | –
 | VanA | 1 1 | – | –
 | VanX | 1 1 | – | –
 | VanY | 1 1 | – | –
Transport | ABC transporter | 1 1 1 1 1ʲ | 77 (cs) | 91 (bc)
 | Membrane ion antiporter | 1 1 1 1 1 | 39 (bs) | 70 (bd)
Unknown | bal Orf2ᵏ | 1 1 1 1 | 67 (dt) | 85 (bc)
 | Unique | 0 1 5 4 5 | – | –

ᵃ Core glycopeptide genes are indicated in bold type, while those present also in the com cluster are listed in italics
ᵇ The number of genes for a given function present in each cluster is given
ᶜ The lowest and highest binary similarity scores are reported in the left and right columns, respectively. The pair responsible for the score is indicated in parentheses, with clusters abbreviated as: b, bal; c, cep; d, dbv; s, sta; and t, tcp. When two homologs are present in one cluster, these are indicated by subscripts 1 and 2
ᵈ Genes that are assumed to have equivalent functions are grouped under the same family
ᵉ A divergent aldolase is encoded by the tcp cluster, but is counted among the unique functions
ᶠ Modules 1–3 are present on two separate polypeptides
ᵍ These abbreviations are arbitrary: MbtH-like, short polypeptide of unknown function; OxyE, presumably involved in cross-linking of amino acids 1 and 3
ʰ Present as a pseudogene in bal
ⁱ Only the first 300 codons are represented in the sequenced DNA
ʲ The pseudogene referred to in the text is not counted
ᵏ Indicated by the designation of the bal ORF (Recktenwald et al. 2002)


Phylogenetic analyses of other homologs provided additional clues to gene origins. In particular, the prephenate dehydrogenases separate into two groups, one comprising the bal, cep and dbv polypeptides, the other the com, sta and tcp enzymes; indeed, the latter enzymes group together with a polypeptide from Streptomyces coelicolor (Fig. 4b). The DpgA sequences fall into two subgroups, one represented by bal, cep and dbv, and the other by sta and tcp (Fig. 4c). Similar results were obtained for DpgB, DpgC and DpgD (data not shown), indicating that the entire dbv dpgABCD cassette is more closely related to the bal or cep sequence than it is to either its sta or tcp counterpart. Likewise, for the HpgT and the StrR-like polypeptides, the dbv sequences are grouped together with bal and cep, while sta and tcp lie on separate branches (data not shown). In the case of the two-component signal transduction system, the response regulators (Fig. 4e) and sensor kinases (Fig. 4d) from the sta and tcp clusters group together, while the dbv polypeptides lie with the corresponding sequences from the com cluster (Chiu et al. 2001). For the ABC transporters, halogenases and StrR-like regulators, the branching patterns indicate in each case that the com polypeptide is the most divergent sequence (not shown).

NRPS modules 1–3 are encoded by one and two genes in the bal-cep and dbv-sta-tcp systems, respectively. Binary similarity scores and phylogenetic trees (not shown) are consistent with two distinct origins for these two sets. In addition, the separation of the genes encoding modules 1–3 from those encoding modules 4–7 in the dbv cluster suggests that synthesis of the heptapeptide might have originated by combining genes for modules 1–3 (responsible for the variable part of the heptapeptide) with those for modules 4–7.

Cross-linking of aromatic residues, a characteristic feature of glycopeptides, occurs in a defined sequence of steps, with OxyB and OxyA specifying the first and second cross-linking events, respectively (Süssmuth and Wohlleben 2003). Although the oxy region is always located downstream of the gene encoding NRPS module 7, it is characterized by variability in gene order. Only the position of oxyA is constant in the five clusters. Between oxyA and oxyB lies oxyE, which presumably serves to cross-link amino acids 1 and 3 in the dbv, sta and tcp clusters, while a hal gene lies between oxyB and oxyC in the sta and tcp clusters (Fig. 2). The Oxy polypeptides can be subdivided into separate families according to their cross-linking roles (Fig. 5). In each Oxy family, the sta and tcp sequences are more closely related to each other than they are to their dbv counterparts, and occur in the same order. While ComJ appears to be part of the OxyB family, and is presumably involved in C–O–C coupling of amino acids 4 and 6, the relative position of ComI in the tree remains uncertain (Fig. 5).

Fig. 2 Gene clusters. The genes present in each cluster are indicated by arrows drawn to scale, except in the case of the NRPS genes (thick arrows), which are drawn on a reduced scale. The segments of conserved synteny are identified by the color coding, and labeled with capital letters designating gene names: yellow, dpgABCD; light blue, oxy-hal region; pink, gtf region. Other colors indicate: red, hmaS-hmo; dark green, genes encoding the ABC transporter and NRPS modules 1 and 2; bright green, genes encoding NRPS modules 4–7 and the MbtH-like polypeptide; light brown, strR-like; dark brown, hpgT and the prephenate dehydrogenase-encoding gene; dark blue, the membrane ion antiporter gene. The black arrows denote genes shared by at least two clusters, while gray arrows refer to unique genes. Conserved IGSs (lower case letters) are symbolized by colored dots, as follows: yellow, dark green, light blue and orange indicate the conserved IGSs shown in Fig. 6a–d, respectively. Note that the orientation of the sta cluster is reversed with respect to that of GenBank Accession No. U82965

Thus, a clear correlation between glycopeptide structure and gene origin is found only when one compares the com ABC transporter, NRPS, Oxy and halogenase sequences to those in the other clusters. This is expected, since the NRPS, Oxy and halogenase proteins are involved in the synthesis and early modification of the heptapeptide. However, within the glycopeptide structures shown in Fig. 1, no clear correlation exists, and there are several dbv genes that are more closely related to the bal or cep sequences than to their counterparts in the tcp cluster. At the same time, some sta and tcp sequences appear to share a common origin. This indicates that similarity in the chemical structures of the final compounds is not necessarily reflected in similarity at the genetic level.

GC content

Glycopeptide antibiotics have been reported only from actinomycetes whose genomic DNAs have a GC content of 70–73%. Thus, no significant differences were expected in the GC contents of the clusters studied here. Nonetheless, the overall GC content of the cep cluster (68%) is lower than that of bal (72%), and a difference of 3–4 percentage points is observed in all syntenous segments. The lack of additional genomic sequences from A. orientalis and A. balhimycina prevents us from establishing how the GC contents of the bal and cep clusters relate to those of their respective chromosomes. In any case, the bal and cep clusters have evolved independently for long enough to allow them to diverge markedly in GC content.
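The GC values compared above are simple base-composition ratios. A minimal Python sketch, assuming that only unambiguous A/C/G/T bases enter the denominator and that matching syntenous segments are supplied as hypothetical 0-based, half-open coordinate pairs, is:

```python
from typing import List, Tuple

Segment = Tuple[int, int]


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (N and other IUPAC codes ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases")
    return 100 * (s.count("G") + s.count("C")) / acgt


def segment_gc_differences(seq1: str, seq2: str,
                           pairs: List[Tuple[Segment, Segment]]) -> List[float]:
    """GC difference in percentage points (seq1 minus seq2) for each pair
    of matching segments, e.g. syntenous segments of two clusters."""
    return [gc_content(seq1[s1:e1]) - gc_content(seq2[s2:e2])
            for (s1, e1), (s2, e2) in pairs]


print(gc_content("gcgcgcat"))  # 75.0
print(segment_gc_differences("GGGGAAAA", "GGAAAAAA", [((0, 8), (0, 8))]))  # [25.0]
```

Computing the difference per segment, rather than only for whole clusters, is what shows whether a GC shift is uniform across all syntenous segments.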

Conserved intergenic regions

The syntenous segments of the bal and cep clusters that are delimited by the genes encoding the putative prephenate dehydrogenase and aldolase share 82.1% similarity over a stretch of 62.6 kb. Non-coding sequences are expected to diverge more rapidly than coding sequences and, indeed, when coding sequences were excluded from these segments, the similarity score went down to 47.1% in the remaining 2 kb of IGSs. Thus, conserved non-coding sequences were not expected in the other clusters, given the limited conservation of gene composition and order. Nonetheless, four conserved stretches of non-coding DNA were identified and, interestingly, three of them are associated with segments of conserved synteny. We did not detect any of these conserved non-coding sequences in the com cluster.

Fig. 3 Conserved synteny. The entire bal and sta clusters, and 70-, 74- and 82-kb segments from the cep, dbv and tcp clusters, respectively, were compared using the program COMPARE with a window of 50 and the stringency set to 35. In the case of multiple homologous genes (i.e. the NRPS-encoding and the Oxy-encoding genes), only the best matching segments are reported. The tick marks on the axes indicate 10-kb sequence intervals. The thick bars indicate the NRPS genes
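The drop in similarity when coding sequences are excluded can be illustrated by masking alignment columns. In this Python sketch the pairwise nucleotide alignment is given as two equal-length strings, coding columns are derived from gene coordinates on the first sequence (hypothetical 0-based, half-open intervals), a gap column inherits the status of the preceding residue, and columns with a gap in only one sequence count as mismatches. All three conventions are assumptions, not the scoring used in this study.

```python
from typing import List, Optional, Tuple


def coding_mask(aln_a: str, genes: List[Tuple[int, int]]) -> List[bool]:
    """True for alignment columns whose (preceding) residue of sequence a lies in a gene."""
    mask: List[bool] = []
    pos, in_gene = -1, False
    for ch in aln_a:
        if ch != "-":
            pos += 1
            in_gene = any(s <= pos < e for s, e in genes)
        mask.append(in_gene)
    return mask


def percent_identity(a: str, b: str, keep: Optional[List[bool]] = None) -> float:
    """Identity over kept columns; one-sided gaps count as mismatches."""
    cols = [(x, y) for k, (x, y) in enumerate(zip(a, b))
            if (keep is None or keep[k]) and not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("no columns to score")
    return 100 * sum(x == y for x, y in cols) / len(cols)


a, b = "ACGTAC", "ACGAAC"
mask = coding_mask(a, [(0, 3)])
overall = percent_identity(a, b)
non_coding = percent_identity(a, b, keep=[not m for m in mask])
```

In this toy alignment the single mismatch falls outside the gene, so the non-coding score is lower than the overall score, the same direction of change as reported for the bal-cep IGSs.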

A 115-bp segment immediately preceding the dpgA start codon shows significant conservation in all five clusters (Fig. 6a). These conserved segments are designated bal-a through tcp-a in Fig. 2. In addition, conservation among bal-a, cep-a and dbv-a is higher (similarity values are 69–75%) than between sta-a and tcp-a (44–57%). This is consistent with the higher degree of relatedness of the bal-cep-dbv coding sequences in the dpgABCD loci.

The five clusters also present a conserved 80-bp segment immediately preceding the start codon of the gene encoding the ABC transporter (Fig. 6b). This conserved segment is present in two distinct IGSs in the tcp cluster: one precedes the gene encoding the ABC transporter and is designated tcp-b2 (Fig. 2), and the other precedes the gene encoding NRPS modules 1 and 2 and is designated tcp-b1 (Fig. 2). The tcp-b1 segment ends at the putative start codon of a 300-bp ORF, whose predicted product shows 40% identity to the N-terminal portion of glycopeptide ABC transporters. Thus, only tcp-b2 precedes the gene encoding the ABC transporter specified by all glycopeptide clusters, while tcp-b1 and the NRPS genes are separated by a short ORF encoding a truncated, presumably non-functional, ABC transporter. Two copies of this conserved segment are present in opposite orientations in the same IGS of the dbv cluster, which is expected to contain the divergent promoters (Fig. 2) for the expression of the genes encoding a putative acyltransferase, unique to the dbv cluster (Table 1), and the ABC transporter. The binary similarity scores over a 40-nt stretch range from 55% (tcp-b1 versus sta-b) to 92% (bal-b versus cep-b).

Fig. 4 Phylogenetic analyses of protein families. Phylogenetic trees were constructed as described under Materials and methods. Numbers at nodes are bootstrap values based on 100 resamplings (only values higher than 80 are shown), while the scale bar indicates 10 inferred substitutions per 100 residues. Other sequences used are named for their species of origin, as follows: Mechi, Micromonospora echinospora; Nfarc, Nocardia farcinica; Scoel, S. coelicolor; Tfusc, Thermobifida fusca. a Membrane ion-antiporter sequences, including polypeptides from the C-1027 cluster (Accession No. AAL06655) and from M. echinospora (AAR98563), with the sequence NCZ (BAD38870) from the neocarzilin cluster (Otsuka et al. 2004) as outgroup. b Prephenate dehydrogenase sequences, including an S. coelicolor polypeptide (NP_733544), with a T. fusca polypeptide (ZP_00291689) as outgroup. c DpgA sequences, with an N. farcinica polypeptide (YP_121145) as outgroup. d Sensor kinase sequences, with an S. coelicolor polypeptide (NP_627785) as outgroup. The incomplete bal sequence has not been included. e Response regulator sequences, including an S. coelicolor polypeptide (NP_733611), with a T. fusca polypeptide (ZP_00294117) as outgroup

Fig. 5 Phylogenetic analysis of Oxy proteins. Dendrogram based on the Oxy sequences, including those from the com cluster, using bal OxyD as outgroup. The bar indicates 10 inferred substitutions per 100 residues. Branch points supported by bootstrap resampling are denoted by filled (≥80) or open (<80) circles. The thick vertical lines delimit Oxy families

Upstream of oxyA, a 111-bp segment is conserved in all five clusters (Fig. 6c), with similarity scores ranging from 60% (sta versus tcp) to 89% (bal versus cep). In addition, a 34-bp portion of this segment is found (with 48–85% similarity) in several other IGSs: bal-c1 and cep-c1, which precede the gene encoding the StrR-like regulator; bal-c3, which precedes the gene encoding EvaA; dbv-c2, upstream of a unique dbv gene; tcp-c1 and tcp-c2, which precede genes encoding the putative response regulator and the balORF2-like polypeptide, respectively; and tcp-c4, located in the same IGS as tcp-a (Fig. 2).

A 64-nt segment is conserved in all clusters except dbv (Fig. 6d), with similarity scores ranging from 60 to 86%. This segment is usually located near one end of each cluster (Fig. 2) and is present in one (bal and cep) or three (sta and tcp) IGSs. Sequence conservation ends close to an ATG or GTG triplet, which in four instances is the putative start codon of a bona fide coding sequence. Most of the genes preceded by this conserved IGS encode putative resistance determinants. Interestingly, this motif is also present in the S. coelicolor vanSRJKHAX locus, which confers inducible vancomycin resistance on this non-producer of glycopeptide antibiotics (Hong et al. 2004).

Fig. 6 Conserved IGSs. In panels a–c, the first few codons of the genes encoding DpgA, the ABC transporter and OxyA, respectively, are also indicated, using upper case letters for invariant amino acid residues. Identical bases are indicated in bold, while bases that are conserved in at least four sequences are shown in upper case. The IGSs are labeled with a cluster prefix, followed by a letter (see Fig. 2) and a number suffix (according to their left-to-right order in Fig. 2). When the conserved motif is present on the complementary strand relative to Fig. 2, the corresponding IGS is underlined. In panel d, coding sequences corresponding to bona fide genes are underlined; invariant bases are indicated in bold, while bases present in at least seven IGSs are shown in upper case. The sequence abbreviated as Sco derives from the S. coelicolor vanSRJKHAX region
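The display convention in the Fig. 6 caption (invariant bases in bold; bases shared by at least four sequences in panels a–c, or by at least seven IGSs in panel d, in upper case) amounts to a per-column rule over the alignment. The sketch below is one possible reading of that rule, not the authors' procedure: the function name is hypothetical, bold type is replaced by a `*` marker line, and gaps are assumed never to count as conserved.

```python
from collections import Counter


def consensus_markup(aligned: list[str], min_count: int) -> tuple[str, str]:
    """Return (consensus line, marker line) for a gapped alignment.

    Assumed reading of the Fig. 6 caption rule:
    - the most frequent base in a column is shown in upper case when it
      occurs in at least `min_count` sequences, otherwise in lower case;
    - a column where every sequence has the same base is flagged with '*'
      (standing in for bold type).
    Gaps ('-') are never counted as a conserved base.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    consensus, markers = [], []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(b for b in column if b != "-")
        if not counts:                      # all-gap column
            consensus.append("-")
            markers.append(" ")
            continue
        base, n = counts.most_common(1)[0]
        consensus.append(base if n >= min_count else base.lower())
        markers.append("*" if n == len(aligned) else " ")
    return "".join(consensus), "".join(markers)


# Hypothetical five-sequence toy alignment; threshold 4 as in panels a-c.
seqs = ["ACGTA", "ACGTT", "ACGAT", "ACCAT", "ACGTC"]
line, marks = consensus_markup(seqs, min_count=4)
print(line)
print(marks)
```

For the panel d convention, the same function would be called with `min_count=7` on the alignment of all IGSs carrying the 64-nt motif.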

At the moment, we can only speculate about the role of these conserved IGSs. Some of them may represent binding sites for transcriptional regulators (e.g. the StrR-like regulator encoded in all clusters). Others might have been involved in the mobilization of gene cassettes. In any case, their persistence in otherwise divergent clusters suggests that these conserved elements have one or more important roles.

Insertions/deletions and duplications

Within the tcp cluster, a 300-bp segment is present in two copies that show over 90% similarity, suggesting that a duplication event occurred in the Actinoplanes teichomyceticus chromosome. This duplicated segment includes the last five codons of the gene encoding NRPS module 7, the entire ORF encoding the MbtH-like polypeptide and the first part of tcp-c2. Between the two duplicated segments lie genes encoding a homolog of BalORF2, a putative mannosyltransferase and the ABC transporter. Thus, tcp-b2 and tcp-c3 precede the genes encoding the ABC transporter and OxyA, respectively, as in the other clusters, while tcp-b1 is upstream of the NRPS genes (Fig. 2). In this way, the proper juxtaposition of the b-type IGS and the gene encoding the first NRPS modules is maintained despite the physical separation of the latter from the ABC transporter gene.

The sta and tcp clusters each contain two genes specifying putative halogenases and β-hydroxylases, respectively (Table 1). In both cases, one of the two genes is more closely related to its counterparts from the other clusters than to its paralog within the same cluster, suggesting that these extra copies did not originate from recent duplication events. A deletion derivative of evaE in the bal cluster has been reported previously (Recktenwald et al. 2002).

Implications for the evolution of glycopeptide clusters

While individual biosynthetic or resistance genes have been analyzed from an evolutionary viewpoint (see, for example, Egan et al. 2001; Metsa-Ketela et al. 2002), we are not aware of any previous studies on cluster evolution in Streptomyces and related genera. In addition, extensive conserved synteny has been observed among clusters for related compounds in many cases [e.g. the antifungal polyenes (Aparicio et al. 2003) and the aminocoumarins (Li and Heide 2005)], while the occurrence of divergent clusters appears to be an exception (e.g. clusters for the synthesis of related aureolic acids; Menendez et al. 2004). It should be noted that, so far, most clusters have been isolated from Streptomyces strains, and the analysis reported here probably represents the first study of clusters for the synthesis of related compounds derived from four different actinomycete families. Thus, while we cannot exclude the possibility that the observed mosaic structure represents a unique feature of glycopeptide clusters, we are more inclined to believe that it reflects the phylogenetic distances between the strains from which these sequences were derived.

The evolution of gene clusters has received more attention in the cyanobacteria (Christiansen et al. 2003; Moffitt and Neilan 2004; Rantala et al. 2004) and in other bacteria (Lopez 2003; Krzywinska et al. 2004; Piel et al. 2004). On the basis of congruent phylogenetic trees derived from 16S rRNA and from portions of the microcystin clusters, Rantala et al. (2004) have proposed an ancient origin for the microcystin pathway and speculated that the corresponding gene cluster has occasionally been lost during the evolution of cyanobacterial lineages.

In this work, gene organization and phylogenetic analyses are consistent with a mosaic structure of glycopeptide clusters, which consist of distinct gene cassettes that specify enzymes participating in subpathways and are often preceded by conserved IGSs. This suggests that these clusters originated through the combination of a small number of elements, creating a ‘minimal’ cluster of core genes. Minimal clusters then expanded through the acquisition of genes for specific tailoring steps and of additional regulatory and resistance elements. Since many dbv genes or gene cassettes are more closely related to the bal/cep sequences than to their tcp counterparts, the dbv and tcp clusters probably originated through the acquisition of elements from different sources. Therefore, the synthesis of the chemically similar A40926 and teicoplanin appears to be the result of convergent evolution.

Although large portions of the bal and cep clusters share a common ancestor, the two clusters have diverged through changes in their GC content, accompanied by the loss of a functional sugar biosynthesis gene in the bal cluster and by the loss and reacquisition of the ion antiporter gene in the cep cluster. Since these clusters diverge at their very ends and in their flanking sequences, it is likely that at least one of them was acquired by horizontal transfer. Alternatively, the cluster ends and flanking sequences may have diverged during the differentiation of A. balhimycina from A. orientalis.
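The GC-content divergence between the bal and cep clusters can be expressed as the percentage of G+C bases in each sequence; a related, commonly used measure for coding regions is GC3, the G+C percentage at third codon positions. The sketch below computes both for in-frame ORFs. It is illustrative only: the function names and toy ORFs are hypothetical, and the text does not state which measure was used to compare the clusters.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds: str) -> float:
    """GC percentage at third codon positions of an in-frame coding sequence.

    Assumption: `cds` starts at the first base of a codon; a trailing
    partial codon is ignored.
    """
    full = len(cds) - len(cds) % 3
    third_positions = cds[2:full:3]
    return gc_content(third_positions)


# Hypothetical toy ORFs (not real bal/cep sequences).
orf_x = "ATGGCCGACCTGTGA"
orf_y = "ATGGCAGATCTTTGA"
for name, orf in (("x", orf_x), ("y", orf_y)):
    print(name, round(gc_content(orf), 1), round(gc3_content(orf), 1))
```

Comparing orthologous ORF pairs in this way (rather than whole clusters) would show whether the GC shift is spread uniformly or concentrated at third codon positions.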

Among the five clusters examined here, only dbv contains a remnant of a tRNA attachment site at one end (Sosio et al. 2003), suggesting that at least a portion of this cluster was mobilized by a lysogenic phage. We did not detect sequences resembling mobilization elements in the other clusters. Therefore, how genes and gene cassettes were acquired, and how they became established in the host chromosome, must remain a matter of speculation. In any case, the divergence of the bal and cep clusters in GC content and IGSs suggests a relatively ancient origin, so that many elements involved in DNA mobilization may since have been lost.

Close to 50 different glycopeptides have been characterized from unrelated strains (Lancini and Cavalleri 1997), suggesting that these antibiotics are produced by many actinomycetes. Thus, analysis of additional glycopeptide clusters may provide the missing links that will allow us to trace their origins in different actinomycetes. The evolution of gene clusters for secondary metabolism is a particularly relevant topic for the industrially important filamentous actinomycetes (Challis and Hopwood 2003), since a single strain can have the genetic potential to produce several distinct secondary metabolites (Sosio et al. 2000; Omura et al. 2001; Bentley et al. 2002). It has been proposed that the microcystin cluster has occasionally been lost during the evolution of cyanobacterial lineages (Rantala et al. 2004). Because of the presence of multiple clusters in a single strain, we doubt that vertical transmission, followed by cluster loss, has played a major role in the evolution of secondary metabolism in actinomycetes.

Acknowledgements We are grateful to Giancarlo Lancini for valuable discussions. This work was partially supported by grants from the EU (QLK3-1999-00650 and LSHG-CT-2003-503491).

References

Aparicio JF, Caffrey P, Gil JA, Zotchev SB (2003) Polyene antibiotic biosynthesis gene clusters. Appl Microbiol Biotechnol 61:179–188

Bentley SD et al (2002) Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417:141–147

Challis GL, Hopwood DA (2003) Synergy and contingency as driving forces for the evolution of multiple secondary metabolite production by Streptomyces species. Proc Natl Acad Sci USA 100:14555–14561

Challis GL, Ravel J (2000) Coelichelin, a new peptide siderophore encoded by the Streptomyces coelicolor genome: structure prediction from the sequence of its non-ribosomal peptide synthetase. FEMS Microbiol Lett 187:111–114

Chiu HT, Hubbard BK, Shah AN, Eide J, Fredenburg RA, Walsh CT, Khosla C (2001) Molecular cloning and sequence analysis of the complestatin biosynthetic gene cluster. Proc Natl Acad Sci USA 98:8548–8553

Christiansen G, Fastner J, Erhard M, Borner T, Dittmann E (2003) Microcystin biosynthesis in Planktothrix: genes, evolution, and manipulation. J Bacteriol 185:564–572

Egan S, Wiener P, Kallifidas D, Wellington EM (2001) Phylogeny of Streptomyces species and evidence for horizontal transfer of entire and partial antibiotic gene clusters. Antonie Van Leeuwenhoek 79:127–133

Hong H-J, Hutchings MI, Neu JM, Wright GD, Paget MS, Buttner MJ (2004) Characterization of an inducible vancomycin resistance system in Streptomyces coelicolor reveals a novel gene (vanK) required for drug resistance. Mol Microbiol 52:1107–1121

Hubbard BK, Walsh CT (2003) Vancomycin assembly: nature’s way. Angew Chem Int Ed Engl 42:730–765

Krzywinska E, Krzywinski J, Schorey JS (2004) Naturally occurring horizontal gene transfer and homologous recombination in Mycobacterium. Microbiology 150:1707–1712

Lancini G, Cavalleri B (1997) Glycopeptide antibiotics (dalbaheptides). In: Kleinkauf H, von Dohren H (eds) Biotechnology, vol 7. VCH, Weinheim, Germany, pp 369–396

Li S-M, Heide L (2005) New aminocoumarin antibiotics from genetically engineered Streptomyces strains. Curr Med Chem 12:763–771

Li T-L, Huang F, Haydock SF, Mironenko T, Leadlay PF, Spencer JB (2004) Biosynthetic gene cluster of the glycopeptide antibiotic teicoplanin: characterization of two glycosyltransferases and the key acyltransferase. Chem Biol 11:107–119

Liu W, Christenson SD, Standage S, Shen B (2002) Biosynthesis of the enediyne antitumor antibiotic C-1027. Science 297:1170–1173

Lopez JV (2003) Naturally mosaic operons for secondary metabolite biosynthesis: variability and putative horizontal transfer of discrete catalytic domains of the epothilone polyketide synthase locus. Mol Genet Genomics 270:420–431

Menendez N, Nur-e-Alam M, Brana AF, Rohr J, Salas JA, Mendez C (2004) Biosynthesis of the antitumor chromomycin A3 in Streptomyces griseus: analysis of the gene cluster and rational design of novel chromomycin analogs. Chem Biol 11:21–32

Metsa-Ketela M, Halo L, Munukka E, Hakala J, Mantsala P, Ylihonko K (2002) Molecular evolution of aromatic polyketides and comparative sequence analysis of polyketide ketosynthase and 16S ribosomal DNA genes from various Streptomyces species. Appl Environ Microbiol 68:4472–4479

Moffitt MC, Neilan BA (2004) Characterization of the nodularin synthetase gene cluster and proposed theory of the evolution of cyanobacterial hepatotoxins. Appl Environ Microbiol 70:6353–6362

Omura S, Ikeda H, Ishikawa J, Hanamoto A, Takahashi C, Shinose M, Takahashi Y, Horikawa H, Nakazawa H, Osonoe T, Kikuchi H, Shiba T, Sakaki Y, Hattori M (2001) Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites. Proc Natl Acad Sci USA 98:12215–12220

Otsuka M, Ichinose K, Fujii I, Ebizuka Y (2004) Cloning, sequencing, and functional analysis of an iterative type I polyketide synthase gene cluster for biosynthesis of the antitumor chlorinated polyenone neocarzilin in “Streptomyces carzinostaticus”. Antimicrob Agents Chemother 48:3468–3476

Piel J, Hofer I, Hui D (2004) Evidence for a symbiosis island involved in horizontal acquisition of pederin biosynthetic capabilities by the bacterial symbiont of Paederus fuscipes beetles. J Bacteriol 186:1280–1286

Pootoolal J, Thomas MG, Marshall CG, Neu JM, Hubbard BK, Walsh CT, Wright GD (2002) Assembling the glycopeptide antibiotic scaffold: the biosynthesis of A47934 from Streptomyces toyocaensis NRRL15009. Proc Natl Acad Sci USA 99:8962–8967

Rantala A, Fewer DP, Hisbergues M, Rouhiainen L, Vaitomaa J, Borner T, Sivonen K (2004) Phylogenetic evidence for the early evolution of microcystin synthesis. Proc Natl Acad Sci USA 101:568–573

Recktenwald J, Shawky R, Puk O, Pfennig F, Keller U, Wohlleben W, Pelzer S (2002) Nonribosomal biosynthesis of vancomycin-type antibiotics: a heptapeptide backbone and eight peptide synthetase modules. Microbiology 148:1105–1118

Sosio M, Bossi E, Bianchi A, Donadio S (2000) Multiple peptide synthetase gene clusters in actinomycetes. Mol Gen Genet 264:213–221

Sosio M, Stinchi S, Beltrametti F, Lazzarini A, Donadio S (2003) The gene cluster for the biosynthesis of the glycopeptide antibiotic A40926 by Nonomuraea species. Chem Biol 10:541–549

Sosio M, Kloosterman H, Bianchi A, De Vreugd P, Dijkhuizen L, Donadio S (2004) Organization of the teicoplanin gene cluster in Actinoplanes teichomyceticus. Microbiology 150:95–102

Sußmuth RD, Wohlleben W (2003) The biosynthesis of glycopeptide antibiotics—a model for complex, non-ribosomally synthesized, peptidic secondary metabolites. Appl Microbiol Biotechnol 63:344–350

Van Wageningen AM, Kirkpatrick PN, Williams DH, Harris BR, Kershaw JK, Lennard NJ, Jones M, Jones SJ, Solenberg PJ (1998) Sequencing and analysis of genes involved in the biosynthesis of a vancomycin group antibiotic. Chem Biol 5:155–162

Walsh C (2003) Antibiotics: actions, origins, resistance. ASM Press, Washington, DC


Wink JM, Kroppenstedt RM, Ganguli BN, Nadkarni SR, Schumann P, Seibert G, Stackebrandt E (2003) Three new antibiotic producing species of the genus Amycolatopsis, Amycolatopsis balhimycina sp. nov., A. tolypomycina sp. nov., A. vancoresmycina sp. nov., and description of Amycolatopsis keratiniphila subsp. keratiniphila subsp. nov., and A. keratiniphila subsp. nogabecina subsp. nov. Syst Appl Microbiol 26:38–46
